  • Python pandas and DataFrame (Part 2)

    Basic operations

    Hierarchical indexing (MultiIndex)

    >>> import numpy as np
    >>> import pandas as pd
    >>> data = pd.Series(np.random.randn(10),index=[['a','a','a','b','b','c','c','d','d','d'],[1,2,3,1,2,1,2,3,1,2]])
    >>> data
    a  1   -0.168871
       2    0.828841
       3    0.786215
    b  1    0.506081
       2   -2.304898
    c  1    0.864875
       2    0.183091
    d  3   -0.678791
       1   -1.241735
       2    0.778855
    dtype: float64
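Selecting from a hierarchically indexed Series can be sketched as follows. The index is the one used above, but the values are fixed numbers rather than `np.random.randn` output so that the results are reproducible; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Same two-level index as the example above; values 0.0 .. 9.0 replace the
# random numbers so that the selections below are predictable.
index = [['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'd', 'd'],
         [1, 2, 3, 1, 2, 1, 2, 3, 1, 2]]
data = pd.Series(np.arange(10, dtype=float), index=index)

# Partial key: every row under outer label 'a', indexed by the inner level.
outer_a = data['a']

# Cross-section: every row whose inner label is 1, indexed by the outer level.
inner_1 = data.xs(1, level=1)

# Full key: a single scalar addressed by (outer, inner).
single = data.loc[('b', 2)]

print(outer_a)
print(inner_1)
print(single)
```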

    Converting between a hierarchically indexed Series and a DataFrame

    >>> data.unstack()
              1         2         3
    a -0.168871  0.828841  0.786215
    b  0.506081 -2.304898       NaN
    c  0.864875  0.183091       NaN
    d -1.241735  0.778855 -0.678791
    >>> type(data.unstack())
    <class 'pandas.core.frame.DataFrame'>
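The reverse conversion is `stack()`, which moves the column labels back into an inner index level. A minimal sketch on a smaller Series with fixed values (an assumption made for reproducibility); `dropna()` is called explicitly because whether `stack()` drops the NaN cells by itself depends on the pandas version.

```python
import pandas as pd

# A small two-level Series; label ('b', 3) is deliberately absent.
index = [['a', 'a', 'a', 'b', 'b'], [1, 2, 3, 1, 2]]
data = pd.Series([0.0, 1.0, 2.0, 3.0, 4.0], index=index)

# Inner level becomes the columns; the missing ('b', 3) cell is NaN.
wide = data.unstack()

# Columns go back into the inner index level; drop the NaN cell explicitly.
restored = wide.stack().dropna()

print(wide)
print(restored)
```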

    Merge, join, concatenate

    >>> df2 = pd.DataFrame({'apts':[55000,60000],'cars':[15000,12000]},index=['hangzhou','najing'])
    >>> df1 = pd.DataFrame({'apts':[55000,60000],'cars':[20000,30000]},index=['shanghai','beijing'])
    >>> df3 = pd.DataFrame({'apts':[55000,60000],'cars':[15000,12000]},index=['guangzhou','chongqing'])
    >>> [df1,df2,df3]
    [           apts   cars
    shanghai  55000  20000
    beijing   60000  30000,            apts   cars
    hangzhou  55000  15000
    najing    60000  12000,             apts   cars
    guangzhou  55000  15000
    chongqing  60000  12000]
    >>> pd.concat([df1,df2,df3])
                apts   cars
    shanghai   55000  20000
    beijing    60000  30000
    hangzhou   55000  15000
    najing     60000  12000
    guangzhou  55000  15000
    chongqing  60000  12000
    >>> frames = [df1,df2,df3]
    >>> result2 = pd.concat(frames,keys=['x','y','z'])
    >>> result2
                  apts   cars
    x shanghai   55000  20000
      beijing    60000  30000
    y hangzhou   55000  15000
      najing     60000  12000
    z guangzhou  55000  15000
      chongqing  60000  12000
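The `keys` argument adds an outer index level, so each original frame can be pulled back out by its key. A short sketch using two of the frames above:

```python
import pandas as pd

df1 = pd.DataFrame({'apts': [55000, 60000], 'cars': [20000, 30000]},
                   index=['shanghai', 'beijing'])
df2 = pd.DataFrame({'apts': [55000, 60000], 'cars': [15000, 12000]},
                   index=['hangzhou', 'najing'])

# Each key labels the rows that came from the corresponding frame.
result2 = pd.concat([df1, df2], keys=['x', 'y'])

# Selecting on the outer level returns the rows of df2 with their city index.
second = result2.loc['y']

print(result2.index.nlevels)
print(second)
```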

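    Concatenating along columns with concat

    >>> df4 = pd.DataFrame({"salaries":[10000,30000,30000,20000,15000]},index=['suzhou','beijing','shanghai','guanghzou','tianjin'])
    >>> result = pd.concat(frames)  # assumption: result is df1, df2, df3 stacked row-wise (not defined in the original)
    >>> result3 = pd.concat([result,df4],axis=1)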
    >>> result3
                  apts     cars  salaries
    beijing    60000.0  30000.0   30000.0
    chongqing  60000.0  12000.0       NaN
    guanghzou      NaN      NaN   20000.0
    guangzhou  55000.0  15000.0       NaN
    hangzhou   55000.0  15000.0       NaN
    najing     60000.0  12000.0       NaN
    shanghai   55000.0  20000.0   30000.0
    suzhou         NaN      NaN   10000.0
    tianjin        NaN      NaN   15000.0

    Combining two DataFrames and keeping only the index labels they share

    >>> result3 = pd.concat([result,df4],axis=1,join='inner')
    >>> result3
               apts   cars  salaries
    shanghai  55000  20000     30000
    beijing   60000  30000     30000
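The difference between the default outer join and `join='inner'` can be seen side by side on two small frames. The frames `prices` and `salaries` below are illustrative stand-ins, not the ones from the session above.

```python
import pandas as pd

prices = pd.DataFrame({'apts': [55000, 60000]},
                      index=['shanghai', 'beijing'])
salaries = pd.DataFrame({'salaries': [10000, 30000]},
                        index=['suzhou', 'beijing'])

# Default (outer): union of the index labels, NaN where a frame has no row.
outer = pd.concat([prices, salaries], axis=1)

# Inner: only the labels present in both frames, so no NaN is introduced.
inner = pd.concat([prices, salaries], axis=1, join='inner')

print(outer)
print(inner)
```

Because the outer result contains NaN, its integer columns are converted to float, which is why the earlier output shows values such as `60000.0`.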

    Concatenating a Series with a DataFrame

    >>> s1 = pd.Series([60,50],index=['shanghai','beijing'],name='meal')
    >>> s1
    shanghai    60
    beijing     50
    Name: meal, dtype: int64
    >>> type(s1)
    <class 'pandas.core.series.Series'>
    >>> df1
               apts   cars
    shanghai  55000  20000
    beijing   60000  30000
    >>> type(df1)
    <class 'pandas.core.frame.DataFrame'>
    >>> pd.concat([df1,s1],axis=1)
               apts   cars  meal
    shanghai  55000  20000    60
    beijing   60000  30000    50
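For a named Series whose index matches the DataFrame's index, `DataFrame.join` is another way to get the same result; the Series name becomes the column name. A small sketch comparing the two calls:

```python
import pandas as pd

df1 = pd.DataFrame({'apts': [55000, 60000], 'cars': [20000, 30000]},
                   index=['shanghai', 'beijing'])
s1 = pd.Series([60, 50], index=['shanghai', 'beijing'], name='meal')

# Column-wise concat: the Series is aligned on the index and added as 'meal'.
via_concat = pd.concat([df1, s1], axis=1)

# join aligns on the index as well; it needs the Series to have a name.
via_join = df1.join(s1)

print(via_concat)
print(via_join)
```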

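    append adds a Series as a new row when its index matches the DataFrame's columns; concat does not, because it treats the Series as a column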

    >>> s2 = pd.Series([18000,12000],index=['apts','cars'],name='xiamen')
    >>> s2
    apts    18000
    cars    12000
    Name: xiamen, dtype: int64
    >>> df1.append(s2)
               apts   cars
    shanghai  55000  20000
    beijing   60000  30000
    xiamen    18000  12000
    >>> pd.concat([df1,s2],axis=0)
                    0     apts     cars
    shanghai      NaN  55000.0  20000.0
    beijing       NaN  60000.0  30000.0
    apts      18000.0      NaN      NaN
    cars      12000.0      NaN      NaN
    >>> pd.concat([df1,s2],axis=1)
                 apts     cars   xiamen
    apts          NaN      NaN  18000.0
    beijing   60000.0  30000.0      NaN
    cars          NaN      NaN  12000.0
    shanghai  55000.0  20000.0      NaN
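`DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions the row has to be added with `concat`. One way to do it, sketched here, is to turn the Series into a one-row DataFrame first:

```python
import pandas as pd

df1 = pd.DataFrame({'apts': [55000, 60000], 'cars': [20000, 30000]},
                   index=['shanghai', 'beijing'])
s2 = pd.Series([18000, 12000], index=['apts', 'cars'], name='xiamen')

# to_frame() gives a single column named 'xiamen'; transposing it yields a
# single row labelled 'xiamen' with columns 'apts' and 'cars'.
row = s2.to_frame().T

# Row-wise concat now lines the columns up instead of creating a column 0.
appended = pd.concat([df1, row])

print(appended)
```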

    Combining with merge

    >>> df1 = pd.DataFrame({"salaries":[10000,30000,30000,20000,15000],'cities':['suzhou','beijing','shanghai','guanghzou','tianjin']})
    >>> df4 = pd.DataFrame({'apts':[55000,60000],'cars':[15000,12000],'cities':['shanghai','beijing']})
    >>> result = pd.merge(df1,df4,on='cities')  # on names the column to join on
    >>> result
         cities  salaries   apts   cars
    0   beijing     30000  60000  12000
    1  shanghai     30000  55000  15000
    >>> result = pd.merge(df1,df4,on='cities',how='right')
    >>> result
         cities  salaries   apts   cars
    0   beijing     30000  60000  12000
    1  shanghai     30000  55000  15000
    >>> result = pd.merge(df1,df4,on='cities',how='left')
    >>> result
          cities  salaries     apts     cars
    0     suzhou     10000      NaN      NaN
    1    beijing     30000  60000.0  12000.0
    2   shanghai     30000  55000.0  15000.0
    3  guanghzou     20000      NaN      NaN
    4    tianjin     15000      NaN      NaN
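The remaining join type is `how='outer'`, which keeps the keys from both sides. Adding `indicator=True` produces a `_merge` column that records where each row came from. The frames below are illustrative; the city `'xiamen'` in the right-hand frame is a made-up extra row so that all three cases appear.

```python
import pandas as pd

left = pd.DataFrame({'cities': ['suzhou', 'beijing', 'shanghai'],
                     'salaries': [10000, 30000, 30000]})
# 'xiamen' is a hypothetical extra city that exists only on the right side.
right = pd.DataFrame({'cities': ['shanghai', 'beijing', 'xiamen'],
                      'apts': [55000, 60000, 18000]})

# Outer join: every city from either frame; _merge says which side had it.
outer = pd.merge(left, right, on='cities', how='outer', indicator=True)

# Map each city to 'left_only', 'right_only' or 'both'.
origin = outer.set_index('cities')['_merge'].astype(str).to_dict()

print(outer)
print(origin)
```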
  • Original article: https://www.cnblogs.com/chenyang920/p/8007527.html