  • Python pandas and DataFrame (Part 2)

    Basic operations

    Hierarchical indexing (MultiIndex)

    >>> import numpy as np
    >>> import pandas as pd
    >>> data = pd.Series(np.random.randn(10),index=[['a','a','a','b','b','c','c','d','d','d'],[1,2,3,1,2,1,2,3,1,2]])
    >>> data
    a  1   -0.168871
       2    0.828841
       3    0.786215
    b  1    0.506081
       2   -2.304898
    c  1    0.864875
       2    0.183091
    d  3   -0.678791
       1   -1.241735
       2    0.778855
    dtype: float64
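
The two-level index above can be queried one level at a time. A minimal sketch, assuming fixed values instead of random ones so that the results are reproducible (the names `group_b` and `second_items` are illustrative only):

```python
import numpy as np
import pandas as pd

# Same two-level index as above, but with fixed values 0.0 .. 9.0.
index = [['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'd', 'd'],
         [1, 2, 3, 1, 2, 1, 2, 3, 1, 2]]
data = pd.Series(np.arange(10, dtype=float), index=index)

# Selecting an outer label returns the inner-level Series for that group.
group_b = data['b']

# xs() selects on the inner level: one value from every group that has label 2.
second_items = data.xs(2, level=1)

print(data.index.nlevels)
print(group_b)
print(second_items)
```

`xs` is used here rather than slicing because the inner level of this index is not sorted within group `d`.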

    Converting between a hierarchically indexed Series and a DataFrame

    >>> data.unstack()
              1         2         3
    a -0.168871  0.828841  0.786215
    b  0.506081 -2.304898       NaN
    c  0.864875  0.183091       NaN
    d -1.241735  0.778855 -0.678791
    >>> type(data.unstack())
    <class 'pandas.core.frame.DataFrame'>
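
The conversion also works in the other direction with `stack()`. A small sketch, assuming a shortened version of the Series with fixed values; `dropna()` and `sort_index()` are added so the round trip does not depend on how a given pandas version treats the missing `(b, 3)` entry:

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('a', 1), ('a', 2), ('a', 3), ('b', 1), ('b', 2)])
data = pd.Series([0.1, 0.2, 0.3, 0.4, 0.5], index=index)

# unstack(): the inner index level becomes the columns; (b, 3) is filled with NaN.
wide = data.unstack()

# stack(): the columns move back into the inner index level.
restored = wide.stack().dropna().sort_index()

print(wide)
print(restored)
```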

    Merge, join, and concatenate

    >>> df2 = pd.DataFrame({'apts':[55000,60000],'cars':[15000,12000]},index=['hangzhou','najing'])
    >>> df1 = pd.DataFrame({'apts':[55000,60000],'cars':[20000,30000]},index=['shanghai','beijing'])
    >>> df3 = pd.DataFrame({'apts':[55000,60000],'cars':[15000,12000]},index=['guangzhou','chongqing'])
    >>> [df1,df2,df3]
    [           apts   cars
    shanghai  55000  20000
    beijing   60000  30000,            apts   cars
    hangzhou  55000  15000
    najing    60000  12000,             apts   cars
    guangzhou  55000  15000
    chongqing  60000  12000]
    >>> pd.concat([df1,df2,df3])
                apts   cars
    shanghai   55000  20000
    beijing    60000  30000
    hangzhou   55000  15000
    najing     60000  12000
    guangzhou  55000  15000
    chongqing  60000  12000
    >>> frames = [df1,df2,df3]
    >>> result2 = pd.concat(frames,keys=['x','y','z'])
    >>> result2
                  apts   cars
    x shanghai   55000  20000
      beijing    60000  30000
    y hangzhou   55000  15000
      najing     60000  12000
    z guangzhou  55000  15000
      chongqing  60000  12000
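
The `keys` argument builds an outer index level, so each original frame can be pulled back out of the concatenated result. A brief sketch, assuming only two of the frames for brevity (the names `stacked` and `second` are illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({'apts': [55000, 60000], 'cars': [20000, 30000]},
                   index=['shanghai', 'beijing'])
df2 = pd.DataFrame({'apts': [55000, 60000], 'cars': [15000, 12000]},
                   index=['hangzhou', 'najing'])

stacked = pd.concat([df1, df2], keys=['x', 'y'])

# Selecting the outer key 'y' returns the rows that came from df2.
second = stacked.loc['y']
print(second)
```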

    Concatenating along columns with concat

    >>> result = pd.concat([df1,df2,df3])
    >>> df4 = pd.DataFrame({"salaries":[10000,30000,30000,20000,15000]},index=['suzhou','beijing','shanghai','guanghzou','tianjin'])
    >>> result3 = pd.concat([result,df4],axis=1)
    >>> result3
                  apts     cars  salaries
    beijing    60000.0  30000.0   30000.0
    chongqing  60000.0  12000.0       NaN
    guanghzou      NaN      NaN   20000.0
    guangzhou  55000.0  15000.0       NaN
    hangzhou   55000.0  15000.0       NaN
    najing     60000.0  12000.0       NaN
    shanghai   55000.0  20000.0   30000.0
    suzhou         NaN      NaN   10000.0
    tianjin        NaN      NaN   15000.0

    Concatenating two DataFrames and keeping only the intersection of their indexes

    >>> result3 = pd.concat([result,df4],axis=1,join='inner')
    >>> result3
               apts   cars  salaries
    shanghai  55000  20000     30000
    beijing   60000  30000     30000
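
The difference between the default outer join and `join='inner'` is easiest to see on a small pair of frames. A minimal sketch, assuming two hypothetical frames `prices` and `salaries` that share only the `beijing` row:

```python
import pandas as pd

prices = pd.DataFrame({'apts': [55000, 60000]}, index=['shanghai', 'beijing'])
salaries = pd.DataFrame({'salaries': [10000, 30000]}, index=['suzhou', 'beijing'])

# Default (outer): union of the row labels, missing cells become NaN.
outer = pd.concat([prices, salaries], axis=1)

# join='inner': only row labels present in both frames are kept.
inner = pd.concat([prices, salaries], axis=1, join='inner')

print(outer)
print(inner)
```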

    Concatenating a Series with a DataFrame

    >>> s1 = pd.Series([60,50],index=['shanghai','beijing'],name='meal')
    >>> s1
    shanghai    60
    beijing     50
    Name: meal, dtype: int64
    >>> type(s1)
    <class 'pandas.core.series.Series'>
    >>> df1
               apts   cars
    shanghai  55000  20000
    beijing   60000  30000
    >>> type(df1)
    <class 'pandas.core.frame.DataFrame'>
    >>> pd.concat([df1,s1],axis=1)
               apts   cars  meal
    shanghai  55000  20000    60
    beijing   60000  30000    50
    >>> 

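    DataFrame.append adds a Series as a new row, with the Series name as the row label and the Series index matched against the column names. concat does not do this directly: with axis=0 it stacks the Series as a separate unnamed column, and with axis=1 it attaches the Series as a column aligned on the row labels. Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0.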

    >>> s2 = pd.Series([18000,12000],index=['apts','cars'],name='xiamen')
    >>> s2
    apts    18000
    cars    12000
    Name: xiamen, dtype: int64
    >>> df1.append(s2)
               apts   cars
    shanghai  55000  20000
    beijing   60000  30000
    xiamen    18000  12000
    >>> pd.concat([df1,s2],axis=0)
                    0     apts     cars
    shanghai      NaN  55000.0  20000.0
    beijing       NaN  60000.0  30000.0
    apts      18000.0      NaN      NaN
    cars      12000.0      NaN      NaN
    >>> pd.concat([df1,s2],axis=1)
                 apts     cars   xiamen
    apts          NaN      NaN  18000.0
    beijing   60000.0  30000.0      NaN
    cars          NaN      NaN  12000.0
    shanghai  55000.0  20000.0      NaN
    >>> 
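
On pandas versions without `DataFrame.append`, the same row addition can be done with `concat` by first turning the Series into a one-row frame. This is a sketch of one common workaround, not the only option; the names `row` and `result` are illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({'apts': [55000, 60000], 'cars': [20000, 30000]},
                   index=['shanghai', 'beijing'])
s2 = pd.Series([18000, 12000], index=['apts', 'cars'], name='xiamen')

# to_frame() gives a one-column frame named 'xiamen'; .T turns it into one row
# whose label is 'xiamen' and whose columns are 'apts' and 'cars'.
row = s2.to_frame().T

result = pd.concat([df1, row])
print(result)
```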

    Merging with merge

    >>> df1 = pd.DataFrame({"salaries":[10000,30000,30000,20000,15000],'cities':['suzhou','beijing','shanghai','guanghzou','tianjin']})
    >>> df4 = pd.DataFrame({'apts':[55000,60000],'cars':[15000,12000],'cities':['shanghai','beijing']})
    >>> result = pd.merge(df1,df4,on='cities') # on names the column to merge on
    >>> result
         cities  salaries   apts   cars
    0   beijing     30000  60000  12000
    1  shanghai     30000  55000  15000
    >>> result = pd.merge(df1,df4,on='cities',how='right')
    >>> result
         cities  salaries   apts   cars
    0   beijing     30000  60000  12000
    1  shanghai     30000  55000  15000
    >>> result = pd.merge(df1,df4,on='cities',how='left')
    >>> result
          cities  salaries     apts     cars
    0     suzhou     10000      NaN      NaN
    1    beijing     30000  60000.0  12000.0
    2   shanghai     30000  55000.0  15000.0
    3  guanghzou     20000      NaN      NaN
    4    tianjin     15000      NaN      NaN
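
The `how` argument controls which keys survive the merge. The sketch below compares the four join types on two small frames and uses `indicator=True` to show where each row came from; the frames `left` and `right`, including the `hangzhou` row and its value, are hypothetical and chosen only for illustration:

```python
import pandas as pd

left = pd.DataFrame({'cities': ['suzhou', 'beijing', 'shanghai'],
                     'salaries': [10000, 30000, 30000]})
right = pd.DataFrame({'cities': ['shanghai', 'beijing', 'hangzhou'],
                      'apts': [55000, 60000, 58000]})

# Number of rows produced by each join type.
row_counts = {}
for how in ['inner', 'left', 'right', 'outer']:
    merged = pd.merge(left, right, on='cities', how=how)
    row_counts[how] = len(merged)
print(row_counts)

# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'.
flagged = pd.merge(left, right, on='cities', how='outer', indicator=True)
print(flagged[['cities', '_merge']])
```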
  • Original article: https://www.cnblogs.com/chenyang920/p/8007527.html