zoukankan      html  css  js  c++  java
  • pandas的级联操作

    级联操作

    • pd.concat, pd.append
    import pandas as pd
    from pandas import DataFrame
    import numpy as np
    

    pandas使用pd.concat函数,与np.concatenate函数类似,只是多了一些参数:

    objs
    axis=0
    keys
    join='outer' / 'inner':表示的是级联的方式,outer会将所有的项进行级联(忽略匹配和不匹配),而inner只会将匹配的项级联到一起,不匹配的不级联
    ignore_index=False
    
    • 匹配级联
    df1 = DataFrame({'employee':['Bobs','Linda','Bill'],
                    'group':['Accounting','Product','Marketing'],
                   'hire_date':[1998,2017,2018]})
    df1
    
    employee group hire_date
    0 Bobs Accounting 1998
    1 Linda Product 2017
    2 Bill Marketing 2018
    pd.concat((df1,df1),axis=0)
    
    employee group hire_date
    0 Bobs Accounting 1998
    1 Linda Product 2017
    2 Bill Marketing 2018
    0 Bobs Accounting 1998
    1 Linda Product 2017
    2 Bill Marketing 2018
    • 不匹配级联
      • 不匹配指的是级联的维度的索引不一致。例如纵向级联时列索引不一致,横向级联时行索引不一致
      • 有2种连接方式:
        • 外连接:补NaN(默认模式)
        • 内连接:只连接匹配的项
    df2 = df1.copy()
    df2.columns = ['employee','groupps','hire_date']
    df2
    
    employee groupps hire_date
    0 Bobs Accounting 1998
    1 Linda Product 2017
    2 Bill Marketing 2018
    pd.concat((df1,df2),axis=0)
    
    # 按列级联,发现不同列就加在表中,用NaN补全空的字段
    
    employee group groupps hire_date
    0 Bobs Accounting NaN 1998
    1 Linda Product NaN 2017
    2 Bill Marketing NaN 2018
    0 Bobs NaN Accounting 1998
    1 Linda NaN Product 2017
    2 Bill NaN Marketing 2018
    • join:
      • inner:只对可以匹配的项进行级联
      • outer:可以级联所有的项
    pd.concat((df1,df2),axis=0,join='inner')
    
    employee hire_date
    0 Bobs 1998
    1 Linda 2017
    2 Bill 2018
    0 Bobs 1998
    1 Linda 2017
    2 Bill 2018
    • append函数的使用: append只可以进行纵向的级联

      employee group groupps hire_date
      0 Bobs Accounting NaN 1998
      1 Linda Product NaN 2017
      2 Bill Marketing NaN 2018
      0 Bobs NaN Accounting 1998
      1 Linda NaN Product 2017
      2 Bill NaN Marketing 2018

    合并操作

    • merge与concat的区别在于,merge需要依据某一共同列来进行合并

    • 使用pd.merge()合并时,会自动根据两者相同column名称的那一列,作为key来进行合并。

    • 注意每一列元素的顺序不要求一致

    一对一合并

    df1 = DataFrame({'employee':['Bob','Jake','Lisa'],
                    'group':['Accounting','Engineering','Engineering'],
                    })
    df1
    
    employee group
    0 Bob Accounting
    1 Jake Engineering
    2 Lisa Engineering
    df2 = DataFrame({'employee':['Lisa','Bob','Jake'],
                    'hire_date':[2004,2008,2012],
                    })
    df2
    
    employee hire_date
    0 Lisa 2004
    1 Bob 2008
    2 Jake 2012
    pd.merge(df1,df2,on='employee')
    
    employee group hire_date
    0 Bob Accounting 2008
    1 Jake Engineering 2012
    2 Lisa Engineering 2004

    一对多合并

    df3 = DataFrame({
        'employee':['Lisa','Jake'],
        'group':['Accounting','Engineering'],
        'hire_date':[2004,2016]})
    df
    
    employee group hire_date
    0 Lisa Accounting 2004
    1 Jake Engineering 2016
    df4 = DataFrame({'group':['Accounting','Engineering','Engineering'],
                           'supervisor':['Carly','Guido','Steve']
                    })
    df4
    
    group supervisor
    0 Accounting Carly
    1 Engineering Guido
    2 Engineering Steve
    pd.merge(df3,df4)
    
    # 会默认以两个表的共同字段 group 进行关联,将两张表中的全部数据进行合并
    
    employee group hire_date supervisor
    0 Lisa Accounting 2004 Carly
    1 Jake Engineering 2016 Guido
    2 Jake Engineering 2016 Steve

    多对多合并

    df1 = DataFrame({'employee':['Bob','Jake','Lisa'],
                     'group':['Accounting','Engineering','Engineering']})
    df1
    
    employee group
    0 Bob Accounting
    1 Jake Engineering
    2 Lisa Engineering
    df5 = DataFrame({'group':['Engineering','Engineering','HR'],
                    'supervisor':['Carly','Guido','Steve']
                    })
    df5
    
    group supervisor
    0 Engineering Carly
    1 Engineering Guido
    2 HR Steve
    pd.merge(df1,df5,how='outer')
    
    employee group supervisor
    0 Bob Accounting NaN
    1 Jake Engineering Carly
    2 Jake Engineering Guido
    3 Lisa Engineering Carly
    4 Lisa Engineering Guido
    5 NaN HR Steve

    key的规范化

    • 当列冲突时,即有多个列名称相同时,需要使用on=来指定哪一个列作为key,配合suffixes指定冲突列名
    df1 = DataFrame({'employee':['Jack',"Summer","Steve"],
                     'group':['Accounting','Finance','Marketing']})
    df1
    
    employee group
    0 Jack Accounting
    1 Summer Finance
    2 Steve Marketing
    df2 = DataFrame({'employee':['Jack','Bob',"Jake"],
                     'hire_date':[2003,2009,2012],
                    'group':['Accounting','sell','ceo']})
    df2
    
    employee group hire_date
    0 Jack Accounting 2003
    1 Bob sell 2009
    2 Jake ceo 2012
    pd.merge(df1,df2,on='group')
    
    # 指定按照 group 这一列来进行分组,会将两个表中相同的字段名都加在合并后的表中,两个字段会改变
    # 也可以通过这个 suffixes=('_x', '_y') 参数进行一个修改
    
    employee_x group employee_y hire_date
    0 Jack Accounting Jack 2003
    • 当两张表没有可进行连接的列时,可使用left_on和right_on手动指定merge中左右两边的哪一列作为连接的列
    df1 = DataFrame({'employee':['Bobs','Linda','Bill'],
                    'group':['Accounting','Product','Marketing'],
                   'hire_date':[1998,2017,2018]})
    df1
    
    employee group hire_date
    0 Bobs Accounting 1998
    1 Linda Product 2017
    2 Bill Marketing 2018
    df5 = DataFrame({'name':['Lisa','Bobs','Bill'],
                    'hire_dates':[1998,2016,2007]})
    df5
    
    hire_dates name
    0 1998 Lisa
    1 2016 Bobs
    2 2007 Bill
    pd.merge(df1,df5,left_on='employee',right_on='name',how='outer')
    
    # 分别指定df1表中的employee字段和df5表中的name字段进行一个比对连接,
    
    employee group hire_date hire_dates name
    0 Bobs Accounting 1998.0 2016.0 Bobs
    1 Linda Product 2017.0 NaN NaN
    2 Bill Marketing 2018.0 2007.0 Bill
    3 NaN NaN NaN 1998.0 Lisa

    内合并与外合并:out取并集 inner取交集

  • 相关阅读:
    AngularJS Insert Update Delete Using PHP MySQL
    Simple task manager application using AngularJS PHP MySQL
    AngularJS MySQL and Bootstrap Shopping List Tutorial
    Starting out with Node.js and AngularJS
    AngularJS CRUD Example with PHP, MySQL and Material Design
    How to install KVM on Fedora 22
    Fake_AP模式下的Easy-Creds浅析
    河南公务员写古文辞职信
    AI
    政协委员:最大愿望是让小学生步行上学
  • 原文地址:https://www.cnblogs.com/zhufanyu/p/12031641.html
Copyright © 2011-2022 走看看