  • Pandas Digest: Join And Merge Pandas Dataframe

    Original article: https://chrisalbon.com/python/data_wrangling/pandas_join_merge_dataframe/

    Join And Merge Pandas Dataframe

    20 Dec 2017

    Import modules

    import pandas as pd
    from IPython.display import display
    from IPython.display import Image

    Create a dataframe

    raw_data = {
            'subject_id': ['1', '2', '3', '4', '5'],
            'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'], 
            'last_name': ['Anderson', 'Ackerman', 'Ali', 'Aoni', 'Atiches']}
    df_a = pd.DataFrame(raw_data, columns = ['subject_id', 'first_name', 'last_name'])
    df_a
       subject_id  first_name  last_name
    0           1        Alex   Anderson
    1           2         Amy   Ackerman
    2           3       Allen        Ali
    3           4       Alice       Aoni
    4           5      Ayoung    Atiches

    Create a second dataframe

    raw_data = {
            'subject_id': ['4', '5', '6', '7', '8'],
            'first_name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'], 
            'last_name': ['Bonder', 'Black', 'Balwner', 'Brice', 'Btisan']}
    df_b = pd.DataFrame(raw_data, columns = ['subject_id', 'first_name', 'last_name'])
    df_b
       subject_id  first_name  last_name
    0           4       Billy     Bonder
    1           5       Brian      Black
    2           6        Bran    Balwner
    3           7       Bryce      Brice
    4           8       Betty     Btisan

    Create a third dataframe

    raw_data = {
            'subject_id': ['1', '2', '3', '4', '5', '7', '8', '9', '10', '11'],
            'test_id': [51, 15, 15, 61, 16, 14, 15, 1, 61, 16]}
    df_n = pd.DataFrame(raw_data, columns = ['subject_id','test_id'])
    df_n
       subject_id  test_id
    0           1       51
    1           2       15
    2           3       15
    3           4       61
    4           5       16
    5           7       14
    6           8       15
    7           9        1
    8          10       61
    9          11       16

    Join the two dataframes along rows

    df_new = pd.concat([df_a, df_b])
    df_new
       subject_id  first_name  last_name
    0           1        Alex   Anderson
    1           2         Amy   Ackerman
    2           3       Allen        Ali
    3           4       Alice       Aoni
    4           5      Ayoung    Atiches
    0           4       Billy     Bonder
    1           5       Brian      Black
    2           6        Bran    Balwner
    3           7       Bryce      Brice
    4           8       Betty     Btisan
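
    The stacked result keeps each frame's original row labels, so the labels 0 through 4 appear twice. A minimal sketch of one way to get a clean index, using the `ignore_index` parameter of `pd.concat`. The frames are trimmed copies of the ones above, and the names `stacked` and `renumbered` are illustrative only:

```python
import pandas as pd

# Trimmed copies of df_a and df_b from above (last_name omitted for brevity)
df_a = pd.DataFrame({'subject_id': ['1', '2', '3', '4', '5'],
                     'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung']})
df_b = pd.DataFrame({'subject_id': ['4', '5', '6', '7', '8'],
                     'first_name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty']})

# Default behaviour: each frame keeps its own labels, so 0..4 repeat
stacked = pd.concat([df_a, df_b])
print(stacked.loc[0])  # label 0 now selects two rows (Alex and Billy)

# ignore_index=True discards the old labels and renumbers the rows 0..9
renumbered = pd.concat([df_a, df_b], ignore_index=True)
print(renumbered.index.tolist())
```

    Renumbering is usually what you want when the old labels carry no meaning; keep the defaults if the labels identify the rows.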

    Join the two dataframes along columns

    pd.concat([df_a, df_b], axis=1)
       subject_id  first_name  last_name  subject_id  first_name  last_name
    0           1        Alex   Anderson           4       Billy     Bonder
    1           2         Amy   Ackerman           5       Brian      Black
    2           3       Allen        Ali           6        Bran    Balwner
    3           4       Alice       Aoni           7       Bryce      Brice
    4           5      Ayoung    Atiches           8       Betty     Btisan
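
    Concatenating along columns pairs rows by index label, not by subject_id, and it leaves duplicate column names behind. As a rough sketch, the `keys` parameter of `pd.concat` can label each block of columns so they stay distinguishable. The labels 'a' and 'b' and the variable names are assumptions made for this example:

```python
import pandas as pd

# Trimmed copies of df_a and df_b from above (last_name omitted for brevity)
df_a = pd.DataFrame({'subject_id': ['1', '2', '3', '4', '5'],
                     'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung']})
df_b = pd.DataFrame({'subject_id': ['4', '5', '6', '7', '8'],
                     'first_name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty']})

# Without keys, 'subject_id' names two columns at once
plain = pd.concat([df_a, df_b], axis=1)
print(plain['subject_id'].shape)  # a two-column DataFrame, not a Series

# With keys, the columns become a two-level index: ('a', ...) and ('b', ...)
labelled = pd.concat([df_a, df_b], axis=1, keys=['a', 'b'])
print(labelled['b']['subject_id'].tolist())
```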

    Merge two dataframes on the subject_id value

    pd.merge(df_new, df_n, on='subject_id')
       subject_id  first_name  last_name  test_id
    0           1        Alex   Anderson       51
    1           2         Amy   Ackerman       15
    2           3       Allen        Ali       15
    3           4       Alice       Aoni       61
    4           4       Billy     Bonder       61
    5           5      Ayoung    Atiches       16
    6           5       Brian      Black       16
    7           7       Bryce      Brice       14
    8           8       Betty     Btisan       15
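
    The result has no row for subject 6 and no rows for subjects 9, 10 and 11, because `pd.merge` performs an inner join unless `how` says otherwise. The sketch below checks that and lists the keys that were dropped. The frames are trimmed copies of the ones above, and `dropped_left` and `dropped_right` are illustrative names:

```python
import pandas as pd

# df_new as produced by the row-wise concat above (first_name only, for brevity)
df_new = pd.DataFrame({
    'subject_id': ['1', '2', '3', '4', '5', '4', '5', '6', '7', '8'],
    'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung',
                   'Billy', 'Brian', 'Bran', 'Bryce', 'Betty']})
df_n = pd.DataFrame({
    'subject_id': ['1', '2', '3', '4', '5', '7', '8', '9', '10', '11'],
    'test_id': [51, 15, 15, 61, 16, 14, 15, 1, 61, 16]})

merged = pd.merge(df_new, df_n, on='subject_id')
explicit = pd.merge(df_new, df_n, on='subject_id', how='inner')
print(merged.equals(explicit))  # the default and how='inner' agree

# Keys present in only one input do not survive an inner join
dropped_left = set(df_new['subject_id']) - set(merged['subject_id'])
dropped_right = set(df_n['subject_id']) - set(merged['subject_id'])
print(dropped_left, dropped_right)
```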

    Merge two dataframes, naming the subject_id key separately for the left and right dataframes

    pd.merge(df_new, df_n, left_on='subject_id', right_on='subject_id')
       subject_id  first_name  last_name  test_id
    0           1        Alex   Anderson       51
    1           2         Amy   Ackerman       15
    2           3       Allen        Ali       15
    3           4       Alice       Aoni       61
    4           4       Billy     Bonder       61
    5           5      Ayoung    Atiches       16
    6           5       Brian      Black       16
    7           7       Bryce      Brice       14
    8           8       Betty     Btisan       15
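
    When both frames use the same key name, `left_on` and `right_on` give the same result as `on`. They matter when the key columns are named differently. A small sketch, assuming a hypothetical copy of df_n whose key column is called `student_id` (that name is not in the original data):

```python
import pandas as pd

df_a = pd.DataFrame({'subject_id': ['1', '2', '3', '4', '5'],
                     'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung']})
df_n = pd.DataFrame({
    'subject_id': ['1', '2', '3', '4', '5', '7', '8', '9', '10', '11'],
    'test_id': [51, 15, 15, 61, 16, 14, 15, 1, 61, 16]})

# Hypothetical: the scores table names its key 'student_id' instead
df_scores = df_n.rename(columns={'subject_id': 'student_id'})

# Tell merge which column plays the key role on each side
merged = pd.merge(df_a, df_scores, left_on='subject_id', right_on='student_id')
print(merged.columns.tolist())  # both key columns are kept

# The second key column is redundant after the merge, so drop it
tidy = merged.drop(columns='student_id')
print(tidy)
```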

    Merge with outer join

    “Full outer join produces the set of all records in Table A and Table B, with matching records from both sides where available. If there is no match, the missing side will contain null.” - source

    pd.merge(df_a, df_b, on='subject_id', how='outer')
       subject_id  first_name_x  last_name_x  first_name_y  last_name_y
    0           1          Alex     Anderson           NaN          NaN
    1           2           Amy     Ackerman           NaN          NaN
    2           3         Allen          Ali           NaN          NaN
    3           4         Alice         Aoni         Billy       Bonder
    4           5        Ayoung      Atiches         Brian        Black
    5           6           NaN          NaN          Bran      Balwner
    6           7           NaN          NaN         Bryce        Brice
    7           8           NaN          NaN         Betty       Btisan
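
    In an outer join it is not always obvious which side each row came from. One option is the `indicator` parameter of `pd.merge`, which adds a `_merge` column holding `left_only`, `right_only` or `both`. A minimal sketch on trimmed copies of the frames above, where `status` is an illustrative name:

```python
import pandas as pd

df_a = pd.DataFrame({'subject_id': ['1', '2', '3', '4', '5'],
                     'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung']})
df_b = pd.DataFrame({'subject_id': ['4', '5', '6', '7', '8'],
                     'first_name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty']})

# indicator=True records the origin of every row in a '_merge' column
merged = pd.merge(df_a, df_b, on='subject_id', how='outer', indicator=True)

# Map each subject_id to its origin for a quick look
status = dict(zip(merged['subject_id'], merged['_merge'].astype(str)))
print(status)
```

    Filtering on `_merge == 'left_only'` then gives the rows of df_a that have no partner in df_b.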

    Merge with inner join

    “Inner join produces only the set of records that match in both Table A and Table B.” - source

    pd.merge(df_a, df_b, on='subject_id', how='inner')
       subject_id  first_name_x  last_name_x  first_name_y  last_name_y
    0           4         Alice         Aoni         Billy       Bonder
    1           5        Ayoung      Atiches         Brian        Black

    Merge with right join

    pd.merge(df_a, df_b, on='subject_id', how='right')
       subject_id  first_name_x  last_name_x  first_name_y  last_name_y
    0           4         Alice         Aoni         Billy       Bonder
    1           5        Ayoung      Atiches         Brian        Black
    2           6           NaN          NaN          Bran      Balwner
    3           7           NaN          NaN         Bryce        Brice
    4           8           NaN          NaN         Betty       Btisan
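
    A right join keeps every row of the right dataframe and fills the left side with NaN where there is no match. It carries the same information as a left join with the two arguments swapped; only the `_x` and `_y` suffixes trade places. The sketch below compares the two on trimmed copies of the frames above (`right` and `swapped` are illustrative names):

```python
import pandas as pd

df_a = pd.DataFrame({'subject_id': ['1', '2', '3', '4', '5'],
                     'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung']})
df_b = pd.DataFrame({'subject_id': ['4', '5', '6', '7', '8'],
                     'first_name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty']})

# Right join: all rows of df_b, matched rows of df_a
right = pd.merge(df_a, df_b, on='subject_id', how='right')

# Left join with the arguments swapped: again all rows of df_b
swapped = pd.merge(df_b, df_a, on='subject_id', how='left')

# Same keys in the same order; the suffixes are mirrored
print(right['subject_id'].tolist() == swapped['subject_id'].tolist())
print(right['first_name_y'].tolist() == swapped['first_name_x'].tolist())
```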

    Merge with left join

    “Left outer join produces a complete set of records from Table A, with the matching records (where available) in Table B. If there is no match, the right side will contain null.” - source

    pd.merge(df_a, df_b, on='subject_id', how='left')
       subject_id  first_name_x  last_name_x  first_name_y  last_name_y
    0           1          Alex     Anderson           NaN          NaN
    1           2           Amy     Ackerman           NaN          NaN
    2           3         Allen          Ali           NaN          NaN
    3           4         Alice         Aoni         Billy       Bonder
    4           5        Ayoung      Atiches         Brian        Black
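
    A left join returns exactly one row per left-hand row only when the keys on the right are unique. If the right dataframe repeats a key, the matching left rows are repeated too. The sketch below builds a hypothetical copy of df_b with subject 4 duplicated (this duplicate is not in the original data) and uses the `validate` parameter of `pd.merge` to catch the problem:

```python
import pandas as pd

df_a = pd.DataFrame({'subject_id': ['1', '2', '3', '4', '5'],
                     'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung']})
df_b = pd.DataFrame({'subject_id': ['4', '5', '6', '7', '8'],
                     'first_name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty']})

# Hypothetical: append a second copy of the subject 4 row to df_b
df_b_dup = pd.concat([df_b, df_b.iloc[[0]]], ignore_index=True)

# Five left rows go in, six come out, because subject 4 matches twice
expanded = pd.merge(df_a, df_b_dup, on='subject_id', how='left')
print(len(df_a), len(expanded))

# validate='one_to_one' raises instead of silently multiplying rows
try:
    pd.merge(df_a, df_b_dup, on='subject_id', how='left', validate='one_to_one')
    raised = False
except pd.errors.MergeError:
    raised = True
print(raised)
```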

    Merge while adding a suffix to duplicate column names

    pd.merge(df_a, df_b, on='subject_id', how='left', suffixes=('_left', '_right'))
       subject_id  first_name_left  last_name_left  first_name_right  last_name_right
    0           1             Alex        Anderson               NaN              NaN
    1           2              Amy        Ackerman               NaN              NaN
    2           3            Allen             Ali               NaN              NaN
    3           4            Alice            Aoni             Billy           Bonder
    4           5           Ayoung         Atiches             Brian            Black
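
    The suffixes do not have to be symmetric. As a sketch, passing an empty string for the left suffix keeps the left dataframe's column names unchanged and marks only the right-hand columns. The suffix `_b` is an arbitrary choice for this example:

```python
import pandas as pd

df_a = pd.DataFrame({
    'subject_id': ['1', '2', '3', '4', '5'],
    'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'last_name': ['Anderson', 'Ackerman', 'Ali', 'Aoni', 'Atiches']})
df_b = pd.DataFrame({
    'subject_id': ['4', '5', '6', '7', '8'],
    'first_name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'last_name': ['Bonder', 'Black', 'Balwner', 'Brice', 'Btisan']})

# Empty left suffix: df_a's columns keep their names, df_b's get '_b'
merged = pd.merge(df_a, df_b, on='subject_id', how='left', suffixes=('', '_b'))
print(merged.columns.tolist())
```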

    Merge based on indexes

    pd.merge(df_a, df_b, right_index=True, left_index=True)
       subject_id_x  first_name_x  last_name_x  subject_id_y  first_name_y  last_name_y
    0             1          Alex     Anderson             4         Billy       Bonder
    1             2           Amy     Ackerman             5         Brian        Black
    2             3         Allen          Ali             6          Bran      Balwner
    3             4         Alice         Aoni             7         Bryce        Brice
    4             5        Ayoung      Atiches             8         Betty       Btisan
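
    Merging on the index pairs rows by their labels (0 through 4 here), so subject_id plays no part in the match. `DataFrame.join` is a shorthand for this kind of merge. The sketch below shows that it gives the same table for these two frames. Note that `join` defaults to a left join while `pd.merge` defaults to an inner join, so the two agree here only because both frames share the same index:

```python
import pandas as pd

df_a = pd.DataFrame({'subject_id': ['1', '2', '3', '4', '5'],
                     'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung']})
df_b = pd.DataFrame({'subject_id': ['4', '5', '6', '7', '8'],
                     'first_name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty']})

# Index-based merge, as in the example above
merged = pd.merge(df_a, df_b, left_index=True, right_index=True)

# join matches on the index by default; overlapping names need explicit suffixes
joined = df_a.join(df_b, lsuffix='_x', rsuffix='_y')

print(joined.columns.tolist() == merged.columns.tolist())
print(joined.values.tolist() == merged.values.tolist())
```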
  • Original post: https://www.cnblogs.com/chickenwrap/p/10125569.html