  • A summary of the pandas merge and concat functions for combining data

    The main tools for combining data in pandas are the concat, merge and join functions.

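    • concat glues objects together along the rows or the columns, aligning them on the index; on the other axis it can only take the intersection or the union of the labels.
    • merge combines objects on shared columns or on the index, and supports inner, left, right and outer joins.
    • join works much like merge (DataFrame.join joins on the index by default), so it is not covered separately.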
    import pandas as pd
    from pandas import Series,DataFrame
    # Helper: build a DataFrame whose cell values are the column name plus the row label (e.g. "a1")
    def make_df(cols,inds):
        data = {c:[c+str(i) for i in inds] for c in cols}
        return DataFrame(data,index=inds)
    
    df1 = make_df(list("abc"),[1,2,4])
    df1
    
        a   b   c
    1  a1  b1  c1
    2  a2  b2  c2
    4  a4  b4  c4
    df2 = make_df(list("abcd"),[2,4,6])
    df2
    
        a   b   c   d
    2  a2  b2  c2  d2
    4  a4  b4  c4  d4
    6  a6  b6  c6  d6
    # Variants of df1/df2 that use column 'a' as the row index (df22 is used in the merge examples)
    df11=df1.set_index('a')
    df22=df2.set_index('a')
    

    1. The concat function

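    • axis: defaults to 0, which concatenates along the rows (stacking vertically); 1 concatenates along the columns (side by side).
    • ignore_index: defaults to False, which keeps the original labels along the concatenation axis; True discards them and assigns new labels 0, 1, ..., n-1.
    • join: how labels on the other axis are combined, either 'outer' (union, the default) or 'inner' (intersection).
    • sort: True sorts the labels of the non-concatenation axis when the inputs are not already aligned; defaults to False.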
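    The `sort` parameter only matters when the labels on the non-concatenation axis differ between the inputs. The sketch below uses two small hypothetical frames, `left` and `right`, whose columns overlap only on `a`: stacking them row-wise with `sort=False` keeps the column labels in order of first appearance, while `sort=True` sorts them.

```python
import pandas as pd

# Hypothetical inputs: the column labels only partly overlap, and "b" precedes "a" in `left`
left = pd.DataFrame({"b": [1], "a": [2]})
right = pd.DataFrame({"a": [3], "c": [4]})

# axis=0 stacks rows, so the columns are the non-concatenation axis
unsorted = pd.concat([left, right], sort=False)
sorted_cols = pd.concat([left, right], sort=True)

print(list(unsorted.columns))     # ['b', 'a', 'c'] (order of first appearance)
print(list(sorted_cols.columns))  # ['a', 'b', 'c'] (sorted labels)
```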

    (1) Simple concatenation of rows or columns by index

    # Concatenate along the rows
    pd.concat([df1,df2],sort=False)
    
        a   b   c    d
    1  a1  b1  c1  NaN
    2  a2  b2  c2  NaN
    4  a4  b4  c4  NaN
    2  a2  b2  c2   d2
    4  a4  b4  c4   d4
    6  a6  b6  c6   d6
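    Because row-wise concatenation keeps the original labels, the result above contains duplicate index labels (2 and 4 each appear twice). A minimal sketch of what that means in practice, rebuilding `df1`/`df2` with the article's `make_df` helper (`stacked` and `raised` are illustrative names): `.loc` on a duplicated label returns several rows, and passing `verify_integrity=True` makes `pd.concat` raise a `ValueError` instead of silently producing duplicates.

```python
import pandas as pd

def make_df(cols, inds):
    # Same helper as in the article: each cell holds column name + row label, e.g. "a1"
    data = {c: [c + str(i) for i in inds] for c in cols}
    return pd.DataFrame(data, index=inds)

df1 = make_df(list("abc"), [1, 2, 4])
df2 = make_df(list("abcd"), [2, 4, 6])

stacked = pd.concat([df1, df2], sort=False)
print(stacked.index.tolist())   # [1, 2, 4, 2, 4, 6]
print(stacked.index.is_unique)  # False
print(stacked.loc[2])           # label 2 selects two rows

# verify_integrity=True refuses to build an axis with duplicate labels
raised = False
try:
    pd.concat([df1, df2], sort=False, verify_integrity=True)
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```

    If unique labels are needed, `ignore_index=True` (shown in (2)) is the usual alternative to checking for duplicates.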
    # Concatenate along the columns
    pd.concat([df1,df2],axis=1)
    
         a    b    c    a    b    c    d
    1   a1   b1   c1  NaN  NaN  NaN  NaN
    2   a2   b2   c2   a2   b2   c2   d2
    4   a4   b4   c4   a4   b4   c4   d4
    6  NaN  NaN  NaN   a6   b6   c6   d6
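    With `axis=1` the frames are aligned on the row index, which is why labels present in only one frame (1 and 6) get NaN on the other side. A small sketch, again rebuilding the example frames (`side_outer`/`side_inner` are illustrative names), comparing the default outer alignment with `join="inner"`, which keeps only the row labels both frames share:

```python
import pandas as pd

def make_df(cols, inds):
    # Same helper as in the article: each cell holds column name + row label
    data = {c: [c + str(i) for i in inds] for c in cols}
    return pd.DataFrame(data, index=inds)

df1 = make_df(list("abc"), [1, 2, 4])
df2 = make_df(list("abcd"), [2, 4, 6])

# axis=1 aligns rows on the index; the default join="outer" keeps the union of labels
side_outer = pd.concat([df1, df2], axis=1)
# join="inner" keeps only the row labels present in both frames
side_inner = pd.concat([df1, df2], axis=1, join="inner")

print(side_outer.index.tolist())  # [1, 2, 4, 6]
print(side_inner.index.tolist())  # [2, 4]
```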

    (2) Concatenation that discards the original index

    # Concatenate along the rows, discarding the original row labels and renumbering
    pd.concat([df1,df2],sort=False,ignore_index=True)
    
        a   b   c    d
    0  a1  b1  c1  NaN
    1  a2  b2  c2  NaN
    2  a4  b4  c4  NaN
    3  a2  b2  c2   d2
    4  a4  b4  c4   d4
    5  a6  b6  c6   d6
    # Concatenate along the columns, discarding the original column labels and renumbering
    pd.concat([df1,df2],axis=1,ignore_index=True)
    
         0    1    2    3    4    5    6
    1   a1   b1   c1  NaN  NaN  NaN  NaN
    2   a2   b2   c2   a2   b2   c2   d2
    4   a4   b4   c4   a4   b4   c4   d4
    6  NaN  NaN  NaN   a6   b6   c6   d6
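    As the two outputs show, `ignore_index=True` relabels only the concatenation axis: with the default `axis=0` the rows are renumbered while the column names survive, and with `axis=1` the columns are renumbered while the row labels are kept. A quick sketch checking both, using the rebuilt example frames (`by_rows`/`by_cols` are illustrative names):

```python
import pandas as pd

def make_df(cols, inds):
    # Same helper as in the article: each cell holds column name + row label
    data = {c: [c + str(i) for i in inds] for c in cols}
    return pd.DataFrame(data, index=inds)

df1 = make_df(list("abc"), [1, 2, 4])
df2 = make_df(list("abcd"), [2, 4, 6])

# axis=0: rows are relabelled 0..n-1, column names are kept
by_rows = pd.concat([df1, df2], sort=False, ignore_index=True)
print(by_rows.index.tolist())    # [0, 1, 2, 3, 4, 5]
print(by_rows.columns.tolist())  # ['a', 'b', 'c', 'd']

# axis=1: columns are relabelled 0..n-1, row labels are kept
by_cols = pd.concat([df1, df2], axis=1, ignore_index=True)
print(by_cols.columns.tolist())  # [0, 1, 2, 3, 4, 5, 6]
print(by_cols.index.tolist())    # [1, 2, 4, 6]
```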

    (3) Concatenation with an explicit join type

    • join can be 'inner' or 'outer'
    # Intersection of the columns: inner join
    pd.concat([df1,df2],sort=False,join='inner')
    
        a   b   c
    1  a1  b1  c1
    2  a2  b2  c2
    4  a4  b4  c4
    2  a2  b2  c2
    4  a4  b4  c4
    6  a6  b6  c6
    # Union of the columns: outer join (the default)
    pd.concat([df1,df2],sort=False,join='outer')
    
        a   b   c    d
    1  a1  b1  c1  NaN
    2  a2  b2  c2  NaN
    4  a4  b4  c4  NaN
    2  a2  b2  c2   d2
    4  a4  b4  c4   d4
    6  a6  b6  c6   d6
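    Since 'outer' is the default for `join`, leaving it out gives the same frame as `join='outer'`, and `join='inner'` amounts to stacking only the columns both frames share. A sketch checking both claims with `DataFrame.equals`, which treats NaNs in matching positions as equal (`by_default`, `outer`, `inner` and `shared` are illustrative names):

```python
import pandas as pd

def make_df(cols, inds):
    # Same helper as in the article: each cell holds column name + row label
    data = {c: [c + str(i) for i in inds] for c in cols}
    return pd.DataFrame(data, index=inds)

df1 = make_df(list("abc"), [1, 2, 4])
df2 = make_df(list("abcd"), [2, 4, 6])

by_default = pd.concat([df1, df2], sort=False)          # join not given
outer = pd.concat([df1, df2], sort=False, join="outer")
inner = pd.concat([df1, df2], sort=False, join="inner")

print(by_default.equals(outer))  # True

# inner == stacking just the columns common to both inputs
shared = df1.columns.intersection(df2.columns)
print(shared.tolist())  # ['a', 'b', 'c']
print(inner.equals(pd.concat([df1[shared], df2[shared]])))  # True
```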

    2. The merge function

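    • how: the type of merge. 'left' keeps every key of the left DataFrame; 'right' keeps every key of the right DataFrame; 'outer' takes the union of the keys from both (full outer join); 'inner' takes their intersection. Defaults to 'inner'.
    • on: the column name(s), present in both DataFrames, to join on.
    • left_on/right_on: the column name(s) to join on in the left/right DataFrame, for when the key columns are named differently.
    • left_index/right_index: True uses the index of the left/right DataFrame as the join key. Each can be paired with the other side's key columns, e.g. left_on together with right_index=True.
    • sort: whether to sort the result by the join keys; defaults to False.
    • suffixes: when both DataFrames have a column of the same name that is not a join key, these suffixes are appended to tell the two apart; a tuple or list of two strings, defaulting to ('_x', '_y').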
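    To see how `how` changes the result, here is a sketch that merges the example frames on column `a` (keys a1, a2, a4 on the left and a2, a4, a6 on the right) with each option and records which keys survive. It also uses `indicator=True`, a real `pd.merge` option not listed above, which adds a `_merge` column saying whether each row came from `left_only`, `right_only` or `both`. `keys_by_how` and `flagged` are illustrative names.

```python
import pandas as pd

def make_df(cols, inds):
    # Same helper as in the article: each cell holds column name + row label
    data = {c: [c + str(i) for i in inds] for c in cols}
    return pd.DataFrame(data, index=inds)

df1 = make_df(list("abc"), [1, 2, 4])
df2 = make_df(list("abcd"), [2, 4, 6])

keys_by_how = {}
for how in ["inner", "left", "right", "outer"]:
    merged = pd.merge(df1, df2, how=how, on="a")
    keys_by_how[how] = merged["a"].tolist()
    print(how, keys_by_how[how])
# inner ['a2', 'a4']
# left  ['a1', 'a2', 'a4']
# right ['a2', 'a4', 'a6']
# outer ['a1', 'a2', 'a4', 'a6']

# indicator=True records the origin of each row in a "_merge" column
flagged = pd.merge(df1, df2, how="outer", on="a", indicator=True)
print(flagged[["a", "_merge"]])
```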

    (1) Merging on common columns

    df3 = pd.merge(df1,df2,how='inner',on='a')        # merge on a single column
    df4 = pd.merge(df1,df2,how='inner',on=['a','b'])  # merge on several columns
    df5 = pd.merge(df1,df2,how='left',on='a',suffixes=['_1','_2']) # left join with custom suffixes
    df5
    
        a  b_1  c_1  b_2  c_2    d
    0  a1   b1   c1  NaN  NaN  NaN
    1  a2   b2   c2   b2   c2   d2
    2  a4   b4   c4   b4   c4   d4
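    Only `df5` is printed above. A sketch of what the other two calls produce, assuming the same frames: merging on `a` alone leaves `b` and `c` in both inputs as non-key columns, so they receive the default `_x`/`_y` suffixes, while merging on `['a', 'b']` leaves only `c` duplicated.

```python
import pandas as pd

def make_df(cols, inds):
    # Same helper as in the article: each cell holds column name + row label
    data = {c: [c + str(i) for i in inds] for c in cols}
    return pd.DataFrame(data, index=inds)

df1 = make_df(list("abc"), [1, 2, 4])
df2 = make_df(list("abcd"), [2, 4, 6])

df3 = pd.merge(df1, df2, how="inner", on="a")         # single key
df4 = pd.merge(df1, df2, how="inner", on=["a", "b"])  # composite key

print(df3.columns.tolist())  # ['a', 'b_x', 'c_x', 'b_y', 'c_y', 'd']
print(df4.columns.tolist())  # ['a', 'b', 'c_x', 'c_y', 'd']
print(len(df3), len(df4))    # 2 2 (keys a2 and a4 match)
```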

    (2) Merging on differently named columns, on a column and an index, or on two indexes

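    df6 = pd.merge(df1,df2,how='inner',left_on='a',right_on='b')             # different column names (empty here: no value of df1.a equals a value of df2.b)
    df7 = pd.merge(df1,df22,how='inner',left_on='a',right_index=True)        # a column on the left, the index on the right
    df8 = pd.merge(df1,df2,how='inner',left_index=True,right_index=True)    # the index on both sides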
    df8
    
       a_x  b_x  c_x  a_y  b_y  c_y   d
    2   a2   b2   c2   a2   b2   c2  d2
    4   a4   b4   c4   a4   b4   c4  d4
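    A sketch of what `df6` and `df7` contain, plus a check of the earlier remark that join behaves like merge: `DataFrame.join` joins on the index (with `how='left'` by default), so with `how="inner"` and the same `_x`/`_y` suffixes it should reproduce `df8`. The frames are rebuilt with the article's helper; `joined` is an illustrative name.

```python
import pandas as pd

def make_df(cols, inds):
    # Same helper as in the article: each cell holds column name + row label
    data = {c: [c + str(i) for i in inds] for c in cols}
    return pd.DataFrame(data, index=inds)

df1 = make_df(list("abc"), [1, 2, 4])
df2 = make_df(list("abcd"), [2, 4, 6])
df22 = df2.set_index("a")

# Values a1, a2, a4 never equal b2, b4, b6, so nothing matches
df6 = pd.merge(df1, df2, how="inner", left_on="a", right_on="b")
print(df6.empty)  # True

# Column "a" of df1 matched against the index of df22 (a2, a4, a6)
df7 = pd.merge(df1, df22, how="inner", left_on="a", right_index=True)
print(df7["a"].tolist())  # ['a2', 'a4']

# Index-on-index merge vs DataFrame.join with matching suffixes
df8 = pd.merge(df1, df2, how="inner", left_index=True, right_index=True)
joined = df1.join(df2, how="inner", lsuffix="_x", rsuffix="_y")
print(joined.equals(df8))  # True
```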
  • Original article: https://www.cnblogs.com/laiyaling/p/11798046.html