zoukankan      html  css  js  c++  java
  • 3.5.4 分组、合并、透视表

    1.导入三方库

    import numpy as np
    import pandas as pd
    df = pd.read_csv('table.csv',index_col='ID') #用来指定表格的索引值

    df.head(2)

     

     SchoolClassGenderAddressHeightWeightMathPhysics
    ID        
    1101 S_1 C_1 M street_1 173 63 34.0 A+
    1102 S_1 C_1 F street_2 192 73 32.5 B+

     

     

    2.分组

    单列分组

    danlie = df.groupby('School')
    danlie.sum()

     

     HeightWeightMath
    School   
    S_1 2636 1050 956.2
    S_2 3459 1563 1191.1

     

    多列分组

    duolie = df.groupby(['School','Class'])
    duolie.sum()

     

     
      HeightWeightMath
    SchoolClass   
    S_1C_1 877 363 318.9
    C_2 853 341 321.5
    C_3 906 346 315.8
    S_2C_1 821 384 292.8
    C_2 900 418 314.0
    C_3 869 419 315.3
    C_4 869 342 269.0

     

     

    提取指定字段并指定分组的列

    tiqu = df['Address Math'.split()].groupby(by='Address')
    tiqu.count()

     

     
     Math
    Address 
    street_1 3
    street_2 6
    street_4 8
    street_5 6
    street_6 6
    street_7 6

    </div>

     

    3.合并

    sheet = pd.ExcelFile('用户借款数据.xlsx')
    sheet.sheet_names

     

    ['基础信息', '就业信息', '借款信息']

     

    hb1 = pd.read_excel('用户借款数据.xlsx',sheet_name='基础信息')
    hb2 = pd.read_excel('用户借款数据.xlsx',sheet_name='就业信息')
    hb3 = pd.read_excel('用户借款数据.xlsx',sheet_name='借款信息')
    hb1.info()
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 999 entries, 0 to 998
    Data columns (total 8 columns):
     #   Column   Non-Null Count  Dtype  
    ---  ------   --------------  -----  
     0   编号       999 non-null    int64  
     1   城市       999 non-null    object 
     2   国家       999 non-null    object 
     3   出生日期     999 non-null    object 
     4   教育程度     999 non-null    int64  
     5   性别       999 non-null    int64  
     6   房屋所有权类型  763 non-null    float64
     7   婚姻状况     999 non-null    int64  
    dtypes: float64(1), int64(4), object(3)
    memory usage: 62.6+ KB

     

    hb1.head(6)

     

     
     编号城市国家出生日期教育程度性别房屋所有权类型婚姻状况
    0 0 AESPA EE 21/11/1975 4 0 6.0 4
    1 2 P脛RNU EE 25/10/1969 4 1 9.0 1
    2 3 TALLINN EE 22/11/1965 4 0 1.0 4
    3 4 KEHTNA EE 05/12/1983 2 0 1.0 3
    4 5 KIVI脮LI EE 13/07/1980 4 1 4.0 3
    5 6 P脛RNU EE 05/07/1993 3 1 2.0 3

    </div>

     

    hb2.info()
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 999 entries, 0 to 998
    Data columns (total 9 columns):
     #   Column     Non-Null Count  Dtype  
    ---  ------     --------------  -----  
     0   编号         999 non-null    int64  
     1   申请签字时间     999 non-null    int64  
     2   申请签证周      999 non-null    int64  
     3   当前雇主的雇佣期限  926 non-null    object 
     4   就业岗位       962 non-null    object 
     5   就业状况       977 non-null    float64
     6   主要工作收入     999 non-null    int64  
     7   收入总额       999 non-null    int64  
     8   工作经验       999 non-null    object 
    dtypes: float64(1), int64(5), object(3)
    memory usage: 70.4+ KB

     

    hb2.head(6)

     

     
    
    
     编号申请签字时间申请签证周当前雇主的雇佣期限就业岗位就业状况主要工作收入收入总额工作经验
    0 0 12 4 UpTo5Years Worker 3.0 1000 1000 15To25Years
    1 2 14 2 MoreThan5Years SpecialistOfficeWorker 3.0 633 633 15To25Years
    2 3 22 4 MoreThan5Years Owner 5.0 550 550 MoreThan25Years
    3 4 15 3 MoreThan5Years Worker 3.0 833 833 5To10Years
    4 5 10 3 MoreThan5Years Worker 3.0 341 341 5To10Years
    5 6 14 4 UpTo1Year Worker 3.0 430 430 LessThan2Years

     

     

    hb3.info()
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 899 entries, 0 to 898
    Data columns (total 12 columns):
     #   Column  Non-Null Count  Dtype  
    ---  ------  --------------  -----  
     0   编号      899 non-null    int64  
     1   状态      899 non-null    object 
     2   金额      899 non-null    float64
     3   现有负债    899 non-null    int64  
     4   贷款日期    899 non-null    object 
     5   贷款期限    899 non-null    int64  
     6   新信用客户   899 non-null    bool   
     7   贷前贷款数   899 non-null    int64  
     8   占用区     859 non-null    float64
     9   贷款用途    899 non-null    int64  
     10  先前得分    763 non-null    float64
     11  违约      899 non-null    bool   
    dtypes: bool(2), float64(3), int64(5), object(2)
    memory usage: 72.1+ KB

     

    hb3.head()

     

     
    
    
     编号状态金额现有负债贷款日期贷款期限新信用客户贷前贷款数占用区贷款用途先前得分违约
    0 0 Current 5000.0 9 05/05/2015 60 False 1 8.0 0 0.0957 False
    1 2 Repaid 530.0 7 19/10/2015 60 True 0 1.0 2 0.1034 False
    2 3 Current 5500.0 1 02/09/2015 60 True 0 13.0 2 0.0772 False
    3 4 Repaid 6900.0 10 20/05/2015 60 False 1 19.0 7 0.0773 False
    4 5 Current 2655.0 6 29/12/2015 60 True 0 17.0 2 0.1898 False

     

    data2 = pd.merge(hb1,hb2,how='outer')
    data2

     

     编号城市国家出生日期教育程度性别房屋所有权类型婚姻状况申请签字时间申请签证周当前雇主的雇佣期限就业岗位就业状况主要工作收入收入总额工作经验
    0 0 AESPA EE 21/11/1975 4 0 6.0 4 12 4 UpTo5Years Worker 3.0 1000 1000 15To25Years
    1 2 P脛RNU EE 25/10/1969 4 1 9.0 1 14 2 MoreThan5Years SpecialistOfficeWorker 3.0 633 633 15To25Years
    2 3 TALLINN EE 22/11/1965 4 0 1.0 4 22 4 MoreThan5Years Owner 5.0 550 550 MoreThan25Years
    3 4 KEHTNA EE 05/12/1983 2 0 1.0 3 15 3 MoreThan5Years Worker 3.0 833 833 5To10Years
    4 5 KIVI脮LI EE 13/07/1980 4 1 4.0 3 10 3 MoreThan5Years Worker 3.0 341 341 5To10Years
    ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
    994 1257 P脛RISPEA EE 08/04/1971 3 0 NaN 3 19 5 UpTo2Years valvur 3.0 3820 8452 10To15Years
    995 1258 TALLINN EE 29/11/1949 4 1 NaN 5 13 6 UpTo4Years klienditeenindaja 3.0 9000 9000 MoreThan25Years
    996 1259 UUEM脮ISA EE 21/02/1979 4 1 NaN 1 17 1 UpTo1Year Noorem referent 3.0 7200 7500 10To15Years
    997 1260 TARTU EE 21/10/1983 5 1 NaN 3 16 7 NaN administraator 3.0 6500 10500 2To5Years
    998 1261 VILJANDI EE 30/08/1977 1 1 NaN 2 21 7 MoreThan5Years NaN 3.0 8000 8300 10To15Years

    999 rows × 16 columns

    </div>

     

    4.透视表

    shuju = pd.read_csv('table.csv')
    shuju.head()

     

     
     SchoolClassIDGenderAddressHeightWeightMathPhysics
    0 S_1 C_1 1101 M street_1 173 63 34.0 A+
    1 S_1 C_1 1102 F street_2 192 73 32.5 B+
    2 S_1 C_1 1103 M street_2 186 82 87.2 B+
    3 S_1 C_1 1104 F street_2 167 81 80.4 B-
    4 S_1 C_1 1105 F street_4 159 64 84.8 B+

    </div>

     

    pivot_table

    pd.pivot_table(shuju,index='ID',columns='Gender',values='Height').head()

     

     
    GenderFM
    ID  
    1101 NaN 173.0
    1102 192.0 NaN
    1103 NaN 186.0
    1104 167.0 NaN
    1105 159.0 NaN

     

     

    四个参数

    pd.pivot_table(shuju,index='Gender',columns='School',values='Height',aggfunc=['mean','sum'])

     

     
    
    
     meansum
    SchoolS_1S_2S_1S_2
    Gender    
    F 173.125000 173.727273 1385 1911
    M 178.714286 172.000000 1251 1548

     

    pd.pivot_table(shuju,index=['School','Class'],columns=['Gender','Address'],values=['Height','Weight'])

     

      Height...Weight
     GenderFM...FM
     Addressstreet_1street_2street_4street_5street_6street_7street_1street_2street_4street_5...street_4street_5street_6street_7street_1street_2street_4street_5street_6street_7
    SchoolClass                     
    S_1C_1 NaN 179.5 159.0 NaN NaN NaN 173.0 186.0 NaN NaN ... 64.0 NaN NaN NaN 63.0 82.0 NaN NaN NaN NaN
    C_2 NaN NaN 176.0 162.0 167.0 NaN NaN NaN NaN 188.0 ... 94.0 63.0 63.0 NaN NaN NaN NaN 68.0 53.0 NaN
    C_3 175.0 NaN NaN 187.0 NaN NaN NaN 195.0 161.0 NaN ... NaN 69.0 NaN NaN NaN 70.0 68.0 NaN NaN 82.0
    S_2C_1 NaN NaN NaN 159.0 161.0 NaN NaN NaN 163.5 NaN ... NaN 97.0 61.0 NaN NaN NaN 71.0 NaN NaN 84.0
    C_2 NaN NaN NaN NaN NaN 188.5 175.0 NaN 155.0 193.0 ... NaN NaN NaN 76.5 74.0 NaN 91.0 100.0 NaN NaN
    C_3 NaN NaN 157.0 NaN 164.0 190.0 NaN NaN 187.0 171.0 ... 78.0 NaN 81.0 99.0 NaN NaN 73.0 88.0 NaN NaN
    C_4 NaN 176.0 NaN NaN 175.5 NaN NaN NaN NaN NaN ... NaN NaN 57.0 NaN NaN NaN NaN NaN NaN 82.0

    7 rows × 24 columns

     

     

    小石小石摩西摩西的学习笔记,欢迎提问,欢迎指正!!!
  • 相关阅读:
    C#调用C++ ---参数传递
    Retained Mode Versus Immediate Mode
    mind map in latex
    vk example
    基本环境
    [转]ld 和 ld.gold 和 ld.bfd
    [转] c++11 int&& 右值引用
    [转] c++11列表初始化
    [转] c++ const, volatile, mutable用法
    [转] c++11 模板元编程
  • 原文地址:https://www.cnblogs.com/shijingwen/p/13700511.html
Copyright © 2011-2022 走看看