zoukankan      html  css  js  c++  java
  • 3-8 pivot操作

     

    数据透视表

    In [1]:
    import pandas as pd
    excelample=pd.DataFrame({'Month':["January","January","January","January",
                                    "February", "February","February","February",
                                     "March","March","March","March"],
                            'Category':["Transportation","Grocery","Household","Entertainment",
                                       "Transportation","Grocery","Household","Entertainment",
                                       "Transportation","Grocery","Household","Entertainment"],
                              'Amount':[74.,235.,175.,100.,115.,240.,225.,125.,390.,260.,200.,120.]})
    
    In [2]:
    excelample
    
    Out[2]:
     
     MonthCategoryAmount
    0 January Transportation 74.0
    1 January Grocery 235.0
    2 January Household 175.0
    3 January Entertainment 100.0
    4 February Transportation 115.0
    5 February Grocery 240.0
    6 February Household 225.0
    7 February Entertainment 125.0
    8 March Transportation 390.0
    9 March Grocery 260.0
    10 March Household 200.0
    11 March Entertainment 120.0
     

    1.统计指标:每个月的各个种类的花费:pivot

    In [3]:
    example_pivot=excelample.pivot(index='Category',columns='Month',values='Amount')
    example_pivot
    
    Out[3]:
     
    MonthFebruaryJanuaryMarch
    Category   
    Entertainment 125.0 100.0 120.0
    Grocery 240.0 235.0 260.0
    Household 225.0 175.0 200.0
    Transportation 115.0 74.0 390.0
    In [4]:
    example_pivot.sum(axis=1)#计算每个种类的总和
    
    Out[4]:
    Category
    Entertainment     345.0
    Grocery           735.0
    Household         600.0
    Transportation    579.0
    dtype: float64
    In [5]:
    example_pivot.sum(axis=0)#每个月的总和
    
    Out[5]:
    Month
    February    705.0
    January     584.0
    March       970.0
    dtype: float64
    In [6]:
    df=pd.read_csv('./Titanic_Data-master/Titanic_Data-master/train.csv')
    df.head()#读取前几行数据
    
    Out[6]:
     
     PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
    0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S
    1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C
    2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S
    3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 C123 S
    4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 NaN S
     

    2.通过性别索引,船舱的等级分类,统计不同性别在不同船舱的费用:pivot_table(默认求平均值)

    In [8]:
    df.pivot_table(index='Sex',columns='Pclass',values='Fare')#默认求平均值
    
    Out[8]:
     
    Pclass123
    Sex   
    female 106.125798 21.970121 16.118810
    male 67.226127 19.741782 12.661633
    In [9]:
    df.pivot_table(index='Sex',columns='Pclass',values='Fare',aggfunc='max')#求最大
    
    Out[9]:
     
    Pclass123
    Sex   
    female 512.3292 65.0 69.55
    male 512.3292 73.5 69.55
    In [12]:
    df.pivot_table(index='Sex',columns='Pclass',values='Fare',aggfunc='count')#求计数
    
    Out[12]:
     
    Pclass123
    Sex   
    female 94 76 144
    male 122 108 347
    In [13]:
    pd.crosstab(index=df['Sex'],columns=df['Pclass'])#pd.crosstab和df.pivot_table的count是一样的效果
    
    Out[13]:
     
    Pclass123
    Sex   
    female 94 76 144
    male 122 108 347
     

    3.求不同等级的舱位,不同性别的获救概率

    In [14]:
    df.pivot_table(index='Pclass',columns='Sex',values='Survived',aggfunc='mean')#求平均值的概率
    
    Out[14]:
     
    Sexfemalemale
    Pclass  
    1 0.968085 0.368852
    2 0.921053 0.157407
    3 0.500000 0.135447
     

    4.新加一列,计算未成年的,不同性别的获救情况概率

    In [15]:
    df['Underaged']=df['Age']<=18#新加一列
    df.pivot_table(index='Underaged',columns='Sex',values='Survived',aggfunc='mean')#求平均值的概率
    
    Out[15]:
     
    Sexfemalemale
    Underaged  
    False 0.760163 0.167984
    True 0.676471 0.338028
  • 相关阅读:
    设计模式学习--面向对象的5条设计原则之单一职责原则--SRP
    设计模式学习--面向对象的5条设计原则(转)
    oracle 存储过程创建报错 Procedure created with compilation errors
    查看临时表空间占用最多的用户与SQL
    查看表空间文件以及利用率、修改、删除表空间文件大小
    aliyun阿里云alibabaMaven仓库地址——加速你的maven构建
    快速配置java环境变量
    oracle 月份中日的值必须介于 1 和当月最后一日之间
    Oracle 修改dmp的表空间
    oracle 空表导出dmp会报错
  • 原文地址:https://www.cnblogs.com/AI-robort/p/11636781.html
Copyright © 2011-2022 走看看