zoukankan      html  css  js  c++  java
  • 数据聚合与分组运算学习笔记3

    4 透视表和交叉表

    4.1 透视表

    透视表(pivot table): 根据一个或多个键对数据进行聚合,并根据行和列上的分组键将数据分配到各个
    矩形区域中。

    • DataFrame表有一个pivot_table方法,还有一个顶级的pandas.pivot_table
    • pivot_table还有一个分项小计选项,margins
    In [32]: tips.pivot_table(index=['sex','smoker'])
    Out[32]:
                       size       tip   tip_pct  total_bill
    sex    smoker                                          
    Female No      2.592593  2.773519  0.156921   18.105185
           Yes     2.242424  2.931515  0.182150   17.977879
    Male   No      2.711340  3.113402  0.160669   19.791237
           Yes     2.500000  3.051167  0.152771   22.284500
    
    In [33]: tips.pivot_table(['tip_pct','size'],index=['sex','day'],columns='smoker
        ...: ')
    Out[33]:
                  tip_pct                size          
    smoker             No       Yes        No       Yes
    sex    day                                         
    Female Fri   0.165296  0.209129  2.500000  2.000000
           Sat   0.147993  0.163817  2.307692  2.200000
           Sun   0.165710  0.237075  3.071429  2.500000
           Thur  0.155971  0.163073  2.480000  2.428571
    Male   Fri   0.138005  0.144730  2.000000  2.125000
           Sat   0.162132  0.139067  2.656250  2.629630
           Sun   0.158291  0.173964  2.883721  2.600000
           Thur  0.165706  0.164417  2.500000  2.300000
    
    # 增加分类汇总,margins=True
    In [34]: tips.pivot_table(['tip_pct','size'],index=['sex','day'],columns='smoker
       ...: ',margins=True)
    Out[34]:
                 tip_pct                          size                    
    smoker             No       Yes       All        No       Yes       All
    sex    day                                                             
    Female Fri   0.165296  0.209129  0.199388  2.500000  2.000000  2.111111
          Sat   0.147993  0.163817  0.156470  2.307692  2.200000  2.250000
          Sun   0.165710  0.237075  0.181569  3.071429  2.500000  2.944444
          Thur  0.155971  0.163073  0.157525  2.480000  2.428571  2.468750
    Male   Fri   0.138005  0.144730  0.143385  2.000000  2.125000  2.100000
          Sat   0.162132  0.139067  0.151577  2.656250  2.629630  2.644068
          Sun   0.158291  0.173964  0.162344  2.883721  2.600000  2.810345
          Thur  0.165706  0.164417  0.165276  2.500000  2.300000  2.433333
    All          0.159328  0.163196  0.160803  2.668874  2.408602  2.569672
    
    # 可以使用aggfunc传入其他聚合函数
    In [37]: tips.pivot_table(['tip_pct','size'],index=['sex','smoker'],columns='day',aggfunc=len, margins=True)
    Out[37]:
                  tip_pct                           size                         
    day               Fri   Sat   Sun  Thur    All   Fri   Sat   Sun  Thur    All
    sex    smoker                                                                
    Female No         2.0  13.0  14.0  25.0   54.0   2.0  13.0  14.0  25.0   54.0
           Yes        7.0  15.0   4.0   7.0   33.0   7.0  15.0   4.0   7.0   33.0
    Male   No         2.0  32.0  43.0  20.0   97.0   2.0  32.0  43.0  20.0   97.0
           Yes        8.0  27.0  15.0  10.0   60.0   8.0  27.0  15.0  10.0   60.0
    All              19.0  87.0  76.0  62.0  244.0  19.0  87.0  76.0  62.0  244.0
    

    pivot_table参数:

    4.2 交叉表

    交叉表示一种计算分组频率的特殊透视表,

    In [39]: tips[:5]
    Out[39]:
       total_bill   tip     sex smoker  day    time  size   tip_pct
    0       16.99  1.01  Female     No  Sun  Dinner     2  0.059447
    1       10.34  1.66    Male     No  Sun  Dinner     3  0.160542
    2       21.01  3.50    Male     No  Sun  Dinner     3  0.166587
    3       23.68  3.31    Male     No  Sun  Dinner     2  0.139780
    4       24.59  3.61  Female     No  Sun  Dinner     4  0.146808
    
    In [40]:
    
    In [40]: pd.crosstab([tips.time, tips.day],tips.smoker, margins=True)
    Out[40]:
    smoker        No  Yes  All
    time   day                
    Dinner Fri     3    9   12
           Sat    45   42   87
           Sun    57   19   76
           Thur    1    0    1
    Lunch  Fri     1    6    7
           Thur   44   17   61
    All          151   93  244
    
  • 相关阅读:
    IO和序列化
    委托与事件
    [基础不过关填坑] 跨iframe触发事件
    有哪些新手程序员不知道的小技巧?
    给echarts加个“全屏展示”
    bootstrapTable使用场景及方式
    bootstrap datetimepicker 格式化yyyymmdd时,无法读取yyyymmdd格式
    【变态需求】bootstrapTable列排序-选择正序倒序不排序
    myeclipse、maven、tomcat、jdk技巧和坑【待完善】
    【调试技巧】 Fiddler高级用法之url映射请求
  • 原文地址:https://www.cnblogs.com/felo/p/6411513.html
Copyright © 2011-2022 走看看