zoukankan      html  css  js  c++  java
  • 数据聚合与分组运算学习笔记3

    4 透视表和交叉表

    4.1 透视表

    透视表(pivot table): 根据一个或多个键对数据进行聚合,并根据行和列上的分组键将数据分配到各个
    矩形区域中。

    • DataFrame表有一个pivot_table方法,还有一个顶级的pandas.pivot_table
    • pivot_table还有一个分项小计选项,margins
    In [32]: tips.pivot_table(index=['sex','smoker'])
    Out[32]:
                       size       tip   tip_pct  total_bill
    sex    smoker                                          
    Female No      2.592593  2.773519  0.156921   18.105185
           Yes     2.242424  2.931515  0.182150   17.977879
    Male   No      2.711340  3.113402  0.160669   19.791237
           Yes     2.500000  3.051167  0.152771   22.284500
    
    In [33]: tips.pivot_table(['tip_pct','size'],index=['sex','day'],columns='smoker
        ...: ')
    Out[33]:
                  tip_pct                size          
    smoker             No       Yes        No       Yes
    sex    day                                         
    Female Fri   0.165296  0.209129  2.500000  2.000000
           Sat   0.147993  0.163817  2.307692  2.200000
           Sun   0.165710  0.237075  3.071429  2.500000
           Thur  0.155971  0.163073  2.480000  2.428571
    Male   Fri   0.138005  0.144730  2.000000  2.125000
           Sat   0.162132  0.139067  2.656250  2.629630
           Sun   0.158291  0.173964  2.883721  2.600000
           Thur  0.165706  0.164417  2.500000  2.300000
    
    # 增加分类汇总,margins=True
    In [34]: tips.pivot_table(['tip_pct','size'],index=['sex','day'],columns='smoker
       ...: ',margins=True)
    Out[34]:
                 tip_pct                          size                    
    smoker             No       Yes       All        No       Yes       All
    sex    day                                                             
    Female Fri   0.165296  0.209129  0.199388  2.500000  2.000000  2.111111
          Sat   0.147993  0.163817  0.156470  2.307692  2.200000  2.250000
          Sun   0.165710  0.237075  0.181569  3.071429  2.500000  2.944444
          Thur  0.155971  0.163073  0.157525  2.480000  2.428571  2.468750
    Male   Fri   0.138005  0.144730  0.143385  2.000000  2.125000  2.100000
          Sat   0.162132  0.139067  0.151577  2.656250  2.629630  2.644068
          Sun   0.158291  0.173964  0.162344  2.883721  2.600000  2.810345
          Thur  0.165706  0.164417  0.165276  2.500000  2.300000  2.433333
    All          0.159328  0.163196  0.160803  2.668874  2.408602  2.569672
    
    # 可以使用aggfunc传入其他聚合函数
    In [37]: tips.pivot_table(['tip_pct','size'],index=['sex','smoker'],columns='day',aggfunc=len, margins=True)
    Out[37]:
                  tip_pct                           size                         
    day               Fri   Sat   Sun  Thur    All   Fri   Sat   Sun  Thur    All
    sex    smoker                                                                
    Female No         2.0  13.0  14.0  25.0   54.0   2.0  13.0  14.0  25.0   54.0
           Yes        7.0  15.0   4.0   7.0   33.0   7.0  15.0   4.0   7.0   33.0
    Male   No         2.0  32.0  43.0  20.0   97.0   2.0  32.0  43.0  20.0   97.0
           Yes        8.0  27.0  15.0  10.0   60.0   8.0  27.0  15.0  10.0   60.0
    All              19.0  87.0  76.0  62.0  244.0  19.0  87.0  76.0  62.0  244.0
    

    pivot_table参数:

    4.2 交叉表

    交叉表示一种计算分组频率的特殊透视表,

    In [39]: tips[:5]
    Out[39]:
       total_bill   tip     sex smoker  day    time  size   tip_pct
    0       16.99  1.01  Female     No  Sun  Dinner     2  0.059447
    1       10.34  1.66    Male     No  Sun  Dinner     3  0.160542
    2       21.01  3.50    Male     No  Sun  Dinner     3  0.166587
    3       23.68  3.31    Male     No  Sun  Dinner     2  0.139780
    4       24.59  3.61  Female     No  Sun  Dinner     4  0.146808
    
    In [40]:
    
    In [40]: pd.crosstab([tips.time, tips.day],tips.smoker, margins=True)
    Out[40]:
    smoker        No  Yes  All
    time   day                
    Dinner Fri     3    9   12
           Sat    45   42   87
           Sun    57   19   76
           Thur    1    0    1
    Lunch  Fri     1    6    7
           Thur   44   17   61
    All          151   93  244
    
  • 相关阅读:
    nginx安装
    Linux使用yum命令安装软件时,连接不了网路报错:ERROR 6
    scrapy_redis使用介绍
    redis异常信息:MISCONF Redis is configured to save RDB snapshots, but it is currently not able to persist on disk.。。。。
    scrapy之自定制命令
    利用scrapy获取抽屉新热榜的标题和内容以及新闻地址保存到本地
    Python requests.post方法中data与json参数区别
    CentOS下Mysql简易操作
    CentOS下构建Shell简易分发系统
    CentOS下搭建测试WEB平台
  • 原文地址:https://www.cnblogs.com/felo/p/6411513.html
Copyright © 2011-2022 走看看