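As with the axes of two-dimensional arrays covered in "numpy arrays (5)", a pandas DataFrame also has the concept of an axis, which determines whether a method is applied along the rows or along the columns.
The data below serves as the running example.
It holds ridership counts for 5 stations over 10 days: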
import pandas as pd

# Rows are days, columns are stations.
ridership_df = pd.DataFrame(
    data=[[   0,    0,    2,    5,    0],
          [1478, 3877, 3674, 2328, 2539],
          [1613, 4088, 3991, 6461, 2691],
          [1560, 3392, 3826, 4787, 2613],
          [1608, 4802, 3932, 4477, 2705],
          [1576, 3933, 3909, 4979, 2685],
          [  95,  229,  255,  496,  201],
          [   2,    0,    1,   27,    0],
          [1438, 3785, 3589, 4174, 2215],
          [1342, 4043, 4009, 4665, 3033]],
    index=['05-01-11', '05-02-11', '05-03-11', '05-04-11', '05-05-11',
           '05-06-11', '05-07-11', '05-08-11', '05-09-11', '05-10-11'],
    columns=['R003', 'R004', 'R005', 'R006', 'R007']
)
          R003  R004  R005  R006  R007
05-01-11     0     0     2     5     0
05-02-11  1478  3877  3674  2328  2539
05-03-11  1613  4088  3991  6461  2691
05-04-11  1560  3392  3826  4787  2613
05-05-11  1608  4802  3932  4477  2705
05-06-11  1576  3933  3909  4979  2685
05-07-11    95   229   255   496   201
05-08-11     2     0     1    27     0
05-09-11  1438  3785  3589  4174  2215
05-10-11  1342  4043  4009  4665  3033
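In this data, each row holds the ridership of every station on one day, and each column holds the ridership of one station across all days.
To compute the average ridership across the stations for each day: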
print(ridership_df.mean(axis=1))
or:
print(ridership_df.mean(axis='columns'))
05-01-11 1.4
05-02-11 2779.2
05-03-11 3768.8
05-04-11 3235.6
05-05-11 3504.8
05-06-11 3416.4
05-07-11 255.2
05-08-11 6.0
05-09-11 3040.2
05-10-11 3418.4
dtype: float64
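To see why axis=1 yields one value per day, the same mean can be computed by hand. The following is a minimal sketch on a three-day, two-station slice of the data; the names small_df, daily_mean, and manual_mean are illustrative and not part of the original example.

```python
import pandas as pd

# Illustrative slice of the ridership data: three days, two stations.
small_df = pd.DataFrame(
    data=[[1478, 3877], [1613, 4088], [1560, 3392]],
    index=['05-02-11', '05-03-11', '05-04-11'],
    columns=['R003', 'R004'],
)

# axis=1 collapses the columns, leaving one mean per row (per day).
daily_mean = small_df.mean(axis=1)

# The same result by hand: add the station columns, divide by their count.
manual_mean = (small_df['R003'] + small_df['R004']) / small_df.shape[1]

print(daily_mean)
print(manual_mean)
```

Both Series are labeled by the dates, because the columns are the dimension that was collapsed.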
To compute the average ridership across the days for each station:
print(ridership_df.mean(axis=0))
or:
print(ridership_df.mean(axis='index'))
R003    1071.2
R004    2814.9
R005    2718.8
R006    3239.9
R007    1868.2
dtype: float64
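The axis numbering matches NumPy's, which can be checked by running the same reduction on the underlying array. This is a small sketch under the assumption that a two-station slice is enough to show the point; small_df, values, station_mean_np, and station_mean_pd are illustrative names.

```python
import numpy as np
import pandas as pd

# Illustrative slice of the ridership data: three days, two stations.
small_df = pd.DataFrame(
    data=[[1478, 3877], [1613, 4088], [1560, 3392]],
    index=['05-02-11', '05-03-11', '05-04-11'],
    columns=['R003', 'R004'],
)

values = small_df.to_numpy()  # plain 2D array of shape (3, 2)

# axis=0 collapses the rows in both libraries: one mean per column.
station_mean_np = values.mean(axis=0)
station_mean_pd = small_df.mean(axis=0)

print(station_mean_pd.index.tolist())
print(np.allclose(station_mean_np, station_mean_pd.to_numpy()))
```

The only difference is that the pandas result keeps the column labels as its index, while the NumPy result is an unlabeled array.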
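Summary:
axis=0 or axis='index': the calculation runs down the rows and gives one result per column (here, per station).
axis=1 or axis='columns': the calculation runs across the columns and gives one result per row (here, per day).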
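The same rule should carry over to other reducing methods that take an axis argument. As a hedged illustration, the sketch below uses max and idxmax on a small slice; small_df, per_station_max, and busiest_station are illustrative names that do not appear in the original example.

```python
import pandas as pd

# Illustrative slice of the ridership data: three days, two stations.
small_df = pd.DataFrame(
    data=[[1478, 3877], [1613, 4088], [1560, 3392]],
    index=['05-02-11', '05-03-11', '05-04-11'],
    columns=['R003', 'R004'],
)

# axis='index' (0): collapse the rows, result is labeled by the columns.
per_station_max = small_df.max(axis='index')

# axis='columns' (1): collapse the columns, result is labeled by the index.
busiest_station = small_df.idxmax(axis='columns')

print(per_station_max)
print(busiest_station)
```

A quick way to check which axis was used is to look at the labels of the result: station names mean the rows were collapsed, dates mean the columns were collapsed.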