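Another important pandas feature is arithmetic between objects that have different indexes.
When two objects are added and their index labels do not all match, the index of the result is the union of the two indexes. Labels that appear in only one of the objects produce NaN in the result.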
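Let's start with an example. The sessions below assume that numpy has been imported as np and that Series and DataFrame have been imported from pandas.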
Series
    In [33]: s1 = Series([7.3, -2.5, 3.4, 1.5], index=['a', 'c', 'd', 'e'])

    In [34]: s2 = Series([-2.1, 3.6, -1.5, 4, 3.1], index=['a', 'c', 'e', 'f', 'g'])

    In [35]: s1
    Out[35]:
    a    7.3
    c   -2.5
    d    3.4
    e    1.5
    dtype: float64

    In [36]: s2
    Out[36]:
    a   -2.1
    c    3.6
    e   -1.5
    f    4.0
    g    3.1
    dtype: float64

    In [37]: s1 + s2
    Out[37]:
    a    5.2
    c    1.1
    d    NaN
    e    0.0
    f    NaN
    g    NaN
    dtype: float64
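    In [38]: s3 = Series([-2.1, 3.6, -1.5, 4, 3.1], index=['a', 'c', 'e', 'f', 'g'])

    In [39]: s1 + s2 + s3
    Out[39]:
    a    3.1
    c    4.7
    d    NaN
    e   -1.5
    f    NaN
    g    NaN
    dtype: float64

In other words, NaN values propagate: once the value for a label is NaN, adding further values to it still gives NaN.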
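If the NaN results are not wanted, two common options are to fill the missing side with a default or to drop the non-overlapping labels afterwards. The following is a minimal sketch that reuses s1 and s2 from above; the variable names total, filled and overlap_only are only illustrative.

```python
import pandas as pd

s1 = pd.Series([7.3, -2.5, 3.4, 1.5], index=["a", "c", "d", "e"])
s2 = pd.Series([-2.1, 3.6, -1.5, 4, 3.1], index=["a", "c", "e", "f", "g"])

# The + operator aligns on the union of labels; a label present on
# only one side yields NaN.
total = s1 + s2

# add() with fill_value treats a label missing from one operand as 0.
filled = s1.add(s2, fill_value=0)

# Alternatively, keep only the labels that were present on both sides.
overlap_only = total.dropna()
```

Which option is appropriate depends on whether a missing label really means "zero" in your data or means "unknown".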
DataFrame
The arithmetic methods are:
add: method for addition (+)
sub: method for subtraction (-)
div: method for division (/)
mul: method for multiplication (*)
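As a quick check that each method matches its operator when no extra arguments are passed, here is a small sketch. The frames a and b are made up for illustration and do not come from the examples in this post.

```python
import numpy as np
import pandas as pd

# Two small frames with identical labels (illustrative data only).
a = pd.DataFrame(np.arange(1.0, 5.0).reshape((2, 2)), columns=list("xy"))
b = pd.DataFrame(np.full((2, 2), 2.0), columns=list("xy"))

# Each method gives the same result as the corresponding operator.
same_add = a.add(b).equals(a + b)
same_sub = a.sub(b).equals(a - b)
same_mul = a.mul(b).equals(a * b)
same_div = a.div(b).equals(a / b)
```

The reason to prefer the methods is that they accept extra arguments such as fill_value and axis, which the operators cannot take.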
    In [45]: df1 = DataFrame(np.arange(9.).reshape((3, 3)), columns=list('bcd'),
       ....:                 index=['Ohio', 'Texas', 'Colorado'])

    In [46]: df2 = DataFrame(np.arange(12.).reshape((4, 3)), columns=list('bde'),
       ....:                 index=['Uhah', 'Ohio', 'Texas', 'Oregon'])
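    In [47]: df1 + df2
    Out[47]:
                b   c     d   e
    Colorado  NaN NaN   NaN NaN
    Ohio      3.0 NaN   6.0 NaN
    Oregon    NaN NaN   NaN NaN
    Texas     9.0 NaN  12.0 NaN
    Uhah      NaN NaN   NaN NaN

Only the cells whose row and column labels exist in both frames get a value. To fill in the rest, use the add method, passing df2 together with a fill_value argument. A cell that is missing from one frame is then treated as fill_value; a cell that is missing from both frames remains NaN.

    In [8]: df1.add(df2, fill_value=0)
    Out[8]:
                b    c     d     e
    Colorado  6.0  7.0   8.0   NaN
    Ohio      3.0  1.0   6.0   5.0
    Oregon    9.0  NaN  10.0  11.0
    Texas     9.0  4.0  12.0   8.0
    Uhah      0.0  NaN   1.0   2.0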
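The difference between the two results can be counted directly. This sketch rebuilds df1 and df2 as above; the names plain, filled and still_missing are only illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(9.0).reshape((3, 3)), columns=list("bcd"),
                   index=["Ohio", "Texas", "Colorado"])
df2 = pd.DataFrame(np.arange(12.0).reshape((4, 3)), columns=list("bde"),
                   index=["Uhah", "Ohio", "Texas", "Oregon"])

# Operator form: only cells present in both frames get a value.
plain = df1 + df2
overlapping_cells = int(plain.notna().sum().sum())

# Method form: a cell missing from one frame is treated as 0 first.
filled = df1.add(df2, fill_value=0)

# Cells missing from both frames are still NaN after filling.
still_missing = int(filled.isna().sum().sum())
```

Here the cells that stay NaN are (Colorado, e), (Oregon, c) and (Uhah, c), because neither frame has a value at those positions.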
Operations between DataFrame and Series
NumPy array (for comparison)
    In [40]: arr = np.arange(12.).reshape((3, 4))

    In [41]: arr
    Out[41]:
    array([[  0.,   1.,   2.,   3.],
           [  4.,   5.,   6.,   7.],
           [  8.,   9.,  10.,  11.]])

    In [42]: arr[0]
    Out[42]: array([ 0.,  1.,  2.,  3.])

    In [43]: arr - arr[0]
    Out[43]:
    array([[ 0.,  0.,  0.,  0.],
           [ 4.,  4.,  4.,  4.],
           [ 8.,  8.,  8.,  8.]])
DataFrame
    In [44]: frame = DataFrame(np.arange(12.).reshape((4, 3)), columns=list('bde'),
       ....:                   index=['Uhah', 'Ohio', 'Texas', 'Oregon'])

    In [45]: series = frame.iloc[0]    # .ix has been removed from pandas; use .iloc or .loc

    In [46]: frame - series
    Out[46]:
              b    d    e
    Uhah    0.0  0.0  0.0
    Ohio    3.0  3.0  3.0
    Texas   6.0  6.0  6.0
    Oregon  9.0  9.0  9.0
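By default, the operator matches the index of the Series against the columns of the DataFrame and repeats the operation down the rows. The sketch below spells that default out with the sub method. The names first_row, by_operator and by_method are illustrative.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12.0).reshape((4, 3)), columns=list("bde"),
                     index=["Uhah", "Ohio", "Texas", "Oregon"])

# First row selected by position; frame.loc["Uhah"] selects it by label.
first_row = frame.iloc[0]

# The operator aligns first_row's index (b, d, e) with frame's columns.
by_operator = frame - first_row

# The same operation written with the method and an explicit axis.
by_method = frame.sub(first_row, axis="columns")
```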
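Note: if a label is not found in both the DataFrame's columns and the Series' index, the two objects are reindexed to form the union of the labels, and the unmatched labels produce NaN.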
    In [47]: series2 = Series(range(3), index=['b', 'e', 'f'])

    In [48]: frame + series2
    Out[48]:
              b   d     e   f
    Uhah    0.0 NaN   3.0 NaN
    Ohio    3.0 NaN   6.0 NaN
    Texas   6.0 NaN   9.0 NaN
    Oregon  9.0 NaN  12.0 NaN
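If the all-NaN columns are not useful, one option is to restrict both operands to the labels they share before adding. This is only a sketch of one way to do it; shared and trimmed are illustrative names.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12.0).reshape((4, 3)), columns=list("bde"),
                     index=["Uhah", "Ohio", "Texas", "Oregon"])
series2 = pd.Series(range(3), index=["b", "e", "f"])

# Default behaviour: the columns become the union b, d, e, f, and the
# labels found on only one side (d and f) are NaN in every row.
result = frame + series2

# Keep only the labels present on both sides, then add.
shared = frame.columns.intersection(series2.index)
trimmed = frame[shared] + series2[shared]
```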
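If you instead want to match on the rows and broadcast across the columns, you must use one of the arithmetic methods and pass axis=0. The Series should then be indexed by the row labels, for example a column of the frame.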
    In [63]: frame.sub(frame['d'], axis=0)
    Out[63]:
              b    d    e
    Uhah   -1.0  0.0  1.0
    Ohio   -1.0  0.0  1.0
    Texas  -1.0  0.0  1.0
    Oregon -1.0  0.0  1.0
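The following sketch contrasts the method call with the plain operator. It assumes the Series is column d of frame, since that choice gives the -1/0/1 pattern shown above; col_d, by_rows and by_operator are illustrative names.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12.0).reshape((4, 3)), columns=list("bde"),
                     index=["Uhah", "Ohio", "Texas", "Oregon"])

# A Series indexed by the row labels (assumed here to be column d).
col_d = frame["d"]

# axis="index" is the same as axis=0: match on the row labels and
# subtract col_d from every column.
by_rows = frame.sub(col_d, axis="index")

# The plain operator aligns col_d's labels with the column labels
# instead, so nothing matches and every cell is NaN.
by_operator = frame - col_d
```

The all-NaN result of the operator form is why the method with an explicit axis is needed for this case.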