This post walks through how vectorized operations work on pandas Series that carry an index:
1. Identical indexes:

    import pandas as pd

    s1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
    s2 = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])
    print(s1 + s2)

    a    11
    b    22
    c    33
    d    44
    dtype: int64

The values under each label are simply added together.
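As a quick check of the identical-index case, the sketch below compares the label-aligned sum with a plain positional sum of the underlying arrays. The variable names `positional`, `same_values` and `same_index` are only illustrative.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

result = s1 + s2                             # label-aligned addition
positional = s1.to_numpy() + s2.to_numpy()   # plain element-by-element addition

# With identical indexes the two approaches give the same numbers,
# and the result keeps the shared index.
same_values = np.array_equal(result.to_numpy(), positional)
same_index = result.index.equals(s1.index)
print(same_values, same_index)
```

This equivalence only holds because both indexes are identical; the later cases show where the two approaches diverge.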
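2. Same index labels, different order:

    s1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
    s2 = pd.Series([10, 20, 30, 40], index=['b', 'd', 'a', 'c'])
    print(s1 + s2)

    a    31
    b    12
    c    43
    d    24
    dtype: int64

Values are matched by label, not by position, so the different order in s2 does not matter. The result here appears in s1's order, but only because s1's index is already sorted: when the two indexes are not identical, pandas builds the result index from their union and, by default, sorts it when the labels can be compared.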
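To see that pairing is by label, and what happens to the order of the result, here is a small sketch that uses a deliberately unsorted `s1` (a variation on the example above, not the original data). With current pandas defaults the sum is expected to come back with a sorted index; `reindex` is one way to restore the first Series' order if that is what you need.

```python
import pandas as pd

# Same labels as the example above, but s1 is deliberately unsorted here.
s1 = pd.Series([1, 2, 3, 4], index=['d', 'c', 'b', 'a'])
s2 = pd.Series([10, 20, 30, 40], index=['b', 'd', 'a', 'c'])

total = s1 + s2                        # values are paired by label
in_s1_order = total.reindex(s1.index)  # explicitly restore s1's label order

print(total)
print(in_s1_order)
```

Reindexing after the operation keeps the arithmetic label-based while making the output order explicit rather than relying on the default sort.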
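3. Indexes that overlap only partially:

    s1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
    s2 = pd.Series([10, 20, 30, 40], index=['c', 'd', 'e', 'f'])
    print(s1 + s2)

    a     NaN
    b     NaN
    c    13.0
    d    24.0
    e     NaN
    f     NaN
    dtype: float64

Labels present in both Series have their values added. A label that exists in only one Series has no partner to add to, so its result is NaN. Because NaN is a floating-point value, the result dtype becomes float64.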
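If NaN for the unmatched labels is not what you want, two common options are sketched below: `Series.add` with `fill_value`, which treats a missing partner as the given value, and `dropna`, which keeps only the labels found in both Series. The choice of `0` as the fill value is an assumption that suits addition; other operations may need a different neutral value.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([10, 20, 30, 40], index=['c', 'd', 'e', 'f'])

# Treat a label missing from one Series as 0 instead of producing NaN.
filled = s1.add(s2, fill_value=0)

# Alternatively, keep only the labels that exist in both Series.
overlap_only = (s1 + s2).dropna()

print(filled)
print(overlap_only)
```

Both results are still float64, since the alignment step introduces missing values before they are filled or dropped.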
4. Completely different indexes:

    s1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
    s2 = pd.Series([10, 20, 30, 40], index=['e', 'f', 'g', 'h'])
    print(s1 + s2)

    a   NaN
    b   NaN
    c   NaN
    d   NaN
    e   NaN
    f   NaN
    g   NaN
    h   NaN
    dtype: float64

With no labels in common, no pair of values can be added, so every entry in the result is NaN.
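When the labels are disjoint, it can help to check the overlap before operating. If positional addition was actually intended, one option is to discard the labels first. The sketch below shows both. The names `common`, `all_missing` and `positional` are illustrative only.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([10, 20, 30, 40], index=['e', 'f', 'g', 'h'])

# No shared labels, so the aligned sum contains nothing but NaN.
common = s1.index.intersection(s2.index)
all_missing = (s1 + s2).isna().all()

# Dropping the labels makes both Series share the default 0..n-1 index,
# so the addition becomes positional.
positional = s1.reset_index(drop=True) + s2.reset_index(drop=True)

print(len(common), all_missing)
print(positional)
```

Note that resetting the index throws the original labels away, so this only makes sense when position really is the intended key.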