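Boolean Indexing
Suppose we have an array that stores data and an array of names that contains duplicates. The randn function from numpy.random is used here to generate some normally distributed random data.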
In [15]: names = np.array(['Bob','Will','Joe','Bob','Will','Joe','Bob'])

In [16]: data = np.random.randn(7,4)

In [17]: names
Out[17]:
array(['Bob', 'Will', 'Joe', 'Bob', 'Will', 'Joe', 'Bob'],
      dtype='|S4')

In [18]: data
Out[18]:
array([[ 0.70228847,  2.15235924, -0.44546734,  0.11531414],
       [ 0.09055973,  0.34700575, -0.21555319,  0.61604966],
       [-1.82610739, -0.06912911, -0.02635386, -0.39026196],
       [-0.64379412, -0.0173949 ,  0.79323255,  1.51808104],
       [ 0.09152407,  3.40042424,  0.93578726,  0.02730237],
       [ 0.21693798,  1.29032108, -0.86582956,  0.09536743],
       [ 1.58762538,  0.22749992,  0.30686374, -0.74349097]])
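Suppose each name corresponds to one row of the data array, and we want to select every row that belongs to the name 'Bob'. Like arithmetic operations, comparisons on arrays (such as ==) are vectorized, so comparing names with the string 'Bob' produces a boolean array, which can then be used to index data.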
In [24]: names == 'Bob'
Out[24]: array([ True, False, False,  True, False, False,  True], dtype=bool)

In [26]: data[names == 'Bob']
Out[26]:
array([[ 0.70228847,  2.15235924, -0.44546734,  0.11531414],
       [-0.64379412, -0.0173949 ,  0.79323255,  1.51808104],
       [ 1.58762538,  0.22749992,  0.30686374, -0.74349097]])
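The selection above can be reproduced with fixed values, which makes the result easier to check by eye. This is only a minimal sketch: the np.arange data is an assumption that stands in for the random data of the original session.

```python
import numpy as np

names = np.array(['Bob', 'Will', 'Joe', 'Bob', 'Will', 'Joe', 'Bob'])
# Assumption: deterministic stand-in for np.random.randn(7, 4)
data = np.arange(28).reshape(7, 4)

# Elementwise comparison yields one boolean per name
mask = names == 'Bob'

# Indexing with the mask keeps only the rows where mask is True (rows 0, 3, 6)
bob_rows = data[mask]

print(mask)
print(bob_rows)
```

Row i of data is kept exactly when mask[i] is True, so the result has as many rows as there are True entries in the mask.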
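The boolean array must have the same length as the axis it indexes. A boolean array can also be combined with slices or integers in the same indexing expression.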
In [27]: data[names == 'Bob',2:]
Out[27]:
array([[-0.44546734,  0.11531414],
       [ 0.79323255,  1.51808104],
       [ 0.30686374, -0.74349097]])

In [28]: data[names == 'Bob',3]
Out[28]: array([ 0.11531414,  1.51808104, -0.74349097])
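The sketch below illustrates both points: mixing a boolean mask with a slice or an integer, and what happens when the mask length does not match the axis. The np.arange data and the name short_mask are assumptions introduced for illustration; the IndexError is the behavior of recent NumPy releases, and older versions may behave differently.

```python
import numpy as np

names = np.array(['Bob', 'Will', 'Joe', 'Bob', 'Will', 'Joe', 'Bob'])
# Assumption: deterministic stand-in for the random data
data = np.arange(28).reshape(7, 4)
mask = names == 'Bob'

# Boolean mask on the rows, slice on the columns: 2-D result
tail_cols = data[mask, 2:]

# Boolean mask on the rows, integer on the columns: 1-D result
last_col = data[mask, 3]

# A mask whose length differs from the indexed axis is rejected
short_mask = np.array([True, False])   # hypothetical, length 2 instead of 7
try:
    data[short_mask]
    mismatch_error = None
except IndexError as exc:
    mismatch_error = exc

print(tail_cols.shape, last_col.shape)
print(type(mismatch_error).__name__)
```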
To select everything except 'Bob', use the not-equal operator (!=).
In [29]: names != 'Bob'
Out[29]: array([False,  True,  True, False,  True,  True, False], dtype=bool)
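As a small sketch of the same idea, the not-equal mask can be used directly as an index, and negating an existing mask with the ~ operator gives an equivalent result. The ~ form is not shown in the session above, and the np.arange data is an assumption.

```python
import numpy as np

names = np.array(['Bob', 'Will', 'Joe', 'Bob', 'Will', 'Joe', 'Bob'])
# Assumption: deterministic stand-in for the random data
data = np.arange(28).reshape(7, 4)

not_bob = names != 'Bob'          # True wherever the name is not 'Bob'
also_not_bob = ~(names == 'Bob')  # invert an existing boolean mask

# Both masks select the same rows (1, 2, 4, 5)
others = data[not_bob]

print(not_bob)
print(others)
```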
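To select two of the names, several boolean conditions have to be combined. Use boolean arithmetic operators such as & (and) and | (or) to do so.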
In [32]: mask = (names == 'Bob') | (names == 'Will')

In [33]: mask
Out[33]: array([ True,  True, False,  True,  True, False,  True], dtype=bool)

In [34]: data[mask]
Out[34]:
array([[ 0.70228847,  2.15235924, -0.44546734,  0.11531414],
       [ 0.09055973,  0.34700575, -0.21555319,  0.61604966],
       [-0.64379412, -0.0173949 ,  0.79323255,  1.51808104],
       [ 0.09152407,  3.40042424,  0.93578726,  0.02730237],
       [ 1.58762538,  0.22749992,  0.30686374, -0.74349097]])
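A short sketch of combining conditions follows. Each comparison needs its own parentheses because & and | bind more tightly than ==, and the Python keywords and / or do not work elementwise on arrays. The second condition (data[:, 0] > 0) and the use of np.isin are illustrative assumptions, not part of the original session.

```python
import numpy as np

names = np.array(['Bob', 'Will', 'Joe', 'Bob', 'Will', 'Joe', 'Bob'])
# Assumption: deterministic stand-in for the random data
data = np.arange(28).reshape(7, 4)

# Elementwise OR: rows named 'Bob' or 'Will'
either = (names == 'Bob') | (names == 'Will')

# Elementwise AND: rows named 'Bob' whose first column is positive
both = (names == 'Bob') & (data[:, 0] > 0)

# np.isin is one alternative way to test membership in several names
either_isin = np.isin(names, ['Bob', 'Will'])

print(either)
print(data[both])
```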
Boolean arrays can also be used to assign values. The following sets every negative value in data to 0.
In [35]: data[data < 0] = 0

In [36]: data
Out[36]:
array([[ 0.70228847,  2.15235924,  0.        ,  0.11531414],
       [ 0.09055973,  0.34700575,  0.        ,  0.61604966],
       [ 0.        ,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.79323255,  1.51808104],
       [ 0.09152407,  3.40042424,  0.93578726,  0.02730237],
       [ 0.21693798,  1.29032108,  0.        ,  0.09536743],
       [ 1.58762538,  0.22749992,  0.30686374,  0.        ]])
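Note that this assignment modifies data in place. The sketch below, using a small hand-written array as an assumption, shows the in-place form next to np.where, which is one way to get the same result as a new array while leaving the original untouched.

```python
import numpy as np

# Assumption: a small fixed array in place of the random data
data = np.array([[1.5, -2.0],
                 [-0.5, 4.0]])

# Build a new array: 0 where the value is negative, the original value elsewhere
clipped_copy = np.where(data < 0, 0, data)
original_untouched = data.copy()   # data has not been changed by np.where

# In-place version: overwrite the negative entries of data itself
data[data < 0] = 0

print(clipped_copy)
print(data)
```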