Indexing a Series instance with s[key]
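When a pd.Series has an integer index, s1[-1] cannot be used to retrieve the value in its last row.
The correct approach is s1.iloc[-1] or s1.values[-1]. s1[len(s1) - 1] also works, but only because the labels of s1 are 0..n-1: it is still a lookup by label, not by position.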
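Python's magic method __getitem__ gives a class indexing support. In other words, instance[key]
retrieves the value of the element that corresponds to key. The pandas Series class pushes __getitem__ about as far as it can go, and a lot of rules are hidden inside it.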
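As a minimal sketch of the mechanism (LabelledValues is a hypothetical toy class, not part of pandas), the following shows how defining __getitem__ makes instance[key] work, and how a key is only meaningful in whatever way the method chooses to interpret it:

```python
class LabelledValues:
    """Hypothetical toy container used only for illustration."""

    def __init__(self, values, labels):
        # Store the values keyed by their labels.
        self._by_label = dict(zip(labels, values))

    def __getitem__(self, key):
        # Python calls this method to evaluate instance[key].
        # This toy class always treats key as a label, never as a position.
        return self._by_label[key]


toy = LabelledValues([111, 222], ["a", "b"])
print(toy["b"])  # looked up by label

try:
    toy[-1]  # -1 is not a label, so the dict lookup raises KeyError
except KeyError as err:
    print("KeyError:", err)
```

Series.__getitem__ is far more elaborate than this, but the starting point is the same: the class, not the language, decides what a key means.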
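Let's dig into the source: when the Series instance s1 has an integer index, what happens if we index it with [-1]?
We can follow the call chain step by step:
__getitem__() calls ._get_value(-1), which in turn calls .index.get_loc(-1).
This is where the problem arises: .index._range.index(-1)
The key -1 is simply not among the labels of s1, because the index of s1 is range(2).
range.index() raises ValueError, get_loc re-raises it as KeyError, and so the program fails with: KeyError: -1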
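The failing step can be reproduced without pandas. This is a minimal sketch, assuming the labels behind the index of s1 behave like the built-in range(2); get_loc_sketch is a hypothetical name that mirrors only the integer branch of the get_loc source quoted in this note:

```python
labels = range(2)  # the labels of s1: 0 and 1


def get_loc_sketch(rng, key):
    """Simplified stand-in for the lookup step; the real method handles more cases."""
    try:
        # range.index returns the position of a value, or raises ValueError.
        return rng.index(int(key))
    except ValueError as err:
        # The label is absent, so report it the way a mapping would.
        raise KeyError(key) from err


print(get_loc_sketch(labels, 1))  # label 1 is present

try:
    get_loc_sketch(labels, -1)  # -1 is not one of the labels
except KeyError as err:
    print("KeyError:", err)
```

Nothing in this path ever treats -1 as "count from the end"; it is compared against the labels like any other value.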
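When a pd.Series has a string index (for example the instance s2), s2[-1] does retrieve the value in its last row in the pandas version examined here: the key is an integer and self.index._should_fallback_to_positional() is true, so __getitem__ returns self._values[key].
Conclusion: series[key] is a powerful way to index, but its behavior depends on the type of the index, so take care not to fall into this trap. Using .iloc[] is more explicit.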
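To make the conclusion concrete, here is a small sketch that states the intent explicitly with .loc (labels) and .iloc (positions). The Series named s, with labels 10 and 20, is an illustrative assumption and does not come from the session in this note; s2 has the same contents as the s2 in this note:

```python
import pandas as pd

# Illustrative data: integer labels that are NOT 0..n-1.
s = pd.Series([111, 222], index=[10, 20])
# Same contents as s2 in this note: string labels.
s2 = pd.Series([111, 222], index=list("ab"))

by_label = s.loc[20]          # label-based lookup
by_position = s.iloc[-1]      # position-based lookup of the last element
s2_by_label = s2.loc["b"]     # label-based lookup on a string index
s2_by_position = s2.iloc[-1]  # position-based lookup, same spelling as above

print(by_label, by_position, s2_by_label, s2_by_position)
```

With .iloc the expression for "last element" is the same for both Series, whatever the index type is.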
Signature: s1.__getitem__(key)
Source:
def __getitem__(self, key):
    key = com.apply_if_callable(key, self)

    if key is Ellipsis:
        return self

    key_is_scalar = is_scalar(key)
    if isinstance(key, (list, tuple)):
        key = unpack_1tuple(key)

    if is_integer(key) and self.index._should_fallback_to_positional():
        return self._values[key]

    elif key_is_scalar:
        return self._get_value(key)

    if is_hashable(key):
        # Otherwise index.get_value will raise InvalidIndexError
        try:
            # For labels that don't resolve as scalars like tuples and frozensets
            result = self._get_value(key)

            return result

        except KeyError:
            if isinstance(key, tuple) and isinstance(self.index, MultiIndex):
                # We still have the corner case where a tuple is a key
                # in the first level of our MultiIndex
                return self._get_values_tuple(key)

    if is_iterator(key):
        key = list(key)

    if com.is_bool_indexer(key):
        key = check_bool_indexer(self.index, key)
        key = np.asarray(key, dtype=bool)
        return self._get_values(key)

    return self._get_with(key)
File: d:\anaconda3\lib\site-packages\pandas\core\series.py
Type: method
Signature: s1._get_value(label, takeable:bool=False)
Source:
def _get_value(self, label, takeable: bool = False):
    """
    Quickly retrieve single value at passed index label.

    Parameters
    ----------
    label : object
    takeable : interpret the index as indexers, default False

    Returns
    -------
    scalar value
    """
    if takeable:
        return self._values[label]

    # Similar to Index.get_value, but we do not fall back to positional
    loc = self.index.get_loc(label)
    return self.index._get_values_for_loc(self, loc, label)
File: d:\anaconda3\lib\site-packages\pandas\core\series.py
Type: method
s1.index.get_loc??
Signature: s1.index.get_loc(key, method=None, tolerance=None)
Source:
@doc(Int64Index.get_loc)
def get_loc(self, key, method=None, tolerance=None):
    if method is None and tolerance is None:
        if is_integer(key) or (is_float(key) and key.is_integer()):
            new_key = int(key)
            try:
                return self._range.index(new_key)
            except ValueError as err:
                raise KeyError(key) from err
        raise KeyError(key)
    return super().get_loc(key, method=method, tolerance=tolerance)
File: d:\anaconda3\lib\site-packages\pandas\core\indexes\range.py
Type: method
s1=pd.Series([111,222], range(2))
s2=pd.Series([111,222], list('ab'))
s1
Out[266]:
0    111
1    222
dtype: int64
s2
Out[267]:
a    111
b    222
dtype: int64
s2[-1]
Out[268]: 222
s1[-1]
Traceback (most recent call last):
  File "<ipython-input-269-0123e3764900>", line 1, in <module>
    s1[-1]
  File "D:\Anaconda3\lib\site-packages\pandas\core\series.py", line 882, in __getitem__
    return self._get_value(key)
  File "D:\Anaconda3\lib\site-packages\pandas\core\series.py", line 989, in _get_value
    loc = self.index.get_loc(label)
  File "D:\Anaconda3\lib\site-packages\pandas\core\indexes\range.py", line 357, in get_loc
    raise KeyError(key) from err
KeyError: -1
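Indexing a pd.DataFrame instance with df[key]
df is a 2D data structure, and df[key] accepts two kinds of keys: either a combination of column labels or a combination of row labels (a sliceable object).
Its indexing rules are even more hidden and complex. In short, df[key] provides a slicing operation along either the row axis or the column axis.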
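A small sketch of the two kinds of keys follows. The DataFrame, its column names x and y, and its row labels a, b, c are illustrative assumptions, not data from this note:

```python
import pandas as pd

# Illustrative frame with string row labels.
df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]}, index=list("abc"))

col = df["x"]                # one column label: returns a Series
cols = df[["x", "y"]]        # list of column labels: returns a DataFrame
rows_by_label = df["a":"b"]  # label slice: selects rows, both endpoints included
mask = df[df["x"] > 1]       # boolean Series: filters rows

print(col.tolist())
print(list(cols.columns))
print(list(rows_by_label.index))
print(list(mask.index))
```

A scalar or a list selects along the column axis, while a slice or a boolean mask selects along the row axis. When that distinction matters, df.loc[] and df.iloc[] state the axis and the lookup mode explicitly.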