  • Pitfalls in index operations on pandas DataFrame and pandas Series

    Retrieving from a Series instance: s[key]

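    When a pd.Series has an integer index, s1[-1] does not retrieve the value of its last row.

    The correct approach is s1.iloc[-1] or s1.values[-1]. s1[len(s1) - 1] also works, but only when the index labels are the default 0..n-1, because it is still a label lookup.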
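
    To make the three alternatives concrete, here is a minimal sketch using the same s1 as in the session further down. The extra Series s3 with labels 10 and 20 is an illustrative assumption, not part of the original post; it shows that s[len(s) - 1] only happens to work for the default 0..n-1 index.

```python
import pandas as pd

s1 = pd.Series([111, 222], index=range(2))

# Three ways to read the last value of s1.
last_by_position = s1.iloc[-1]      # purely positional
last_by_label = s1[len(s1) - 1]     # label lookup; labels happen to be 0..n-1
last_from_values = s1.values[-1]    # index the underlying NumPy array

# With an integer index, -1 is treated as a label and is not found.
try:
    s1[-1]
    s1_raised = False
except KeyError:
    s1_raised = True

# Hypothetical Series whose integer labels are not 0..n-1.
s3 = pd.Series([111, 222], index=[10, 20])
try:
    s3[len(s3) - 1]                 # looks for the label 1, which does not exist
    s3_raised = False
except KeyError:
    s3_raised = True

print(last_by_position, last_by_label, last_from_values, s1_raised, s3_raised)
```

    For s3, s3.iloc[-1] is the only one of the bracket forms that still returns the last row.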

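    Python's magic method __getitem__ gives a class indexing-by-key behavior: instance[key]
    retrieves the value of the element that key refers to. The pandas Series class pushes __getitem__
    about as far as it will go, and many rules are hidden inside it.
    Let's dig into its source: when the index of the Series instance s1 is integer-typed, what happens when we look it up with the key [-1]?
    Following the call chain step by step:
    Because the index is integer-typed, self.index._should_fallback_to_positional() is False, so the key is treated as a label rather than a position.
    __getitem__() therefore calls ._get_value(-1), which in turn calls .index.get_loc(-1).
    This is where the problem arises: .index._range.index(-1)
    The key -1 is simply not in s1's index, because the index of s1 is range(2), which contains only 0 and 1.
    range.index(-1) raises ValueError, which get_loc re-raises as: KeyError: -1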
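
    The call chain can be reproduced with public calls only. This is a sketch that assumes s1 is built as in the session further down; a plain Python range(2) stands in for the private .index._range attribute.

```python
import pandas as pd

s1 = pd.Series([111, 222], index=range(2))

# Passing a range as the index gives a RangeIndex.
index_type = type(s1.index).__name__

# Step 1: a plain range cannot locate -1 and raises ValueError.
value_error = None
try:
    range(2).index(-1)
except ValueError as err:
    value_error = err

# Step 2: the index's get_loc turns that ValueError into a KeyError.
key_error = None
try:
    s1.index.get_loc(-1)
except KeyError as err:
    key_error = err

# A label that does exist resolves to its position.
loc_of_one = s1.index.get_loc(1)

print(index_type, repr(value_error), repr(key_error), loc_of_one)
```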

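    When a pd.Series has a string index (such as the s2 instance), an integer key falls back to positional lookup, so s2[-1] retrieves the value of its last row. This holds for the pandas version used here; newer releases deprecate this positional fallback.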

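    Conclusion: series[key] lookup is powerful, but pay attention to the index type when using it, to avoid falling into this trap. Alternatively, .iloc[] states the positional intent more explicitly.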
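
    As a sketch of that advice, the explicit accessors behave the same way whatever the index type is. The helper last_value is a hypothetical name introduced for illustration only.

```python
import pandas as pd

s1 = pd.Series([111, 222], index=range(2))     # integer index
s2 = pd.Series([111, 222], index=list("ab"))   # string index


def last_value(s):
    """Return the last element by position, regardless of index type."""
    return s.iloc[-1]


# .iloc is always positional, .loc is always label-based.
print(last_value(s1), last_value(s2))   # same call for both Series
print(s1.loc[1], s2.loc["b"])           # the label of the last row differs
```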

    Signature: s1.__getitem__(key)
    Source:   
        def __getitem__(self, key):
            key = com.apply_if_callable(key, self)
    
            if key is Ellipsis:
                return self
    
            key_is_scalar = is_scalar(key)
            if isinstance(key, (list, tuple)):
                key = unpack_1tuple(key)
    
            if is_integer(key) and self.index._should_fallback_to_positional():
                return self._values[key]
    
            elif key_is_scalar:
                return self._get_value(key)
    
            if is_hashable(key):
                # Otherwise index.get_value will raise InvalidIndexError
                try:
                    # For labels that don't resolve as scalars like tuples and frozensets
                    result = self._get_value(key)
    
                    return result
    
                except KeyError:
                    if isinstance(key, tuple) and isinstance(self.index, MultiIndex):
                        # We still have the corner case where a tuple is a key
                        # in the first level of our MultiIndex
                        return self._get_values_tuple(key)
    
            if is_iterator(key):
                key = list(key)
    
            if com.is_bool_indexer(key):
                key = check_bool_indexer(self.index, key)
                key = np.asarray(key, dtype=bool)
                return self._get_values(key)
    
            return self._get_with(key)
    File:      d:\anaconda3\lib\site-packages\pandas\core\series.py
    Type:      method
    
    
    
    Signature: s1._get_value(label, takeable:bool=False)
    Source:   
        def _get_value(self, label, takeable: bool = False):
            """
            Quickly retrieve single value at passed index label.
    
            Parameters
            ----------
            label : object
            takeable : interpret the index as indexers, default False
    
            Returns
            -------
            scalar value
            """
            if takeable:
                return self._values[label]
    
            # Similar to Index.get_value, but we do not fall back to positional
            loc = self.index.get_loc(label)
            return self.index._get_values_for_loc(self, loc, label)
    File:      d:\anaconda3\lib\site-packages\pandas\core\series.py
    Type:      method
    
    
    
    s1.index.get_loc??
    Signature: s1.index.get_loc(key, method=None, tolerance=None)
    Source:   
        @doc(Int64Index.get_loc)
        def get_loc(self, key, method=None, tolerance=None):
            if method is None and tolerance is None:
                if is_integer(key) or (is_float(key) and key.is_integer()):
                    new_key = int(key)
                    try:
                        return self._range.index(new_key)
                    except ValueError as err:
                        raise KeyError(key) from err
                raise KeyError(key)
            return super().get_loc(key, method=method, tolerance=tolerance)
    File:      d:\anaconda3\lib\site-packages\pandas\core\indexes\range.py
    Type:      method
    
    
    
    
    s1=pd.Series([111,222], range(2))
    s2=pd.Series([111,222], list('ab'))
    
    
    s1
    Out[266]: 
    0    111
    1    222
    dtype: int64
    
    s2
    Out[267]: 
    a    111
    b    222
    dtype: int64
    
    s2[-1]
    Out[268]: 222
    s1[-1]
    
    Traceback (most recent call last):
    
      File "<ipython-input-269-0123e3764900>", line 1, in <module>
        s1[-1]
    
      File "D:\Anaconda3\lib\site-packages\pandas\core\series.py", line 882, in __getitem__
        return self._get_value(key)
    
      File "D:\Anaconda3\lib\site-packages\pandas\core\series.py", line 989, in _get_value
        loc = self.index.get_loc(label)
    
      File "D:\Anaconda3\lib\site-packages\pandas\core\indexes\range.py", line 357, in get_loc
        raise KeyError(key) from err
    
    KeyError: -1
    
    

    Retrieving from a pd.DataFrame instance: df[key]

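    df is a 2D data structure, and df[key] accepts two kinds of keys: a column label (or a combination of column labels), or a combination of row labels given as a sliceable object.
    Its lookup rules are even more hidden and complex. In short, df[key] provides slicing along either the row axis or the column axis.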
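
    A minimal sketch of how the key type decides which axis df[key] works on. The frame, its column names and its row labels are made up for illustration and do not come from the original post.

```python
import pandas as pd

# Hypothetical frame: columns "a" and "b", row labels "x", "y", "z".
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=list("xyz"))

col = df["a"]                # a single label selects a column (Series)
cols = df[["a", "b"]]        # a list of labels selects columns (DataFrame)
rows = df[0:2]               # an integer slice selects rows by position
label_rows = df["x":"y"]     # a label slice selects rows, end label included
mask_rows = df[df["a"] > 1]  # a boolean Series selects rows

# A single row label is not accepted: a scalar key is always a column lookup.
try:
    df["x"]
    row_label_raised = False
except KeyError:
    row_label_raised = True

print(list(rows.index), list(label_rows.index), list(mask_rows.index))
```

    df.loc[] and df.iloc[] avoid this ambiguity in the same way the Series accessors do.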

  • Original article: https://www.cnblogs.com/duan-qs/p/13906059.html