  • Pitfalls in index-based lookups on pandas DataFrame and pandas Series

    Retrieving values from a Series instance: s[key]

    When the index of a pd.Series is numeric (for example, integers), we cannot retrieve the value in its last row with s1[-1].

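    The correct approaches are s1.iloc[-1], s1[len(s1) - 1], or s1.values[-1]. (Note that s1[len(s1) - 1] is still a label lookup. It works only because s1's labels happen to be 0, 1, ..., n-1.)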
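    The sketch below reproduces the pitfall and the three workarounds. It assumes pandas is installed (any reasonably recent version). The s1 here has the same values and index as s1 in the IPython session later in this article.

```python
import pandas as pd

# Integer labels 0 and 1 (a RangeIndex), like s1 in the IPython session
s1 = pd.Series([111, 222], index=range(2))

# s1[-1] is interpreted as a *label* lookup, and -1 is not one of the labels
try:
    s1[-1]
    raised = False
except KeyError:
    raised = True

last_by_position = s1.iloc[-1]     # purely positional, negative positions allowed
last_by_label = s1[len(s1) - 1]    # label lookup; works only because labels are 0..n-1
last_from_array = s1.values[-1]    # underlying NumPy array, ordinary negative indexing

print(raised, last_by_position, last_by_label, last_from_array)
```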

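    In Python, the magic method __getitem__ gives a class indexing capability: instance[key]
    retrieves the value of the element that corresponds to key. pandas' Series class is the ultimate showcase of __getitem__, and a lot of
    rules are hidden inside it.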
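    The toy class below makes it easy to see what instance[key] actually hands to __getitem__. The class name Shelf is hypothetical and has nothing to do with pandas. Its __getitem__ simply returns the key it receives. This shows that Python passes the raw key through unchanged, whether it is a negative int, a slice, a tuple or Ellipsis. Every rule about what -1 "means" therefore lives inside __getitem__ itself, which is exactly where Series decides between labels and positions.

```python
class Shelf:
    """Hypothetical toy class: __getitem__ echoes back whatever key Python passes."""

    def __getitem__(self, key):
        # Python does no interpretation of its own: shelf[k] becomes shelf.__getitem__(k)
        return key


shelf = Shelf()
print(shelf[-1])        # a negative int arrives unchanged: -1
print(shelf[1:3])       # a slice object: slice(1, 3, None)
print(shelf["a", 0])    # a tuple: ('a', 0)
print(shelf[...])       # the Ellipsis singleton
```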
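    Let's dig into the source code. When the index of the Series instance s1 consists of integers, what happens if we use -1 as the lookup key?
    Following the thread, the program's call chain is:
    __getitem__() calls ._get_value(-1), which in turn calls .index.get_loc(-1). The positional branch self._values[key] is skipped because, for an integer index, _should_fallback_to_positional() returns False.
    This is where the problem lies: .index._range.index(-1)
    The key -1 is simply not in s1's index, because s1's index is range(2). range(2).index(-1) raises ValueError, and get_loc re-raises it as KeyError.
    That is why the program raises the exception KeyError: -1.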
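    The failing step can be replayed without pandas. The function range_get_loc below is a hypothetical, simplified re-implementation of the RangeIndex.get_loc source quoted later in this article. NumPy scalar types and the method/tolerance arguments are ignored. Like the original, it looks the key up as a value of the underlying range object and converts ValueError into KeyError.

```python
def range_get_loc(rng: range, key):
    """Hypothetical, simplified replay of RangeIndex.get_loc: label -> position in rng."""
    # Accept ints (excluding bool) and integral floats such as 1.0, as the quoted source does
    is_int = isinstance(key, int) and not isinstance(key, bool)
    if is_int or (isinstance(key, float) and key.is_integer()):
        new_key = int(key)
        try:
            # range.index() searches for the *value*; it does not wrap negative numbers
            return rng.index(new_key)
        except ValueError as err:
            raise KeyError(key) from err
    raise KeyError(key)


index_range = range(2)                   # s1's index holds the labels 0 and 1
print(range_get_loc(index_range, 1))     # 1

try:
    range_get_loc(index_range, -1)
except KeyError as exc:
    print("KeyError:", exc)              # KeyError: -1
```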

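    When the index of a pd.Series holds strings (as with the instance s2), we can retrieve the value of its last row with s2[-1]. For such an index, _should_fallback_to_positional() returns True, so the integer key is treated as a position and __getitem__ returns self._values[-1]. (Newer pandas releases, from 2.1 on, deprecate this positional fallback and emit a FutureWarning.)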
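    The decision can be modelled with a small pure-Python class. ToySeries is hypothetical and heavily simplified. It copies only the integer-key branch of the quoted __getitem__ source. Its fallback rule, "no numeric labels", is an approximation of pandas' real _should_fallback_to_positional check, which also considers other dtypes. It is not the library's actual implementation.

```python
class ToySeries:
    """Hypothetical, heavily simplified model of the integer-key branch of Series.__getitem__."""

    def __init__(self, values, labels):
        self._values = list(values)
        self._labels = list(labels)

    def _should_fallback_to_positional(self):
        # Simplification: numeric labels disable the positional fallback
        return not any(isinstance(lab, (int, float)) for lab in self._labels)

    def __getitem__(self, key):
        if isinstance(key, int) and self._should_fallback_to_positional():
            return self._values[key]                      # positional: -1 means "last"
        try:
            return self._values[self._labels.index(key)]  # label lookup
        except ValueError as err:
            raise KeyError(key) from err


t1 = ToySeries([111, 222], range(2))      # integer labels, like s1
t2 = ToySeries([111, 222], list("ab"))    # string labels, like s2

print(t2[-1])      # 222: -1 is used as a position
try:
    t1[-1]
except KeyError as exc:
    print("KeyError:", exc)
```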

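    Conclusion: series[key] is a powerful lookup mechanism, but pay attention to the type of the index when using it, so as not to fall into this trap. Alternatively, use .iloc[] (always positional) or .loc[] (always label-based), which state the intent more explicitly.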
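    Here is a minimal sketch of the explicit alternatives, assuming pandas is installed. The Series s3 is a hypothetical example, not taken from the session below. Its integer labels are deliberately out of order, so label lookup and positional lookup give different answers.

```python
import pandas as pd

# Integer labels in "reverse" order: label 1 sits at position 0
s3 = pd.Series([111, 222], index=[1, 0])

print(s3.loc[1])    # label-based: the value whose label is 1 -> 111
print(s3.iloc[1])   # position-based: the second value -> 222
print(s3[1])        # integer index, so [] treats 1 as a label too -> 111
print(s3.iloc[-1])  # the unambiguous way to get the last value -> 222
```

    With an integer index, s3[1] and s3.iloc[1] disagree. This is the same label-versus-position ambiguity that makes s1[-1] fail.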

    Signature: s1.__getitem__(key)
    Source:   
        def __getitem__(self, key):
            key = com.apply_if_callable(key, self)
    
            if key is Ellipsis:
                return self
    
            key_is_scalar = is_scalar(key)
            if isinstance(key, (list, tuple)):
                key = unpack_1tuple(key)
    
            if is_integer(key) and self.index._should_fallback_to_positional():
                return self._values[key]
    
            elif key_is_scalar:
                return self._get_value(key)
    
            if is_hashable(key):
                # Otherwise index.get_value will raise InvalidIndexError
                try:
                    # For labels that don't resolve as scalars like tuples and frozensets
                    result = self._get_value(key)
    
                    return result
    
                except KeyError:
                    if isinstance(key, tuple) and isinstance(self.index, MultiIndex):
                        # We still have the corner case where a tuple is a key
                        # in the first level of our MultiIndex
                        return self._get_values_tuple(key)
    
            if is_iterator(key):
                key = list(key)
    
            if com.is_bool_indexer(key):
                key = check_bool_indexer(self.index, key)
                key = np.asarray(key, dtype=bool)
                return self._get_values(key)
    
            return self._get_with(key)
    File:      d:\anaconda3\lib\site-packages\pandas\core\series.py
    Type:      method
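    Several other branches of the __getitem__ source above can be exercised through public behaviour. The sketch below assumes pandas is installed. The branch named in each comment is our reading of the version quoted above and may differ in other pandas releases.

```python
import pandas as pd

s2 = pd.Series([111, 222], index=list("ab"))

# Callable key: com.apply_if_callable(key, self) calls it with the Series,
# and the resulting boolean Series then takes the is_bool_indexer branch
by_callable = s2[lambda s: s > 150]

# Boolean mask: the com.is_bool_indexer(key) branch
by_mask = s2[s2 > 150]

# List of labels: not scalar, not hashable, not boolean, so it reaches self._get_with(key)
by_list = s2[["b", "a"]]

print(by_callable.tolist(), by_mask.tolist(), by_list.tolist())
```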
    
    
    
    Signature: s1._get_value(label, takeable:bool=False)
    Source:   
        def _get_value(self, label, takeable: bool = False):
            """
            Quickly retrieve single value at passed index label.
    
            Parameters
            ----------
            label : object
            takeable : interpret the index as indexers, default False
    
            Returns
            -------
            scalar value
            """
            if takeable:
                return self._values[label]
    
            # Similar to Index.get_value, but we do not fall back to positional
            loc = self.index.get_loc(label)
            return self.index._get_values_for_loc(self, loc, label)
    File:      d:\anaconda3\lib\site-packages\pandas\core\series.py
    Type:      method
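    The takeable flag is what separates label-based from positional scalar access. In the pandas 1.x code base, the public .at and .iat accessors call _get_value with takeable=False and takeable=True respectively. This is our reading of the internals, which may change between versions. The public behaviour shown below, assuming pandas is installed, does not depend on those internals.

```python
import pandas as pd

s1 = pd.Series([111, 222], index=range(2))

print(s1.at[1])     # label lookup, like _get_value(1, takeable=False) -> 222
print(s1.iat[-1])   # positional, like _get_value(-1, takeable=True) -> self._values[-1] -> 222

try:
    s1.at[-1]       # label -1 does not exist, so index.get_loc raises
except KeyError as exc:
    print("KeyError:", exc)
```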
    
    
    
    s1.index.get_loc??
    Signature: s1.index.get_loc(key, method=None, tolerance=None)
    Source:   
        @doc(Int64Index.get_loc)
        def get_loc(self, key, method=None, tolerance=None):
            if method is None and tolerance is None:
                if is_integer(key) or (is_float(key) and key.is_integer()):
                    new_key = int(key)
                    try:
                        return self._range.index(new_key)
                    except ValueError as err:
                        raise KeyError(key) from err
                raise KeyError(key)
            return super().get_loc(key, method=method, tolerance=tolerance)
    File:      d:\anaconda3\lib\site-packages\pandas\core\indexes\range.py
    Type:      method
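    Index.get_loc is public API, so the root cause can be checked directly. The sketch assumes pandas is installed. Passing a Python range as the index produces a RangeIndex, whose get_loc is the method quoted above.

```python
import pandas as pd

s1 = pd.Series([111, 222], index=range(2))
s2 = pd.Series([111, 222], index=list("ab"))

print(type(s1.index).__name__)     # RangeIndex: backed by a Python range
print(s1.index.get_loc(1))         # label 1 -> position 1
print(s2.index.get_loc("b"))       # label "b" -> position 1

try:
    s1.index.get_loc(-1)           # range(2).index(-1) fails -> KeyError
except KeyError as exc:
    print("KeyError:", exc)
```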
    
    
    
    
    s1=pd.Series([111,222], range(2))
    s2=pd.Series([111,222], list('ab'))
    
    
    s1
    Out[266]: 
    0    111
    1    222
    dtype: int64
    
    s2
    Out[267]: 
    a    111
    b    222
    dtype: int64
    
    s2[-1]
    Out[268]: 222
    s1[-1]
    
    Traceback (most recent call last):
    
      File "<ipython-input-269-0123e3764900>", line 1, in <module>
        s1[-1]
    
      File "D:\Anaconda3\lib\site-packages\pandas\core\series.py", line 882, in __getitem__
        return self._get_value(key)
    
      File "D:\Anaconda3\lib\site-packages\pandas\core\series.py", line 989, in _get_value
        loc = self.index.get_loc(label)
    
      File "D:\Anaconda3\lib\site-packages\pandas\core\indexes\range.py", line 357, in get_loc
        raise KeyError(key) from err
    
    KeyError: -1
    
    

    Retrieving values from a pd.DataFrame instance: df[key]

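    df is a 2D data structure, so df[key] can select along two axes. The key is either column labels (a single column name or a list of them) or a row selection given as a sliceable object (a slice, or a boolean mask).
    Its lookup rules are even more hidden and complex than those of Series. In short, df[key] provides selection and slicing along either the row axis or the column axis.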
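    The four common key types for df[key] are shown in a minimal sketch below. It assumes pandas is installed. The DataFrame df and its columns "a" and "b" are hypothetical illustration data. Column labels select columns, while a slice or a boolean mask selects rows.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

col = df["a"]                # single column label -> Series
cols = df[["b", "a"]]        # list of column labels -> DataFrame with those columns
rows = df[0:2]               # slice -> rows (positional here: the first two rows)
mask_rows = df[df["a"] > 1]  # boolean mask -> rows where the condition holds

print(col.tolist())           # [1, 2, 3]
print(list(cols.columns))     # ['b', 'a']
print(rows.index.tolist())    # [0, 1]
print(mask_rows["b"].tolist())  # [20, 30]
```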

  • Original article: https://www.cnblogs.com/duan-qs/p/13906059.html