  • Comparing loc, iloc, and ix

    Create a DataFrame with pandas:

    In [1]: import pandas as pd
    
    In [2]: import numpy as np
    
    In [3]: df = pd.DataFrame(np.random.randn(6,4),index=pd.date_range('20180101',periods=6),columns=list('ABCD'))
    
    In [4]: df
    Out[4]:
                       A         B         C         D
    2018-01-01 -0.603510  0.269480  0.197354 -0.433003
    2018-01-02  1.230502  0.474616  1.473517 -0.627363
    2018-01-03 -0.402034  0.569097  0.675872 -0.317995
    2018-01-04  0.220638  0.527543 -1.140620 -0.348089
    2018-01-05 -2.494331  0.593269  0.596578  1.653347
    2018-01-06 -2.766239 -0.919777  0.462890  0.156048

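    Suppose you want the data in the third row.

    If you carry over the habit of plain Python indexing and try to take it directly with df[2], you will need a different approach: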

    In [5]: df[2]
    KeyError                                  Traceback (most recent call last)
    D:\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
       3062             try:
    -> 3063                 return self._engine.get_loc(key)
       3064             except KeyError:
    
    pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
    
    pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
    
    pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
    
    pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
    
    KeyError: 2
    
    During handling of the above exception, another exception occurred:
    
    KeyError                                  Traceback (most recent call last)
    <ipython-input-5-b5f2749c85df> in <module>()
    ----> 1 df[2]
    
    D:\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
       2683             return self._getitem_multilevel(key)
       2684         else:
    -> 2685             return self._getitem_column(key)
       2686
       2687     def _getitem_column(self, key):
    
    D:\Anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_column(self, key)
       2690         # get column
       2691         if self.columns.is_unique:
    -> 2692             return self._get_item_cache(key)
       2693
       2694         # duplicate columns & possible reduce dimensionality
    
    D:\Anaconda3\lib\site-packages\pandas\core\generic.py in _get_item_cache(self, item)
       2484         res = cache.get(item)
       2485         if res is None:
    -> 2486             values = self._data.get(item)
       2487             res = self._box_item_values(item, values)
       2488             cache[item] = res
    
    D:\Anaconda3\lib\site-packages\pandas\core\internals.py in get(self, item, fastpath)
       4113
       4114             if not isna(item):
    -> 4115                 loc = self.items.get_loc(item)
       4116             else:
       4117                 indexer = np.arange(len(self.items))[isna(self.items)]
    
    D:\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
       3063                 return self._engine.get_loc(key)
       3064             except KeyError:
    -> 3065                 return self._engine.get_loc(self._maybe_cast_indexer(key))
       3066
       3067         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
    
    pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
    
    pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
    
    pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
    
    pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
    
    KeyError: 2
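    df[2] fails. This is not a syntax error but a KeyError: a single key inside [] is looked up among the column labels, and this frame has no column labelled 2.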
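
    The traceback ends in a column lookup (_getitem_column), which suggests that a single key inside [] is treated as a column label. A minimal sketch of that behaviour follows; it assumes a stand-in frame with the same shape and labels as df but deterministic values, so it is not the article's random data.

```python
import numpy as np
import pandas as pd

# Stand-in for the article's df (assumption: deterministic values
# replace np.random.randn so the results are reproducible).
df = pd.DataFrame(
    np.arange(24.0).reshape(6, 4),
    index=pd.date_range("20180101", periods=6),
    columns=list("ABCD"),
)

# A single key inside [] is looked up among the column labels.
col_a = df["A"]  # works: "A" is a column

try:
    df[2]        # fails: no column is labelled 2
    raised = False
except KeyError:
    raised = True

print(raised)
```

    So df[2] is a valid expression that simply asks for a column that does not exist.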

    There are actually several correct ways to do this:

    Method 1:

    In [6]: df[2:3]
    Out[6]:
                       A         B         C         D
    2018-01-03 -0.402034  0.569097  0.675872 -0.317995

    Method 2:

    In [7]: df['20180103':'20180103']  # a slice is required here; a single key such as df['20180103'] is looked up as a column label and raises KeyError
    Out[7]:
                       A         B         C         D
    2018-01-03 -0.402034  0.569097  0.675872 -0.317995
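
    Both methods above pass a slice to [], and a slice selects rows. The sketch below illustrates how the two kinds of slice differ: an integer slice counts positions and excludes its end, while a label slice includes both endpoints. The frame is a stand-in with deterministic values (an assumption, not the article's data).

```python
import numpy as np
import pandas as pd

# Stand-in for the article's df (assumption: deterministic values).
df = pd.DataFrame(
    np.arange(24.0).reshape(6, 4),
    index=pd.date_range("20180101", periods=6),
    columns=list("ABCD"),
)

# Method 1: positional slice, end excluded -> only the row at position 2.
by_position = df[2:3]

# Method 2: label slice, both ends included -> only the row for 2018-01-03.
by_label = df["20180103":"20180103"]

# The difference shows up on wider ranges.
pos_rows = df[1:3]                        # positions 1 and 2
lab_rows = df["20180102":"20180104"]      # 2018-01-02 through 2018-01-04

print(len(pos_rows), len(lab_rows))
```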

    Plain Python-style indexing with a single key does not work here, which brings us to today's topic: loc, iloc, and ix.

    (1) .loc: select by label

    In [8]: df.loc['2018/01/03']
    Out[8]:
    A   -0.402034
    B    0.569097
    C    0.675872
    D   -0.317995
    Name: 2018-01-03 00:00:00, dtype: float64
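
    Label-based selection with .loc also accepts a column label and label slices. The following is a small sketch on a stand-in frame with deterministic values (an assumption made for reproducibility); note that a label slice includes its end label.

```python
import numpy as np
import pandas as pd

# Stand-in for the article's df (assumption: deterministic values).
df = pd.DataFrame(
    np.arange(24.0).reshape(6, 4),
    index=pd.date_range("20180101", periods=6),
    columns=list("ABCD"),
)

row = df.loc["2018-01-03"]          # one row, returned as a Series
cell = df.loc["2018-01-03", "B"]    # row label plus column label: a scalar

# Label slices include both endpoints, so this block has three rows.
block = df.loc["2018-01-02":"2018-01-04", ["A", "C"]]

print(cell)         # 9.0
print(block.shape)  # (3, 2)
```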

    (2) .iloc: select by integer position

    In [9]: df.iloc[2]
    Out[9]:
    A   -0.402034
    B    0.569097
    C    0.675872
    D   -0.317995
    Name: 2018-01-03 00:00:00, dtype: float64
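
    Position-based selection with .iloc follows ordinary Python conventions: positions start at 0 and a slice excludes its end. A brief sketch, again on a stand-in frame with deterministic values (an assumption):

```python
import numpy as np
import pandas as pd

# Stand-in for the article's df (assumption: deterministic values).
df = pd.DataFrame(
    np.arange(24.0).reshape(6, 4),
    index=pd.date_range("20180101", periods=6),
    columns=list("ABCD"),
)

row = df.iloc[2]       # third row, returned as a Series
cell = df.iloc[2, 1]   # third row, second column: a scalar

# Positional slices exclude the end, so 1:4 covers positions 1, 2 and 3.
block = df.iloc[1:4, [0, 2]]

# The same block expressed with labels; here the end label is included.
same_block = df.loc["2018-01-02":"2018-01-04", ["A", "C"]]

print(cell)                      # 9.0
print(block.equals(same_block))  # True
```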

    (3) .ix: mixed indexing, by label or by position (deprecated since pandas 0.20.0 and removed in 1.0.0)

    In [10]: df.ix['2018/01/03']
    Out[10]:
    A   -0.402034
    B    0.569097
    C    0.675872
    D   -0.317995
    Name: 2018-01-03 00:00:00, dtype: float64
    
    In [11]: df.ix[2]
    Out[11]:
    A   -0.402034
    B    0.569097
    C    0.675872
    D   -0.317995
    Name: 2018-01-03 00:00:00, dtype: float64
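
    Because .ix is no longer available from pandas 1.0 onward, mixed label and position lookups have to be written with .loc or .iloc. One possible approach, sketched on a stand-in frame with deterministic values (an assumption), is to translate one side so that a single indexer is enough:

```python
import numpy as np
import pandas as pd

# Stand-in for the article's df (assumption: deterministic values).
df = pd.DataFrame(
    np.arange(24.0).reshape(6, 4),
    index=pd.date_range("20180101", periods=6),
    columns=list("ABCD"),
)

# Goal: row by position (2), column by label ("B").

# Option 1: turn the row position into a label, then use .loc.
via_loc = df.loc[df.index[2], "B"]

# Option 2: turn the column label into a position, then use .iloc.
via_iloc = df.iloc[2, df.columns.get_loc("B")]

print(via_loc, via_iloc)  # 9.0 9.0
```

    Either form states explicitly which axis is addressed by label and which by position.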
  • Original article: https://www.cnblogs.com/yangmingxianshen/p/9645876.html