    Summary of Indexing Operations on a pandas DataFrame

    For new pandas users, DataFrame indexing can be confusing. This post walks through each indexing form in detail and closes with a summary of what each one selects.

    import pandas as pd
    import numpy as np

    df = pd.DataFrame(np.arange(16).reshape(4, 4),
                      index=['Ohio', 'Colorado', 'Utah', 'New York'],
                      columns=['one', 'two', 'three', 'four'])
    df

              one  two  three  four
    Ohio        0    1      2     3
    Colorado    4    5      6     7
    Utah        8    9     10    11
    New York   12   13     14    15

    (1) df[val]

    • When val is a single column label, df[val] selects that column from the DataFrame and returns a Series.
    df['one']
    
    Ohio         0
    Colorado     4
    Utah         8
    New York    12
    Name: one, dtype: int32
    
    • When val is a list of column labels, df[val] selects those columns and returns a DataFrame.
    df[['one','two']]

              one  two
    Ohio        0    1
    Colorado    4    5
    Utah        8    9
    New York   12   13
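
    As a quick check of the two rules above, the sketch below rebuilds the example frame and inspects the return types. The variable names single, as_frame and several are illustrative only and do not come from the original post.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the top of the article.
df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

single = df["one"]            # a single column label gives a Series
as_frame = df[["one"]]        # a one-element list still gives a DataFrame
several = df[["one", "two"]]  # a list of labels gives a DataFrame

print(type(single).__name__)    # Series
print(type(as_frame).__name__)  # DataFrame
print(several.shape)            # (4, 2)
```

    The one-element list is the case that tends to surprise people: the brackets around the label, not the number of columns, decide whether a Series or a DataFrame comes back.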
    • When val is a slice such as :num, df[val] selects rows. This is a convenience shortcut: with an integer slice it is equivalent to df.iloc[:num], the indexer dedicated to positional selection.
    df[:2]

              one  two  three  four
    Ohio        0    1      2     3
    Colorado    4    5      6     7

    df.iloc[:2] # same as above

              one  two  three  four
    Ohio        0    1      2     3
    Colorado    4    5      6     7

    df[1:3]

              one  two  three  four
    Colorado    4    5      6     7
    Utah        8    9     10    11

    df.iloc[1:3]

              one  two  three  four
    Colorado    4    5      6     7
    Utah        8    9     10    11

    • When val is a boolean pd.Series with the same index as df, df[val] returns the rows whose value in the Series is True. pd.DataFrame.any and pd.DataFrame.all, called with axis=1 on a boolean frame, return this kind of Series, so they are often used to build val for filtering.
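
    The boolean Series case has no example in the post, so here is a minimal sketch of it, assuming the original (unmodified) example frame. The thresholds and the names mask, any_mask and all_mask are made up for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the top of the article.
df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

# A comparison on one column gives a boolean Series indexed like df.
mask = df["three"] > 5
filtered = df[mask]  # keeps the rows where mask is True

# any/all with axis=1 collapse a boolean frame to one value per row.
any_mask = (df > 12).any(axis=1)  # True if at least one value in the row > 12
all_mask = (df > 3).all(axis=1)   # True only if every value in the row > 3

print(list(filtered.index))      # ['Colorado', 'Utah', 'New York']
print(list(df[any_mask].index))  # ['New York']
print(list(df[all_mask].index))  # ['Colorado', 'Utah', 'New York']
```

    Without axis=1, any and all reduce down each column instead and return a Series indexed by column labels, which cannot be used directly as a row filter.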
    • When val is a boolean DataFrame, df[val] keeps the values where val is True and fills the rest with NaN; assigning to df[val] sets only the values where val is True.
    df<5

                one    two  three   four
    Ohio       True   True   True   True
    Colorado   True  False  False  False
    Utah      False  False  False  False
    New York  False  False  False  False

    df[df<5]

              one  two  three  four
    Ohio      0.0  1.0    2.0   3.0
    Colorado  4.0  NaN    NaN   NaN
    Utah      NaN  NaN    NaN   NaN
    New York  NaN  NaN    NaN   NaN

    df[df<5]=0;df

              one  two  three  four
    Ohio        0    0      0     0
    Colorado    0    5      6     7
    Utah        8    9     10    11
    New York   12   13     14    15
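
    Reading and assigning through a boolean DataFrame behave differently, which the sketch below tries to make explicit. It works on a fresh copy of the example frame so that nothing is modified in place; the names mask, selected and changed are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the top of the article.
df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

mask = df < 5

# Reading: values where mask is False become NaN, so the result is float.
selected = df[mask]
print(selected.equals(df.where(mask)))  # True: same result as DataFrame.where
print(int(selected.isna().sum().sum()))  # 11 cells were masked out

# Assigning: only the cells where mask is True are overwritten.
changed = df.copy()  # work on a copy to leave df untouched
changed[mask] = 0
print(changed.loc["Colorado"].tolist())  # [0, 5, 6, 7]
```

    Note that the post's own df[df<5]=0 modifies df in place, which is why the outputs in the following sections show zeros in the first row.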

    (2) df.loc[val]

    • When val is a single index label, df.loc[val] selects the corresponding row and returns a Series. When val is a list of index labels, it selects the corresponding rows and returns a DataFrame.
    df.loc['Colorado']
    
    one      0
    two      5
    three    6
    four     7
    Name: Colorado, dtype: int32
    
    df.loc[['Colorado','New York']]
    
              one  two  three  four
    Colorado    0    5      6     7
    New York   12   13     14    15

    (3) df.loc[:,val]

    • When val is a single column label, df.loc[:,val] selects the corresponding column and returns a Series. When val is a list of column labels, it selects the corresponding columns and returns a DataFrame.
    df.loc[:,'two']
    
    Ohio         0
    Colorado     5
    Utah         9
    New York    13
    Name: two, dtype: int32
    
    df.loc[:,['two']] # Note: whenever val is a list, even one with a single element, the result is a DataFrame.

              two
    Ohio        0
    Colorado    5
    Utah        9
    New York   13

    df.loc[:,['one','two']]

              one  two
    Ohio        0    0
    Colorado    0    5
    Utah        8    9
    New York   12   13

    df[['one','two']] # same as df.loc[:,['one','two']] above

              one  two
    Ohio        0    0
    Colorado    0    5
    Utah        8    9
    New York   12   13

    (3) df.loc[val1,val2]

    • val1 can be a single index label or a list of index labels, and val2 can be a single column label or a list of column labels; df.loc[val1,val2] selects the data at the intersection of the two. Either val1 or val2 can also be : to select everything along that axis.
    df.loc['Ohio','one']
    
    0
    
    df.loc[['Ohio','Utah'],'one']
    
    Ohio    0
    Utah    8
    Name: one, dtype: int32
    
    df.loc['Ohio',['one','two']]
    
    one    0
    two    0
    Name: Ohio, dtype: int32
    
    df.loc[['Ohio','Utah'],['one','two']]
    
          one  two
    Ohio    0    0
    Utah    8    9

    df.loc[:,:]

              one  two  three  four
    Ohio        0    0      0     0
    Colorado    0    5      6     7
    Utah        8    9     10    11
    New York   12   13     14    15

    df.loc['Ohio',:]
    
    one      0
    two      0
    three    0
    four     0
    Name: Ohio, dtype: int32
    
    df.loc[:,'two']
    
    Ohio         0
    Colorado     5
    Utah         9
    New York    13
    Name: two, dtype: int32
    
    df.loc[:,['one','two']]
    
              one  two
    Ohio        0    0
    Colorado    0    5
    Utah        8    9
    New York   12   13
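
    The examples in this section follow one pattern: each selector that is a single label removes a dimension, and each selector that is a list keeps it. A small sketch of that pattern, using a fresh copy of the original frame (so the values differ from the zeroed frame shown above) and illustrative variable names:

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the top of the article.
df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

cell = df.loc["Ohio", "one"]                      # label, label -> single value
col_part = df.loc[["Ohio", "Utah"], "one"]        # list, label  -> Series named 'one'
row_part = df.loc["Ohio", ["one", "two"]]         # label, list  -> Series named 'Ohio'
block = df.loc[["Ohio", "Utah"], ["one", "two"]]  # list, list   -> DataFrame

print(cell)                                  # 0
print(type(col_part).__name__, col_part.name)  # Series one
print(type(row_part).__name__, row_part.name)  # Series Ohio
print(type(block).__name__, block.shape)       # DataFrame (2, 2)
```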

    (4) df.iloc[val]

    • Works like df.loc[val], except that val must be an integer or a list of integers giving row positions rather than labels.
    df.iloc[1]
    
    one      0
    two      5
    three    6
    four     7
    Name: Colorado, dtype: int32
    
    df.iloc[[1,3]]
    
              one  two  three  four
    Colorado    0    5      6     7
    New York   12   13     14    15
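
    One difference between the two indexers that the post does not cover is how slices treat their end point: a label slice in df.loc includes the stop label, while a positional slice in df.iloc excludes the stop position, as in ordinary Python slicing. A minimal sketch on a fresh copy of the example frame:

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the top of the article.
df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

by_label = df.loc["Ohio":"Utah"]  # label slice: both end points are included
by_position = df.iloc[0:2]        # positional slice: position 2 is excluded

print(list(by_label.index))     # ['Ohio', 'Colorado', 'Utah']
print(list(by_position.index))  # ['Ohio', 'Colorado']
```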

    (5) df.iloc[:,val]

    • Works like df.loc[:,val], except that val must be an integer or a list of integers.
    df
    
              one  two  three  four
    Ohio        0    0      0     0
    Colorado    0    5      6     7
    Utah        8    9     10    11
    New York   12   13     14    15

    df.iloc[:,1]
    
    Ohio         0
    Colorado     5
    Utah         9
    New York    13
    Name: two, dtype: int32
    
    df.iloc[:,[1,3]]
    
              two  four
    Ohio        0     0
    Colorado    5     7
    Utah        9    11
    New York   13    15

    (6) df.iloc[val1,val2]

    • Works like df.loc[val1,val2], except that val1 and val2 must be integers or lists of integers.
    df.iloc[1,2]
    
    6
    
    df.iloc[1,[1,2,3]]
    
    two      5
    three    6
    four     7
    Name: Colorado, dtype: int32
    
    df.iloc[[1,2],2]
    
    Colorado     6
    Utah        10
    Name: three, dtype: int32
    
    df.iloc[[1,2],[1,2]]
    
              two  three
    Colorado    5      6
    Utah        9     10

    df.iloc[:,[1,2]]

              two  three
    Ohio        0      0
    Colorado    5      6
    Utah        9     10
    New York   13     14

    df.iloc[[1,2],:]

              one  two  three  four
    Colorado    0    5      6     7
    Utah        8    9     10    11
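
    The label versus position distinction matters most when the index itself holds integers, because the same number can then mean two different rows. The frame df2 below is a hypothetical example, not taken from the post:

```python
import pandas as pd

# Hypothetical frame whose integer labels are deliberately out of order.
df2 = pd.DataFrame({"value": ["a", "b", "c"]}, index=[2, 1, 0])

by_label = df2.loc[0, "value"]  # the row whose label is 0 (the last row)
by_position = df2.iloc[0, 0]    # the row at position 0 (the first row)

print(by_label)     # c
print(by_position)  # a
```

    With string labels, as in the post's frame, the two cannot be confused, which is probably why the difference is easy to overlook there.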

    (7) df.at[val1,val2]

    • val1 must be a single index label and val2 must be a single column label.
    df.at['Utah','one']
    
    8
    
    df.loc['Utah','one'] # same as above
    
    8
    
    df.at[['Utah','Colorado'],'one'] # raises an exception
    
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    D:\Anaconda\lib\site-packages\pandas\core\frame.py in _get_value(self, index, col, takeable)
       2538         try:
    -> 2539             return engine.get_value(series._values, index)
       2540         except (TypeError, ValueError):

    pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

    pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

    pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

    TypeError: '['Utah', 'Colorado']' is an invalid key

    During handling of the above exception, another exception occurred:

    TypeError                                 Traceback (most recent call last)
    <ipython-input-77-c52a9db91739> in <module>()
    ----> 1 df.at[['Utah','Colorado'],'one']

    D:\Anaconda\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
       2140 
       2141         key = self._convert_key(key)
    -> 2142         return self.obj._get_value(*key, takeable=self._takeable)
       2143 
       2144     def __setitem__(self, key, value):

    D:\Anaconda\lib\site-packages\pandas\core\frame.py in _get_value(self, index, col, takeable)
       2543             # use positional
       2544             col = self.columns.get_loc(col)
    -> 2545             index = self.index.get_loc(index)
       2546             return self._get_value(index, col, takeable=True)
       2547     _get_value.__doc__ = get_value.__doc__

    D:\Anaconda\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
       3076                                  'backfill or nearest lookups')
       3077             try:
    -> 3078                 return self._engine.get_loc(key)
       3079             except KeyError:
       3080                 return self._engine.get_loc(self._maybe_cast_indexer(key))

    pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

    pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

    TypeError: '['Utah', 'Colorado']' is an invalid key
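
    The traceback above comes from the pandas version the post was written with; the exception class and message may differ in other versions, so the sketch below catches Exception broadly rather than assuming a specific type. It uses a fresh copy of the example frame and falls back to df.loc when several labels are needed.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the top of the article.
df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

try:
    df.at[["Utah", "Colorado"], "one"]  # a list is not a valid key for .at
    raised = False
except Exception as exc:  # exact exception class varies across pandas versions
    raised = True
    print("df.at rejected the list key:", type(exc).__name__)

# For more than one label, df.loc is the indexer to use.
several = df.loc[["Utah", "Colorado"], "one"]
print(several.tolist())  # [8, 4]
```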
    

    (8) df.iat[val1,val2]

    • Works like df.at, except that val1 and val2 must both be integers.
    df.iat[2,2]
    
    10
    
    df
    
              one  two  three  four
    Ohio        0    0      0     0
    Colorado    0    5      6     7
    Utah        8    9     10    11
    New York   12   13     14    15
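
    Both scalar accessors can also be used on the left-hand side of an assignment to overwrite a single cell, which the post does not show. A minimal sketch on a fresh copy of the example frame; the replacement values 100 and -1 are arbitrary.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the top of the article.
df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

# Reading the same cell by position and by label.
by_position = df.iat[2, 2]          # row position 2, column position 2
by_label = df.at["Utah", "three"]   # the same cell addressed by labels
print(by_position, by_label)        # 10 10

# Writing a single cell through each accessor.
df.iat[2, 2] = 100
df.at["Ohio", "one"] = -1
print(df.loc["Utah", "three"], df.iloc[0, 0])  # 100 -1
```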

    Conclusion

    • In df[val], val can be a column label or a list of column labels, which selects whole columns. As a special case, val can be a slice such as :num, which selects the corresponding rows, or a boolean Series, which filters rows. val can also be a boolean DataFrame, which is used to mask or set values.
    • Generally speaking, df.loc is mainly used to select rows, or a combination of rows and columns. It takes a single row label, a list of row labels, or the pair val1,val2, where each of val1 and val2 can be a single label, a list of labels, or :. In the last form it selects the intersection of index labels val1 and column labels val2.
    • df.iloc works like df.loc, except that it requires integer positions, either a single integer or a list of integers.
    • df.at[val1,val2] accepts only a single label for each of val1 and val2; likewise, df.iat[val1,val2] accepts only a single integer for each.
    
    
  • Original article: https://www.cnblogs.com/johnyang/p/12617102.html