  • Python Data Analysis

    Differences and connections between loc, iloc, and ix

    loc

    .loc is primarily label based, but may also be used with a boolean array.
    In other words, loc selects data primarily by label. [1]

    • A single label, e.g. 5 or 'a', (note that 5 is interpreted as a label of the index. This use is not an integer position along the index)
    • A list or array of labels ['a', 'b', 'c']
    • A slice object with labels 'a':'f', (note that contrary to usual python slices, both the start and the stop are included!)
    • A boolean array
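
    The four kinds of input listed above can be sketched roughly as follows. The frame reuses the row and column labels of the `test` frame in this post, but it is filled with `np.arange` values instead of random numbers; that substitution is an assumption made here so the results are reproducible.

```python
import numpy as np
import pandas as pd

# Same labels as the post's `test` frame; the deterministic values are an
# assumption for illustration (the post uses np.random.randn).
df = pd.DataFrame(np.arange(20).reshape(4, 5),
                  index=['A', 'B', 'C', 'D'],
                  columns=['E', 'F', 'G', 'H', 'I'])

single = df.loc['A']            # single label -> Series holding row 'A'
several = df.loc[['A', 'C']]    # list of labels -> rows 'A' and 'C'
sliced = df.loc['A':'C']        # label slice -> rows 'A', 'B', 'C' (stop included)
mask = df['E'] > 5              # boolean Series aligned with the row index
filtered = df.loc[mask]         # boolean array -> rows where column 'E' > 5
```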

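    The general form df.loc[row_selector, column_selector] still applies. The selectors do not have to be slices, but the comma between them is essential. If the comma and the column selector are omitted, the single selector picks a row; it cannot pick a column (a column needs an explicit row selector, e.g. df.loc[:, 'E']).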
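
    A minimal sketch of the role the comma plays, assuming a frame with the same labels as `test` but deterministic values (an assumption for illustration, not the post's data):

```python
import numpy as np
import pandas as pd

# Illustrative frame: labels follow the post, values are assumed.
df = pd.DataFrame(np.arange(20).reshape(4, 5),
                  index=['A', 'B', 'C', 'D'],
                  columns=['E', 'F', 'G', 'H', 'I'])

row_a = df.loc['A']         # one selector only: treated as a row label
col_e = df.loc[:, 'E']      # ':' keeps every row, 'E' picks the column
cell = df.loc['B', 'F']     # row label plus column label -> a single value

try:
    df.loc['E']             # 'E' is a column label, so the row lookup fails
    column_lookup_failed = False
except KeyError:
    column_lookup_failed = True
```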

    import pandas as pd
    
    import numpy as np
    
    test=pd.DataFrame(np.random.randn(20).reshape(4,5),index=['A','B','C','D'],columns=['E','F','G','H','I'])
    
    test
    Out[4]: 
              E         F         G         H         I
    A -0.833316 -1.982666  1.055594  0.781759 -0.107631
    B -1.514709 -1.422883  0.204399 -0.487639 -1.652785
    C -0.424735  0.400529 -0.786582  0.855885  0.059894
    D  2.016221 -1.314878 -1.745535 -0.907778  0.834966
    
    test.loc['A']
    Out[5]: 
    E   -0.833316
    F   -1.982666
    G    1.055594
    H    0.781759
    I   -0.107631
    Name: A, dtype: float64
    
    test.loc['E']
    KeyError: 'the label [E] is not in the [index]'
    
    # As you can see, the label slice is a closed interval
    test.loc['A':'B','E':'F']
    Out[8]: 
              E         F
    A -0.833316 -1.982666
    B -1.514709 -1.422883
    

    When slicing by label, the interval is closed: the label after the : is included as well.

    iloc

    .iloc is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array.
    iloc selects primarily by position. Note that positional slices are half-open (left-closed, right-open): df.iloc[m:n] selects only rows m through n-1.

    • An integer e.g. 5
    • A list or array of integers [4, 3, 0]
    • A slice object with ints 1:7
    • A boolean array
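
    The inputs listed above might be used as in the sketch below. The frame has the same labels as `test`, with deterministic values assumed for illustration; since it has only four rows, the integer list uses [3, 0] rather than the [4, 3, 0] from the documentation excerpt.

```python
import numpy as np
import pandas as pd

# Illustrative frame: labels follow the post, values are assumed.
df = pd.DataFrame(np.arange(20).reshape(4, 5),
                  index=['A', 'B', 'C', 'D'],
                  columns=['E', 'F', 'G', 'H', 'I'])

first_row = df.iloc[0]          # integer -> Series for the first row ('A')
picked = df.iloc[[3, 0]]        # list of integers -> rows 'D' then 'A'
window = df.iloc[1:3]           # slice -> positions 1 and 2 ('B', 'C'); 3 is excluded
flags = np.array([True, False, True, False])
by_mask = df.iloc[flags]        # boolean array -> rows 'A' and 'C'
corner = df.iloc[0:2, 0:2]      # rows 0-1 and columns 0-1
```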
    
    # As you can see, the positional slice is half-open (left-closed, right-open)
    test.iloc[0:1,0:1]
    Out[10]: 
              E
    A -0.833316
    

    ix

    .ix supports mixed integer and label based access. It is primarily label based, but will fall back to integer positional access unless the corresponding axis is of integer type.
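    ix is a catch-all selector: it supports both position-based and label-based selection, with label-based selection taking priority. Note that .ix was deprecated in pandas 0.20.0 and removed in pandas 1.0.0, so the examples below only run on older versions.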
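
    Because .ix was deprecated in pandas 0.20.0 and removed in 1.0.0, mixed selections on current versions have to be written with loc or iloc. The sketch below shows one possible way to do that; the frame values and the variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative frame: labels follow the post, values are assumed.
df = pd.DataFrame(np.arange(20).reshape(4, 5),
                  index=['A', 'B', 'C', 'D'],
                  columns=['E', 'F', 'G', 'H', 'I'])

# Label-based selection, formerly written df.ix['A':'B', 'E':'F']
by_label = df.loc['A':'B', 'E':'F']

# Position-based selection, formerly written df.ix[0:1, 0:1]
by_position = df.iloc[0:1, 0:1]

# Mixed access, option 1: turn row positions into labels, then use loc
mixed_a = df.loc[df.index[0:2], ['E', 'G']]

# Mixed access, option 2: turn column labels into positions, then use iloc
mixed_b = df.iloc[0:2, df.columns.get_indexer(['E', 'G'])]
```

    Both mixed variants return the same rows and columns; which one reads better depends on whether the surrounding code thinks mostly in labels or in positions.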

    
    # Here `ix` behaves much like `loc`
    test.ix['A':'B','E':'F']
    Out[12]: 
              E         F
    A -0.833316 -1.982666
    B -1.514709 -1.422883
    
    # Here it behaves much like `iloc`
    test.ix[0:1,0:1]
    Out[11]: 
              E
    A -0.833316
    

    Note, however, that when the index or columns are integers, ix selects by label, so slices are closed intervals.
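
    On pandas 1.0 and later, where .ix is no longer available, the same closed versus half-open contrast on an integer index can be observed with loc and iloc. A small sketch, using made-up frames and a hypothetical `value` column:

```python
import pandas as pd

# Made-up data: integer labels that happen to match the positions.
df = pd.DataFrame({'value': [10, 20, 30, 40, 50]}, index=[0, 1, 2, 3, 4])

by_label = df.loc[1:3]       # labels 1, 2, 3 -> three rows (stop included)
by_position = df.iloc[1:3]   # positions 1, 2 -> two rows (stop excluded)

# Made-up data: integer labels that do NOT match the positions.
shifted = pd.DataFrame({'value': [10, 20, 30]}, index=[2, 1, 0])

label_zero = shifted.loc[0, 'value']    # label 0 is the last row
position_zero = shifted.iloc[0, 0]      # position 0 is the first row
```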

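    References

    The official documentation turns out to be the most detailed source. I hope to read more of it in the future.


    1. Official pandas documentation: Indexing and Selecting Data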

  • Original article: https://www.cnblogs.com/michael-xiang/p/10466866.html