  • pandas Hierarchical Indexing

    1 Introduction

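    Hierarchical indexing, the topic of this article, is something every reader should master; it is especially useful for organizing data into categories. The data in 知识追寻者's articles is usually constructed quite arbitrarily, so the examples may not feel very intuitive. Readers who build their own hierarchical data to practice with (for example, a department hierarchy, or scores grouped by subject) will get much more out of this article. The master can only lead you to the door; the practice is up to you.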

    WeChat official account: 知识追寻者

    知识追寻者(Inheriting the spirit of open source, Spreading technology knowledge;)

    2 Hierarchical Indexing

    2.1 Creating a hierarchical index

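    Group the original index labels 1 to 6 under three outer labels a, b and c. Passing two parallel lists as the index gives a Series with a two-level index (outer level a/b/c, inner level 1 to 6):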

    # -*- coding: utf-8 -*-
    
    import pandas as pd
    import numpy as np
    
    index=[['a', 'a', 'b', 'b', 'c', 'c'], [1, 2, 3, 4, 5, 6]]
    ser = pd.Series(np.random.randn(6),index)
    print(ser)
    

    Output

    a  1   -0.286724
       2   -0.619187
    b  3    0.480865
       4   -0.597817
    c  5   -0.165860
       6    2.628038
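    To see what pandas builds from the two nested lists, the minimal sketch below (not part of the original article) inspects the resulting index. It uses fixed, arbitrary values instead of np.random.randn so that the output is reproducible.

```python
import pandas as pd

# Same nested-list index as above; fixed, arbitrary values instead of random ones
index = [['a', 'a', 'b', 'b', 'c', 'c'], [1, 2, 3, 4, 5, 6]]
ser = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0, 60.0], index=index)

mi = ser.index
print(type(mi).__name__)             # MultiIndex
print(mi.nlevels)                    # 2
print(list(mi.levels[0]))            # distinct outer labels: ['a', 'b', 'c']
print(list(mi.get_level_values(0)))  # outer label of every row
print(list(mi.get_level_values(1)))  # inner label of every row
```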
    

    2.2 Selecting data from one level

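    Select the data under a single outer label, for example everything under b. The outer level is dropped, so the result is indexed by the inner labels only: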

    # -*- coding: utf-8 -*-
    
    import pandas as pd
    import numpy as np
    
    index=[['a', 'a', 'b', 'b', 'c', 'c'], [1, 2, 3, 4, 5, 6]]
    ser = pd.Series(np.random.randn(6),index)
    level_b = ser['b']
    print(level_b)
    

    Output

    3    0.208537
    4   -0.903878
    dtype: float64
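    The same selection can also be written with Series.xs, which additionally accepts a level argument for selecting on the inner level. A minimal sketch with arbitrary fixed values:

```python
import pandas as pd

index = [['a', 'a', 'b', 'b', 'c', 'c'], [1, 2, 3, 4, 5, 6]]
ser = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0, 60.0], index=index)

by_label = ser['b']            # outer label: the outer level is dropped
by_xs = ser.xs('b')            # the same cross-section through xs
by_inner = ser.xs(3, level=1)  # select inner label 3 instead; the inner level is dropped
print(by_xs)
print(by_inner)
```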
    

    2.3 Getting a single value

    Pass one label per level, separated by a comma, to get a single scalar:

    # -*- coding: utf-8 -*-
    
    import pandas as pd
    import numpy as np
    
    index=[['a', 'a', 'b', 'b', 'c', 'c'], [1, 2, 3, 4, 5, 6]]
    ser = pd.Series(np.random.randn(6),index)
    level_b1 = ser['b',3]
    print(level_b1)
    

    Output

    -2.278494077763927
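    The comma form ser['b', 3] is a tuple key. A short sketch with arbitrary fixed values showing that loc with an explicit tuple performs the same lookup, and that the same key can be used for assignment:

```python
import pandas as pd

index = [['a', 'a', 'b', 'b', 'c', 'c'], [1, 2, 3, 4, 5, 6]]
ser = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0, 60.0], index=index)

v1 = ser['b', 3]          # tuple key, as in the example above
v2 = ser.loc[('b', 3)]    # the same lookup through loc with an explicit tuple
print(v1, v2)             # 30.0 30.0

ser.loc[('b', 3)] = -1.0  # the same key also works for assignment
print(ser['b', 3])        # -1.0
```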
    

    2.4 Slicing by level

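    Like strings and lists, the index can be sliced. For example, to get the data under both b and c (a label slice includes its end label, so c is part of the result):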

    # -*- coding: utf-8 -*-
    
    import pandas as pd
    import numpy as np
    
    index=[['a', 'a', 'b', 'b', 'c', 'c'], [1, 2, 3, 4, 5, 6]]
    ser = pd.Series(np.random.randn(6),index)
    level_bc = ser['b':'c']
    print(level_bc)
    

    Output

    b  3   -0.111179
       4   -1.018673
    c  5    0.922177
       6   -1.040579
    dtype: float64
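    Label slicing on a MultiIndex relies on the index being sorted. The sketch below uses a hypothetical index whose outer labels are out of order: pandas refuses the slice with UnsortedIndexError until the index is sorted with sort_index. The values are arbitrary.

```python
import pandas as pd

# Hypothetical index with the outer labels out of order
index = [['c', 'c', 'a', 'a', 'b', 'b'], [5, 6, 1, 2, 3, 4]]
ser = pd.Series([50.0, 60.0, 10.0, 20.0, 30.0, 40.0], index=index)
print(ser.index.is_monotonic_increasing)  # False

try:
    ser['b':'c']
except pd.errors.UnsortedIndexError as exc:
    print('slice refused:', exc)

sorted_ser = ser.sort_index()   # sort by outer, then inner labels
level_bc = sorted_ser['b':'c']  # now the label slice works
print(level_bc)
```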
    

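    You can also use loc. Passing a list of outer labels (a list selection rather than a slice) keeps both index levels in the result: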

    # -*- coding: utf-8 -*-
    
    import pandas as pd
    import numpy as np
    
    index=[['a', 'a', 'b', 'b', 'c', 'c'], [1, 2, 3, 4, 5, 6]]
    ser = pd.Series(np.random.randn(6),index)
    level_ab = ser.loc[['a','b']]
    print(level_ab)
    

    Output

    a  1   -0.272074
       2   -0.708729
    b  3    1.277346
       4    1.080583
    dtype: float64
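    loc also accepts label slices, and pd.IndexSlice makes it possible to slice the inner level while keeping every outer label. A sketch with arbitrary fixed values; the index here is already sorted, which inner-level slicing requires:

```python
import pandas as pd

index = [['a', 'a', 'b', 'b', 'c', 'c'], [1, 2, 3, 4, 5, 6]]
ser = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0, 60.0], index=index)

idx = pd.IndexSlice
outer_ab = ser.loc['a':'b']       # outer labels a through b, end label included
inner_2_4 = ser.loc[idx[:, 2:4]]  # every outer label, inner labels 2 through 4
print(outer_ab)
print(inner_2_4)
```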
    

    2.5 Applying unstack to a multi-level index

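    stack and unstack were covered in an earlier article. Here unstack is applied to the two-level index: it pivots the inner level into columns, and every (outer, inner) pair that does not exist in the data becomes NaN: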

    # -*- coding: utf-8 -*-
    
    import pandas as pd
    import numpy as np
    
    index=[['a', 'a', 'b', 'b', 'c', 'c'], [1, 2, 3, 4, 5, 6]]
    ser = pd.Series(np.random.randn(6),index)
    unser = ser.unstack()
    print(unser)
    

    Output

              1         2         3         4         5         6
    a  0.452994  1.397289       NaN       NaN       NaN       NaN
    b       NaN       NaN  2.400214 -0.130237       NaN       NaN
    c       NaN       NaN       NaN       NaN  1.329461  1.041663
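    A few related unstack options, sketched with arbitrary fixed values: fill_value replaces the NaN produced for missing (outer, inner) pairs, and level=0 pivots the outer level into columns instead of the inner one.

```python
import pandas as pd

index = [['a', 'a', 'b', 'b', 'c', 'c'], [1, 2, 3, 4, 5, 6]]
ser = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0, 60.0], index=index)

wide = ser.unstack()                  # rows a/b/c, columns 1..6, NaN where no data
filled = ser.unstack(fill_value=0.0)  # same shape, missing pairs become 0.0
by_outer = ser.unstack(level=0)       # rows 1..6, columns a/b/c
print(wide.shape, by_outer.shape)     # (3, 6) (6, 3)
print(int(wide.isna().sum().sum()))   # 12 missing cells out of 18
print(filled)
```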
    

    3 Hierarchical indexes on both axes

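    The columns can have two levels just like the row index. With two levels on each axis, the same dataset can be addressed through different combinations of row and column labels, which is very powerful.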

    3.1 Example with both axes

    Row index levels: a, b and 1, 2, 3, 4; column levels: zszxz1, zszxz2 and u1, u2:

    # -*- coding: utf-8 -*-
    
    import pandas as pd
    import numpy as np
    
    index=[['a', 'a', 'b', 'b'], [1, 2, 3, 4]]
    columns = [['zszxz1','zszxz2'],['u1', 'u2']]
    frame = pd.DataFrame(np.random.randn(8).reshape((4,2)), index=index,columns=columns)
    print(frame)
    

    Output

           zszxz1    zszxz2
               u1        u2
    a 1 -1.239692 -0.395482
      2 -0.587833 -0.225688
    b 3  1.504247  0.523000
      4 -0.996312 -0.540993
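    Note that the two column lists are paired element by element, so the frame has two columns, ('zszxz1', 'u1') and ('zszxz2', 'u2'), not four. The sketch below builds the same shape with named levels and lists the column tuples; the values come from np.arange, and the level names group, id, outer and inner are hypothetical, added only for illustration.

```python
import numpy as np
import pandas as pd

# Level names are hypothetical labels added for illustration
index = pd.MultiIndex.from_arrays(
    [['a', 'a', 'b', 'b'], [1, 2, 3, 4]], names=['group', 'id'])
columns = pd.MultiIndex.from_arrays(
    [['zszxz1', 'zszxz2'], ['u1', 'u2']], names=['outer', 'inner'])
frame = pd.DataFrame(np.arange(8, dtype=float).reshape(4, 2),
                     index=index, columns=columns)

print(frame.columns.tolist())  # [('zszxz1', 'u1'), ('zszxz2', 'u2')]
print(frame.index.names)       # the row level names
print(frame)
```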
    

    3.2 Selecting by the outer row level

    Use loc with an outer row label; the outer level is dropped from the result:

    # -*- coding: utf-8 -*-
    
    import pandas as pd
    import numpy as np
    
    index=[['a', 'a', 'b', 'b'], [1, 2, 3, 4]]
    columns = [['zszxz1','zszxz2'],['u1', 'u2']]
    frame = pd.DataFrame(np.random.randn(8).reshape((4,2)), index=index,columns=columns)
    print(frame.loc['a'])
    

    Output

         zszxz1    zszxz2
             u1        u2
    1 -0.539454 -0.018574
    2 -1.180073 -1.261010
    

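    3.3 Selecting while keeping both row levels

    Putting the outer label inside a list keeps both row levels in the result: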

    # -*- coding: utf-8 -*-
    
    import pandas as pd
    import numpy as np
    
    index=[['a', 'a', 'b', 'b'], [1, 2, 3, 4]]
    columns = [['zszxz1','zszxz2'],['u1', 'u2']]
    frame = pd.DataFrame(np.random.randn(8).reshape((4,2)), index=index,columns=columns)
    print(frame.loc[['a']])
    

    Output

           zszxz1    zszxz2
               u1        u2
    a 1 -0.539454 -0.018574
      2 -1.180073 -1.261010
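    The only difference between the two previous examples is the brackets around 'a': a scalar label drops the outer row level, a list of labels keeps it. DataFrame.xs offers the same choice through its drop_level argument. A sketch with arbitrary values:

```python
import numpy as np
import pandas as pd

index = [['a', 'a', 'b', 'b'], [1, 2, 3, 4]]
columns = [['zszxz1', 'zszxz2'], ['u1', 'u2']]
frame = pd.DataFrame(np.arange(8, dtype=float).reshape(4, 2), index=index, columns=columns)

single = frame.loc['a']                    # scalar label: outer row level dropped
kept = frame.loc[['a']]                    # list of labels: both row levels kept
kept_xs = frame.xs('a', drop_level=False)  # xs keeps the level when asked to
print(single.index.nlevels, kept.index.nlevels, kept_xs.index.nlevels)  # 1 2 2
```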
    

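    3.4 Selecting by the outer column level

    Indexing the frame with an outer column label returns the sub-frame under that label, with the outer column level dropped: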

    # -*- coding: utf-8 -*-
    
    import pandas as pd
    import numpy as np
    
    index=[['a', 'a', 'b', 'b'], [1, 2, 3, 4]]
    columns = [['zszxz1','zszxz2'],['u1', 'u2']]
    frame = pd.DataFrame(np.random.randn(8).reshape((4,2)), index=index,columns=columns)
    print(frame['zszxz1'])
    

    Output

               u1
    a 1 -2.062139
      2  0.624969
    b 3  1.050788
      4  0.088685
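    An outer column label returns a DataFrame, while a full (outer, inner) tuple returns exactly one column as a Series. A sketch with arbitrary values:

```python
import numpy as np
import pandas as pd

index = [['a', 'a', 'b', 'b'], [1, 2, 3, 4]]
columns = [['zszxz1', 'zszxz2'], ['u1', 'u2']]
frame = pd.DataFrame(np.arange(8, dtype=float).reshape(4, 2), index=index, columns=columns)

sub = frame['zszxz1']          # DataFrame; column level 0 dropped
col = frame[('zszxz1', 'u1')]  # Series for exactly one column
print(type(sub).__name__, sub.columns.tolist())  # DataFrame ['u1']
print(type(col).__name__, col.tolist())          # Series [0.0, 2.0, 4.0, 6.0]
```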
    

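    3.5 Selecting by the inner column level

    Chain a second [] to go one column level deeper; the result is a Series named after the inner label: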

    # -*- coding: utf-8 -*-
    
    import pandas as pd
    import numpy as np
    
    index=[['a', 'a', 'b', 'b'], [1, 2, 3, 4]]
    columns = [['zszxz1','zszxz2'],['u1', 'u2']]
    frame = pd.DataFrame(np.random.randn(8).reshape((4,2)), index=index,columns=columns)
    print(frame['zszxz1']['u1'])
    

    Output

    a  1    0.104911
       2    0.219530
    b  3    0.816740
       4    0.793440
    Name: u1, dtype: float64
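    Chained brackets have to go through the outer column label first. To select columns by the inner label alone, one option is DataFrame.xs with axis=1 and level=1; a sketch with arbitrary values:

```python
import numpy as np
import pandas as pd

index = [['a', 'a', 'b', 'b'], [1, 2, 3, 4]]
columns = [['zszxz1', 'zszxz2'], ['u1', 'u2']]
frame = pd.DataFrame(np.arange(8, dtype=float).reshape(4, 2), index=index, columns=columns)

chained = frame['zszxz1']['u1']               # outer column label, then inner
inner_only = frame.xs('u2', axis=1, level=1)  # inner label only; level 1 dropped
print(chained.tolist())                       # [0.0, 2.0, 4.0, 6.0]
print(inner_only.columns.tolist())            # ['zszxz2']
```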
    

    3.6 Getting a single value by row and column labels

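    To get the value in the first row and first column, select the row ('a', 1) with loc and then the column ('zszxz1', 'u1') from the resulting Series: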

    # -*- coding: utf-8 -*-
    
    import pandas as pd
    import numpy as np
    
    index=[['a', 'a', 'b', 'b'], [1, 2, 3, 4]]
    columns = [['zszxz1','zszxz2'],['u1', 'u2']]
    frame = pd.DataFrame(np.random.randn(8).reshape((4,2)), index=index,columns=columns)
    print(frame.loc['a',1]['zszxz1','u1'])
    

    Output

    2.2670422041028484
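    Chaining loc and [] works, but a single loc call with one tuple per axis names the row and the column explicitly and skips the intermediate Series; iloc addresses the same cell by position. A sketch with arbitrary values:

```python
import numpy as np
import pandas as pd

index = [['a', 'a', 'b', 'b'], [1, 2, 3, 4]]
columns = [['zszxz1', 'zszxz2'], ['u1', 'u2']]
frame = pd.DataFrame(np.arange(8, dtype=float).reshape(4, 2) + 1, index=index, columns=columns)

chained = frame.loc['a', 1]['zszxz1', 'u1']       # the form used above
explicit = frame.loc[('a', 1), ('zszxz1', 'u1')]  # (row tuple, column tuple)
positional = frame.iloc[0, 0]                     # first row, first column
print(chained, explicit, positional)              # 1.0 1.0 1.0
```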
    

    3.7 Selection summary

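    • Columns can be selected level by level with square brackets []; each [] goes down one column level, e.g. frame['zszxz1']['u1'];
    • To get several entries of the same level at once, pass a list inside the brackets, e.g. frame[['zszxz1', 'zszxz2']];

    • Rows are selected with loc; several comma-separated values inside one [] address successive levels, e.g. frame.loc['a', 1];
    • When several [] are chained, each one applies to the object returned by the previous step, so whether it selects rows or columns depends on whether that object is a DataFrame or a Series.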

    4 Ways to construct a MultiIndex

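    Besides passing nested lists directly to the constructor, as in the examples above, a multi-level index can be built with the following constructors:

    pd.MultiIndex.from_product()

    pd.MultiIndex.from_tuples()

    pd.MultiIndex.from_arrays()

    pd.MultiIndex.from_frame()

    For example, building the row index with from_arrays: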

    # -*- coding: utf-8 -*-
    
    import pandas as pd
    import numpy as np
    
    index=[['a', 'a', 'b', 'b'], [1, 2, 3, 4]]
    columns = [['zszxz1','zszxz2'],['u1', 'u2']]
    frame = pd.DataFrame(np.random.randn(8).reshape((4,2)), index=pd.MultiIndex.from_arrays(index),columns=columns)
    print(frame)
    

    Output

           zszxz1    zszxz2
               u1        u2
    a 1  0.423330 -1.065528
      2 -0.231434 -0.763397
    b 3 -0.185660 -0.713429
      4 -0.134907  1.489376
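    The other constructors in the list describe the same kind of index from different inputs. A sketch comparing them (the column names outer and inner passed to from_frame are hypothetical and become level names). Note that from_product builds every combination of its inputs, so it is not a drop-in replacement for the paired arrays used above:

```python
import pandas as pd

arrays = [['a', 'a', 'b', 'b'], [1, 2, 3, 4]]
mi_arrays = pd.MultiIndex.from_arrays(arrays)
mi_tuples = pd.MultiIndex.from_tuples([('a', 1), ('a', 2), ('b', 3), ('b', 4)])
mi_frame = pd.MultiIndex.from_frame(
    pd.DataFrame({'outer': ['a', 'a', 'b', 'b'], 'inner': [1, 2, 3, 4]}))

# equals compares the labels only, so the level names from from_frame do not matter
print(mi_arrays.equals(mi_tuples), mi_arrays.equals(mi_frame))  # True True

# from_product: the Cartesian product of the inputs
mi_product = pd.MultiIndex.from_product([['a', 'b'], [1, 2]])
print(mi_product.tolist())  # every pairing of a/b with 1/2
print(len(pd.MultiIndex.from_product([['a', 'b'], [1, 2, 3, 4]])))  # 8 = 2 x 4
```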
    
  • Original article: https://www.cnblogs.com/zszxz/p/12843033.html