  • Differences between stack, unstack, groupby, and pivot_table


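    stack() "stacks" a DataFrame: it pivots the column labels into the innermost level of the row index, so for a DataFrame with plain row and column labels the result is a long Series indexed by (row label, column label) pairs, with no column labels left. unstack() is the inverse: it pivots one index level back into the columns, giving a table with labels on both rows and columns. groupby() splits the rows into groups by one or more keys. pivot_table() produces the same kind of grouped aggregation as groupby(), laid out as a pivot table.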

    In [100]:
     
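    import numpy as np
    import pandas as pd

    data=pd.DataFrame(np.arange(6).reshape((2,3)),index=pd.Index(['A','B']),columns=pd.Index(['one','two','three']))

    count() function and list comprehensions
    # Count how many times the letter 'a' occurs in each element of the list
    words=['apple','pare','banana','and','peach','Anda']
    for word in words:
        print(word.lower().count('a'))  # lower() makes the count case-insensitive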
    1
    1
    3
    1
    1
    2
    [word for word in words if word.lower().count('a')>=2]
    ['banana', 'Anda']
    strings=['a','bv','tit','apple','ctr']
    [x.title() for x in strings if len(x)>2]
    ['Tit', 'Apple', 'Ctr']
    list(map(len,strings))
    [1, 2, 3, 5, 3]
    transpose(): permuting the axes of an array
    import numpy as np
    three=np.arange(18).reshape(2,3,3)
    three
    array([[[ 0,  1,  2],
            [ 3,  4,  5],
            [ 6,  7,  8]],
           [[ 9, 10, 11],
            [12, 13, 14],
            [15, 16, 17]]])
    three.transpose(2,1,0)
    array([[[ 0,  9],
            [ 3, 12],
            [ 6, 15]],
           [[ 1, 10],
            [ 4, 13],
            [ 7, 16]],
           [[ 2, 11],
            [ 5, 14],
            [ 8, 17]]])
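The output above can be checked against a simple indexing rule. The following is a minimal sketch (the name `swapped` is introduced here for illustration only): with `transpose(2, 1, 0)` the axes are reversed, so element `[i, j, k]` of the result is element `[k, j, i]` of the original array.

```python
import numpy as np

three = np.arange(18).reshape(2, 3, 3)

# transpose(2, 1, 0) reverses the axis order, so the shape (2, 3, 3)
# becomes (3, 3, 2).
swapped = three.transpose(2, 1, 0)
print(swapped.shape)  # (3, 3, 2)

# Element [i, j, k] of the result is element [k, j, i] of the original.
for i in range(3):
    for j in range(3):
        for k in range(2):
            assert swapped[i, j, k] == three[k, j, i]

# With no arguments, transpose() also reverses the axes, just like .T
assert np.array_equal(three.transpose(), swapped)
assert np.array_equal(three.T, swapped)
```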
     
     
    In [101]:
    data
     
     
    Out[101]:
       one  two  three
    A    0    1      2
    B    3    4      5
    In [106]:
     
    dd=data.stack()
    dd
     
     
    Out[106]:
    A  one      0
       two      1
       three    2
    B  one      3
       two      4
       three    5
    dtype: int64
    In [107]:
    dd.unstack()
     
     
    Out[107]:
       one  two  three
    A    0    1      2
    B    3    4      5
    In [112]:
    dd.unstack(level=0)  # pivot the outermost index level (A, B) into the columns
     
     
    Out[112]:
           A  B
    one    0  3
    two    1  4
    three  2  5
    In [113]:
     
    dd.unstack(level=-1)  # pivot the innermost index level into the columns (the default)
     
     
    Out[113]:
       one  two  three
    A    0    1      2
    B    3    4      5
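The round trip shown in the cells above can be verified programmatically. This is a small sketch that rebuilds the same `data` frame and asserts what each call returns; it assumes a frame with no missing values, where `stack()` and `unstack()` are exact inverses.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(6).reshape((2, 3)),
    index=pd.Index(['A', 'B']),
    columns=pd.Index(['one', 'two', 'three']),
)

dd = data.stack()

# stack() returns a Series whose index has two levels:
# the original row labels (outer) and the original column labels (inner).
assert isinstance(dd, pd.Series)
assert dd.index.nlevels == 2
assert dd.loc[('B', 'two')] == 4

# unstack() defaults to level=-1 (the innermost level), so it undoes stack().
assert dd.unstack().equals(data)
assert dd.unstack(level=-1).equals(data)

# Unstacking the outer level instead yields the transposed table.
assert dd.unstack(level=0).equals(data.T)
```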
    In [212]:
     
    df = pd.DataFrame({'key1':['a','a','b','b','a'],'key2':['one','two','one','two','one'],'data1':np.random.randn(5),'data2':np.random.randn(5)})
     
    In [115]:
    df
     
     
    Out[115]:
      key1 key2     data1     data2
    0    a  one -0.343458 -0.173529
    1    a  two -0.753353  0.068864
    2    b  one -0.554884 -0.147296
    3    b  two -0.064841  1.483495
    4    a  one  0.237470 -0.107894
    In [120]:
    grouped1=df.groupby('key1')
     
     
    In [121]:
     
    grouped2=df.groupby(['key1','key2'])
     
     
    In [122]:
    [x for x in grouped1]
     
     
    Out[122]:
    [('a',   key1 key2     data1     data2
      0    a  one -0.343458 -0.173529
      1    a  two -0.753353  0.068864
      4    a  one  0.237470 -0.107894), ('b',   key1 key2     data1     data2
      2    b  one -0.554884 -0.147296
      3    b  two -0.064841  1.483495)]
    In [123]:
    [x for x in grouped2]
     
     
    Out[123]:
    [(('a', 'one'),   key1 key2     data1     data2
      0    a  one -0.343458 -0.173529
      4    a  one  0.237470 -0.107894),
     (('a', 'two'),   key1 key2     data1     data2
      1    a  two -0.753353  0.068864),
     (('b', 'one'),   key1 key2     data1     data2
      2    b  one -0.554884 -0.147296),
     (('b', 'two'),   key1 key2     data1     data2
      3    b  two -0.064841  1.483495)]
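Iterating over a GroupBy object, as above, yields (key, sub-DataFrame) pairs; in practice the groups are usually aggregated rather than listed. The sketch below uses fixed numbers in place of the random data (an assumption made here so the results are reproducible), and the names `means` and `sizes` are illustrative.

```python
import pandas as pd

# Fixed values stand in for the random columns, so results are reproducible.
df = pd.DataFrame({
    'key1': ['a', 'a', 'b', 'b', 'a'],
    'key2': ['one', 'two', 'one', 'two', 'one'],
    'data1': [1.0, 2.0, 3.0, 4.0, 5.0],
    'data2': [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Each iteration yields the group key and the rows belonging to that group.
for key, group in df.groupby('key1'):
    print(key, list(group.index))

# Aggregating collapses each group into a single row.
means = df.groupby('key1')[['data1', 'data2']].mean()

# size() counts the rows in each (key1, key2) group.
sizes = df.groupby(['key1', 'key2']).size()
```

Selecting the numeric columns before calling `mean()` keeps the string column `key2` out of the aggregation.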
    In [124]:
     
    #pandas.pivot_table(data,values=None,index=None,columns=None,aggfunc='mean',fill_value=None,margins=False,
                       #dropna=True,margins_name='All')
    # The default aggfunc of pivot_table is 'mean', i.e. each cell holds the average of its group.
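To make the parameters in that signature concrete, here is a hedged sketch of `aggfunc`, `fill_value`, and `margins` on a small fixed frame. The data and the names `mean_tbl`, `sum_tbl`, and `with_totals` are made up for illustration and are not part of the original post.

```python
import pandas as pd

df = pd.DataFrame({
    'key1': ['a', 'a', 'b', 'a'],
    'key2': ['one', 'two', 'one', 'one'],
    'data1': [1.0, 2.0, 3.0, 5.0],
})

# Default aggfunc='mean': the two ('one', 'a') rows are averaged.
mean_tbl = pd.pivot_table(df, values='data1', index='key2', columns='key1')

# aggfunc='sum' adds them instead; fill_value replaces the missing
# ('two', 'b') combination, which would otherwise be NaN.
sum_tbl = pd.pivot_table(df, values='data1', index='key2', columns='key1',
                         aggfunc='sum', fill_value=0)

# margins=True appends an 'All' row and column holding the overall aggregates.
with_totals = pd.pivot_table(df, values='data1', index='key2', columns='key1',
                             aggfunc='sum', fill_value=0, margins=True)
```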
     
     
    In [126]:
     
    pd.pivot_table(df,index='key2',columns='key1')
     
     
    Out[126]:
             data1               data2
    key1         a         b         a         b
    key2
    one  -0.052994 -0.554884 -0.140712 -0.147296
    two  -0.753353 -0.064841  0.068864  1.483495
    In [127]:
     
    pd.pivot_table(df,index=['key1','key2'])
     
     
    Out[127]:
                  data1     data2
    key1 key2
    a    one  -0.052994 -0.140712
         two  -0.753353  0.068864
    b    one  -0.554884 -0.147296
         two  -0.064841  1.483495
    In [130]:
     df.pivot_table('data1',columns='key2')
     
     
    Out[130]:
    key2        one       two
    data1 -0.220291 -0.409097
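The relationship between the four operations can be summed up in one comparison: a pivot table with the default `aggfunc` gives the same result as grouping on both keys, averaging, and unstacking one key into the columns. The sketch below checks this on fixed data (an assumption replacing the random columns); `pivoted` and `manual` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'key1': ['a', 'a', 'b', 'b', 'a'],
    'key2': ['one', 'two', 'one', 'two', 'one'],
    'data1': [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Pivot table: one row per key2, one column per key1, mean of data1 per cell.
pivoted = df.pivot_table(values='data1', index='key2', columns='key1')

# Same table by hand: group on both keys, average, then move key1
# from the row index into the columns.
manual = df.groupby(['key2', 'key1'])['data1'].mean().unstack('key1')

# Both routes give the same labels and the same values.
assert list(pivoted.index) == list(manual.index)
assert list(pivoted.columns) == list(manual.columns)
assert np.allclose(pivoted.to_numpy(), manual.to_numpy())
print(pivoted)
```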
  • Original article: https://www.cnblogs.com/liyun1/p/11261872.html