  • Python Data Analysis with pandas: Working with DataFrame, Part 2

      This section covers the basic techniques for working with the data in a Series or DataFrame.

    1. Reindexing

      An important method on pandas objects is reindex, which creates a new object whose data conforms to a new index.

    '''
    Created on 2016-8-10
    @author: xuzhengzhu
    '''
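    from pandas import Series

    # A Series whose index is deliberately out of alphabetical order
    print("--------------obj result:-----------------")
    obj = Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])
    print(obj)

    # Reorder to the new index; 'e' has no value, so it becomes NaN
    print("--------------obj2 result:-----------------")
    obj2 = obj.reindex(['a', 'b', 'c', 'd', 'e'])
    print(obj2)

    # Same reindex, but fill the new label with 0 instead of NaN
    print("--------------obj3 result:-----------------")
    obj3 = obj.reindex(['a', 'b', 'c', 'd', 'e'], fill_value=0)
    print(obj3)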

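     # reindex rearranges the values to match the new index; a label that is not in the current index gets a missing value (NaN)
     # Pass fill_value=0 to substitute 0 for those missing values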

    --------------obj result:-----------------
    d    4.5
    b    7.2
    a   -5.3
    c    3.6
    dtype: float64
    --------------obj2 result:-----------------
    a   -5.3
    b    7.2
    c    3.6
    d    4.5
    e    NaN
    dtype: float64
    --------------obj3 result:-----------------
    a   -5.3
    b    7.2
    c    3.6
    d    4.5
    e    0.0
    dtype: float64
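Two details of reindex are easy to miss in the example above: it returns a new object and leaves the original untouched, and fill_value only applies to labels that reindex introduces, not to NaN values already present in the data. The following is a minimal sketch of both points, assuming a recent pandas release; the data and variable names are illustrative and not from the original post.

```python
import numpy as np
import pandas as pd

# 'b' already holds a missing value before any reindexing happens
obj = pd.Series([4.5, np.nan, -5.3], index=['d', 'b', 'a'])

# reindex builds a new Series; obj itself keeps its original index order
filled = obj.reindex(['a', 'b', 'd', 'e'], fill_value=0)

print(list(obj.index))  # the original order is unchanged
print(filled)

# 'e' is a new label, so it receives fill_value;
# 'b' was already NaN in the data and stays NaN
new_label_value = filled['e']
existing_nan_kept = bool(np.isnan(filled['b']))
```

To replace missing values that were already in the data, a separate call such as fillna is needed.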

      2. Interpolation

      For ordered data such as time series, you may need to fill in values when reindexing. The method option does this:

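    Values of the method option
    Option              Description
    ffill or pad        Forward fill: carry the previous value forward
    bfill or backfill   Backward fill: carry the next value backward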

    '''
    Created on 2016-8-10
    @author: xuzhengzhu
    '''
    from pandas import Series

    print("--------------obj3 result:-----------------")
    obj3 = Series(['blue', 'red', 'yellow'], index=[0, 2, 4])
    print(obj3)

    # Forward fill: each new label takes the value of the nearest earlier label
    print("--------------obj4 result:-----------------")
    obj4 = obj3.reindex(range(6), method='ffill')
    print(obj4)

    Output:

    --------------obj3 result:-----------------
    0      blue
    2       red
    4    yellow
    dtype: object
    --------------obj4 result:-----------------
    0      blue
    1      blue
    2       red
    3       red
    4    yellow
    5    yellow
    dtype: object
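The table above also lists bfill, which the example does not exercise. The sketch below reuses the same Series to show backward filling, then uses the limit parameter of reindex to cap how far a value is propagated. It assumes a recent pandas release; the sparse and capped names are illustrative.

```python
import pandas as pd

obj3 = pd.Series(['blue', 'red', 'yellow'], index=[0, 2, 4])

# Backward fill: each new label takes the value of the next existing label.
# Label 5 has nothing after it, so it stays missing.
back = obj3.reindex(range(6), method='bfill')
print(back)

# limit caps how many consecutive new labels a single value may fill
sparse = pd.Series(['blue', 'yellow'], index=[0, 4])
capped = sparse.reindex(range(6), method='ffill', limit=1)
print(capped)
```

Both ffill and bfill require the existing index to be sorted (monotonic), which is why they suit ordered data such as time series.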

      For a DataFrame, reindex can alter the row index, the column index, or both. When passed only a single sequence, it reindexes the rows:

    '''
    Created on 2016-8-10
    @author: xuzhengzhu
    '''
    import numpy as np
    from pandas import DataFrame

    print("--------------frame result:-----------------")
    frame = DataFrame(np.arange(9).reshape((3, 3)), index=['a', 'c', 'd'],
                      columns=['ohio', 'texas', 'california'])
    print(frame)

    # A single sequence reindexes the rows; the new row 'b' is all NaN
    print("--------------frame2 result:-----------------")
    frame2 = frame.reindex(['a', 'b', 'c', 'd'])
    print(frame2)

    # The columns keyword reindexes the columns instead
    print("--------------frame3 result:-----------------")
    frame3 = frame.reindex(columns=['texas', 'utah', 'california'])
    print(frame3)

    # Reindex rows and columns in one call (older pandas also allowed
    # frame.ix[...] here, but ix was removed in pandas 1.0)
    print("--------------frame4 result:-----------------")
    frame4 = frame.reindex(index=['a', 'b', 'c', 'd'],
                           columns=['texas', 'utah', 'california'])
    print(frame4)

    Output:

    --------------frame result:-----------------
       ohio  texas  california
    a     0      1           2
    c     3      4           5
    d     6      7           8
    --------------frame2 result:-----------------
       ohio  texas  california
    a   0.0    1.0         2.0
    b   NaN    NaN         NaN
    c   3.0    4.0         5.0
    d   6.0    7.0         8.0
    --------------frame3 result:-----------------
       texas  utah  california
    a      1   NaN           2
    c      4   NaN           5
    d      7   NaN           8
    --------------frame4 result:-----------------
       texas  utah  california
    a    1.0   NaN         2.0
    b    NaN   NaN         NaN
    c    4.0   NaN         5.0
    d    7.0   NaN         8.0
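What distinguishes reindex from plain label-based selection is that it tolerates labels that do not exist yet. The sketch below contrasts it with loc, which in pandas 1.0 and later raises KeyError when a list contains an unknown label. The widened and loc_raised names are illustrative.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'c', 'd'],
                     columns=['ohio', 'texas', 'california'])

# reindex accepts labels that do not exist and fills them with NaN
widened = frame.reindex(index=['a', 'b'], columns=['texas', 'utah'])
print(widened)

# loc is strict: a list containing an unknown label raises KeyError
try:
    frame.loc[['a', 'b'], ['texas', 'utah']]
    loc_raised = False
except KeyError:
    loc_raised = True
print("loc raised KeyError:", loc_raised)
```

As a rule of thumb, reach for reindex when the target labels may be new, and for loc when every label is expected to exist.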

      3. Dropping entries from an axis

    '''
    Created on 2016-8-10
    @author: xuzhengzhu
    '''
    import numpy as np
    from pandas import DataFrame, Series

    print("--------------Series drop item by index:-----------------")
    obj = Series(np.arange(3, 8), index=['a', 'b', 'c', 'd', 'e'])
    print(obj)

    # drop returns a new object without the given label
    obj1 = obj.drop('c')
    print(obj1)

    print("--------------DataFrame drop item by index :-----------------")
    frame = DataFrame(np.arange(9).reshape((3, 3)), index=['a', 'c', 'd'],
                      columns=['ohio', 'texas', 'california'])
    print(frame)

    # axis=1 drops a column rather than a row
    frame1 = frame.drop(['ohio'], axis=1)
    print(frame1)

    Output:

    --------------Series drop item by index:-----------------
    a    3
    b    4
    c    5
    d    6
    e    7
    dtype: int32
    a    3
    b    4
    d    6
    e    7
    dtype: int32
    --------------DataFrame drop item by index :-----------------
       ohio  texas  california
    a     0      1           2
    c     3      4           5
    d     6      7           8
       texas  california
    a      1           2
    c      4           5
    d      7           8

    # For a DataFrame, you can drop index values from either axis
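The example above drops one column with axis=1. As a complement, the sketch below drops several rows at once and uses the columns keyword of drop, which names the axis directly. It assumes pandas 0.21 or later for that keyword; the no_rows and no_cols names are illustrative.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'c', 'd'],
                     columns=['ohio', 'texas', 'california'])

# Drop several rows at once (axis=0 is the default)
no_rows = frame.drop(['a', 'c'])

# columns= names the axis directly instead of passing axis=1
no_cols = frame.drop(columns=['ohio', 'texas'])

# drop returns a new object; the original frame is unchanged
print(no_rows)
print(no_cols)
print(frame.shape)
```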
     

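      4. Indexing, selection, and filtering

      Slicing a Series by label differs from ordinary Python slicing: the endpoint is included.

      Indexing into a DataFrame with a column name or a list of names retrieves one or more columns.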

    '''
    Created on 2016-8-10
    @author: xuzhengzhu
    '''
    import numpy as np
    from pandas import DataFrame

    print("--------------DataFrame drop item by index :-----------------")
    frame = DataFrame(np.arange(9).reshape((3, 3)), index=['a', 'c', 'd'],
                      columns=['ohio', 'texas', 'california'])
    print(frame)

    frame1 = frame.drop(['ohio'], axis=1)
    print(frame1)

    print("--------------DataFrame filter item by index :-----------------")
    # Select one column, slice rows by position, or filter rows with a boolean condition
    print(frame['ohio'])
    print(frame[:2])
    print(frame[frame['ohio'] >= 3])

    print("--------------DataFrame filter item by index :-----------------")
    # Label-based selection on a DataFrame uses loc (older pandas used ix, which
    # was removed in pandas 1.0): row labels come first, column labels second
    print(frame.loc['a', ['ohio', 'texas']])

    Output:

    --------------DataFrame drop item by index :-----------------
       ohio  texas  california
    a     0      1           2
    c     3      4           5
    d     6      7           8
       texas  california
    a      1           2
    c      4           5
    d      7           8
    --------------DataFrame filter item by index :-----------------
    a    0
    c    3
    d    6
    Name: ohio, dtype: int32
       ohio  texas  california
    a     0      1           2
    c     3      4           5
       ohio  texas  california
    c     3      4           5
    d     6      7           8
    --------------DataFrame filter item by index :-----------------
    ohio     0
    texas    1
    Name: a, dtype: int32
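The inclusive endpoint of label slices, mentioned at the start of this section, is not exercised by the example above. The sketch below contrasts a label slice with a positional slice on a small Series and then applies the same rule to DataFrame rows. The data and variable names are illustrative.

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(4), index=['a', 'b', 'c', 'd'])

# Label slice: both endpoints 'b' and 'c' are included
by_label = obj.loc['b':'c']

# Positional slice: the end position 2 is excluded, as in plain Python
by_position = obj.iloc[1:2]

print(by_label)
print(by_position)

# The same rule holds for DataFrame rows selected by label with loc
frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'c', 'd'],
                     columns=['ohio', 'texas', 'california'])
rows = frame.loc['a':'c']
print(rows)
```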

     

      5. Arithmetic and data alignment

    '''
    Created on 2016-8-10
    @author: xuzhengzhu
    '''
    from pandas import Series

    # The two indexes only partly overlap
    print("--------------Series addition:-----------------")
    s1 = Series([7.3, -2.5, 3.4, 1.5], index=['a', 'c', 'd', 'e'])
    s2 = Series([-2.1, 3.6, -1.5, 4, 3.1], index=['a', 'c', 'e', 'f', 'g'])
    # Labels found in only one Series produce NaN in the sum
    print(s1 + s2)

    Output:

    --------------Series addition:-----------------
    a    5.2
    c    1.1
    d    NaN
    e    0.0
    f    NaN
    g    NaN
    dtype: float64
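The NaN entries in the Series sum above come from labels that appear in only one operand. A minimal sketch of avoiding them with the add method and its fill_value argument, using the same two Series; the plain and filled names are illustrative.

```python
import pandas as pd

s1 = pd.Series([7.3, -2.5, 3.4, 1.5], index=['a', 'c', 'd', 'e'])
s2 = pd.Series([-2.1, 3.6, -1.5, 4, 3.1], index=['a', 'c', 'e', 'f', 'g'])

# Plain + yields NaN wherever a label exists in only one Series
plain = s1 + s2

# add with fill_value treats the absent side as 0 before adding
filled = s1.add(s2, fill_value=0)
print(filled)
```

With fill_value=0, the labels d, f and g keep the value from whichever Series contains them.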
    '''
    Created on 2016-8-10
    @author: xuzhengzhu
    '''
    import numpy as np
    from pandas import DataFrame

    print("--------------DataFrame addition:-----------------")
    df1 = DataFrame(np.arange(9).reshape((3, 3)), columns=list('bcd'),
                    index=['ohio', 'texas', 'colorado'])
    df2 = DataFrame(np.arange(12).reshape((4, 3)), columns=list('bde'),
                    index=['utah', 'ohio', 'texas', 'oregon'])

    print(df1)
    print("--------------------")

    print(df2)

    # Only cells whose row and column labels exist in both frames get a value
    print("-------df1+df2-------------")
    print(df1 + df2)

    # When a label on one object is not found on the other, fill_value
    # supplies a stand-in value for the missing side
    print("-------df3-------------")
    df3 = df1.add(df2, fill_value=0)
    print(df3)

    Output:

    --------------DataFrame addition:-----------------
              b  c  d
    ohio      0  1  2
    texas     3  4  5
    colorado  6  7  8
    --------------------
            b   d   e
    utah    0   1   2
    ohio    3   4   5
    texas   6   7   8
    oregon  9  10  11
    -------df1+df2-------------
                b   c     d   e
    colorado  NaN NaN   NaN NaN
    ohio      3.0 NaN   6.0 NaN
    oregon    NaN NaN   NaN NaN
    texas     9.0 NaN  12.0 NaN
    utah      NaN NaN   NaN NaN
    -------df3-------------
                b    c     d     e
    colorado  6.0  7.0   8.0   NaN
    ohio      3.0  1.0   6.0   5.0
    oregon    9.0  NaN  10.0  11.0
    texas     9.0  4.0  12.0   8.0
    utah      0.0  NaN   1.0   2.0
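The output above still contains NaN after add with fill_value=0. Those cells are missing from both frames, so there is no value on either side to add the fill to. The sketch below counts them and clears them with a follow-up fillna. Whether 0 is a sensible replacement depends on the data, so treat this as an illustration. The summed, still_missing and complete names are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(9).reshape((3, 3)), columns=list('bcd'),
                   index=['ohio', 'texas', 'colorado'])
df2 = pd.DataFrame(np.arange(12).reshape((4, 3)), columns=list('bde'),
                   index=['utah', 'ohio', 'texas', 'oregon'])

summed = df1.add(df2, fill_value=0)

# Cells such as ('colorado', 'e') exist in neither frame,
# so fill_value has nothing to combine and the result stays NaN
still_missing = int(summed.isna().sum().sum())

# A follow-up fillna replaces whatever NaN values remain
complete = summed.fillna(0)
print(still_missing)
print(complete)
```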
  • Original article: https://www.cnblogs.com/HondaHsu/p/5760183.html