  • Arithmetic and data alignment in pandas

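    pandas has another important feature: it can perform arithmetic between objects with different indexes.
    When objects are added and their indexes differ, the index of the result is the union of the two indexes; a label present in only one operand gets NaN.

    Let's start with an example.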

    Series

    In [31]: import numpy as np
    
    In [32]: from pandas import Series, DataFrame
    
    In [33]: s1 = Series([7.3, -2.5, 3.4, 1.5], index=['a', 'c', 'd', 'e'])
    
    In [34]: s2 = Series([-2.1, 3.6, -1.5, 4, 3.1], index=['a', 'c', 'e', 'f', 'g'])
    
    In [35]: s1
    Out[35]:
    a    7.3
    c   -2.5
    d    3.4
    e    1.5
    dtype: float64
    
    In [36]: s2
    Out[36]:
    a   -2.1
    c    3.6
    e   -1.5
    f    4.0
    g    3.1
    dtype: float64
    
    In [37]: s1 + s2
    Out[37]:
    a    5.2
    c    1.1
    d    NaN
    e    0.0
    f    NaN
    g    NaN
    dtype: float64
    The same alignment applies when a third Series is added:
    In [38]: s3 = Series([-2.1, 3.6, -1.5, 4, 3.1], index=['a', 'c', 'e', 'f', 'g'])
    
    In [39]: s1 + s2 + s3
    Out[39]:
    a    3.1
    c    4.7
    d    NaN
    e   -1.5
    f    NaN
    g    NaN
    dtype: float64
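    In other words, a NaN stays NaN: once a label is missing from one operand, the missing value propagates through later operations.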
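
    The alignment behaviour above can be checked in a short script. This is a minimal sketch that assumes a recent pandas imported as pd (the session above imports Series directly). The names union_labels, missing_labels and filled are illustrative only, and the fill_value argument of Series.add is shown as an optional alternative to the NaN results.

```python
import pandas as pd

s1 = pd.Series([7.3, -2.5, 3.4, 1.5], index=["a", "c", "d", "e"])
s2 = pd.Series([-2.1, 3.6, -1.5, 4, 3.1], index=["a", "c", "e", "f", "g"])

total = s1 + s2

# The result index is the union of both indexes.
union_labels = list(total.index)

# Labels that exist in only one operand come out as NaN.
missing_labels = list(total[total.isna()].index)

# Optional: treat a label missing from one side as 0 instead of NaN.
filled = s1.add(s2, fill_value=0)

print(union_labels)
print(missing_labels)
print(filled)
```

    With fill_value=0, the labels d, f and g keep the value from whichever Series contains them instead of turning into NaN.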

    DataFrame

    add   method for addition (+)
    sub   method for subtraction (-)
    div   method for division (/)
    mul   method for multiplication (*)
    In [45]: df1 = DataFrame(np.arange(9.).reshape((3,3)), columns=list('bcd'), index=['Ohio', "Texas", "Colorado"])
    
    In [46]: df2 = DataFrame(np.arange(12.).reshape((4,3)), columns=list('bde'), index=["Uhah", 'Ohio', "Texas", "Oregon"])
    
    In [47]: df1 + df2
    Out[47]:
                b   c     d   e
    Colorado  NaN NaN   NaN NaN
    Ohio      3.0 NaN   6.0 NaN
    Oregon    NaN NaN   NaN NaN
    Texas     9.0 NaN  12.0 NaN
    Uhah      NaN NaN   NaN NaN
    
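    To fill the positions that are missing from only one of the frames, call the add method with df2 and a fill_value argument. A cell that is missing from both frames is still NaN: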
    In [8]: df1.add(df2, fill_value=0)
    Out[8]:
                b    c     d     e
    Colorado  6.0  7.0   8.0   NaN
    Ohio      3.0  1.0   6.0   5.0
    Oregon    9.0  NaN  10.0  11.0
    Texas     9.0  4.0  12.0   8.0
    Uhah      0.0  NaN   1.0   2.0
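
    As a sketch of how fill_value behaves, the snippet below rebuilds df1 and df2 and compares the + operator with the add method. It assumes pandas imported as pd and NumPy as np; the names plain, filled and same_result are hypothetical and exist only for this illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    np.arange(9.).reshape((3, 3)),
    columns=list("bcd"),
    index=["Ohio", "Texas", "Colorado"],
)
df2 = pd.DataFrame(
    np.arange(12.).reshape((4, 3)),
    columns=list("bde"),
    index=["Uhah", "Ohio", "Texas", "Oregon"],
)

plain = df1 + df2                     # NaN wherever either side is missing
filled = df1.add(df2, fill_value=0)   # a one-sided gap is treated as 0

# Without fill_value, the method and the operator give the same frame.
same_result = plain.equals(df1.add(df2))

# Present only in df1, so the df1 value survives.
print(filled.loc["Colorado", "b"])

# Present in neither frame, so there is nothing to fill.
print(filled.loc["Colorado", "e"])
```

    The fill value only replaces a gap on one side of the operation; it does not invent a value for a cell that neither frame has.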

    Operations between DataFrame and Series

    NumPy array (broadcasting, for comparison)

    In [40]: arr = np.arange(12.).reshape((3, 4))
    
    In [41]: arr
    Out[41]:
    array([[  0.,   1.,   2.,   3.],
           [  4.,   5.,   6.,   7.],
           [  8.,   9.,  10.,  11.]])
    
    In [42]: arr[0]
    Out[42]: array([ 0.,  1.,  2.,  3.])
    
    In [43]: arr - arr[0]
    Out[43]:
    array([[ 0.,  0.,  0.,  0.],
           [ 4.,  4.,  4.,  4.],
           [ 8.,  8.,  8.,  8.]])

    DataFrame

    In [44]: frame = DataFrame(np.arange(12.).reshape((4,3)), columns=list('bde'), index=["Uhah", 'Ohio', "Texas", "Oregon"])
    
    In [45]: series = frame.iloc[0]  # .ix was removed in pandas 1.0; .iloc selects by position
    
    In [46]: frame - series
    Out[46]:
              b    d    e
    Uhah    0.0  0.0  0.0
    Ohio    3.0  3.0  3.0
    Texas   6.0  6.0  6.0
    Oregon  9.0  9.0  9.0

    Note: if an index label is not found in either the DataFrame's columns or the Series' index, both objects are reindexed to form the union.

    In [47]: series2 = Series(range(3), index=['b', 'e', 'f'])
    
    In [48]: frame + series2
    Out[48]:
              b   d     e   f
    Uhah    0.0 NaN   3.0 NaN
    Ohio    3.0 NaN   6.0 NaN
    Texas   6.0 NaN   9.0 NaN
    Oregon  9.0 NaN  12.0 NaN
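
    The column matching shown above is what the operator does by default. A minimal sketch, assuming pandas imported as pd and NumPy as np: the explicit spelling frame.add(series2, axis="columns") should give the same result as frame + series2. The names via_operator and via_method are illustrative.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(12.).reshape((4, 3)),
    columns=list("bde"),
    index=["Uhah", "Ohio", "Texas", "Oregon"],
)
series2 = pd.Series(range(3), index=["b", "e", "f"])

# The operator matches the Series index against the DataFrame columns.
via_operator = frame + series2

# Same operation through the method, with the axis spelled out.
via_method = frame.add(series2, axis="columns")

print(via_operator.equals(via_method))
print(list(via_operator.columns))
```

    Column d exists only in frame and column f only in series2, so both are entirely NaN in the result.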

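    To match on rows and broadcast across the columns instead, you must use one of the arithmetic methods and pass axis=0. The Series here is the d column of frame:

    In [62]: series3 = frame['d']
    
    In [63]: frame.sub(series3, axis=0)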
    Out[63]: 
              b    d    e
    Uhah   -1.0  0.0  1.0
    Ohio   -1.0  0.0  1.0
    Texas  -1.0  0.0  1.0
    Oregon -1.0  0.0  1.0
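
    A small sketch of row-wise matching, assuming pandas imported as pd, NumPy as np, and that the Series being subtracted is a column of frame (here column d, stored under the illustrative name col_d). Passing axis=0 aligns the Series with the row labels. For contrast, the last lines pass a row with axis=0: its labels b, d, e match no row label, so every cell should come out as NaN.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(12.).reshape((4, 3)),
    columns=list("bde"),
    index=["Uhah", "Ohio", "Texas", "Oregon"],
)

# A column shares the row labels of the frame.
col_d = frame["d"]

# axis=0 matches on the row index, so each row has its own d value subtracted.
by_rows = frame.sub(col_d, axis=0)
print(by_rows)

# A row is labelled b, d, e; matched against the row index, nothing lines up.
first_row = frame.iloc[0]
mismatched = frame.sub(first_row, axis=0)
all_missing = bool(mismatched.isna().all().all())
print(all_missing)
```

    Which labels the Series carries therefore decides which axis makes sense: a column pairs with axis=0, a row with the default column matching.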
  • Original article: https://www.cnblogs.com/renfanzi/p/6249652.html