zoukankan      html  css  js  c++  java
  • Numpy基础学习笔记2

    3.数组处理数据

    Numpy数组可以代替循环,进行矢量化的运算,通常会比纯python的方式快一两个数量级。

    3.1 将条件逻辑表述为数组运算

    np.where函数是x if condition else y的矢量化版本。

    In [15]: yarr = np.array([2.1,2.2,2.3,2.4,2.5])
    
    In [16]: cond = np.array([True,False,True,True,False])
    
    In [17]: xarr = np.array([1.1,1.2,1.3,1.4,1.5])
    
    In [18]: np.where(cond,xarr,yarr)  # 判断cond条件,真zarr,假yarr
    Out[18]: array([ 1.1,  2.2,  1.3,  1.4,  2.5])
    
    

    另一个例子,希望将一组随机数,正数替换为2,负数替换为-2

    In [19]: arr = np.random.randn(4,4)
    
    In [20]: arr
    Out[20]:
    array([[ 1.18242592,  0.34138367,  0.36648288,  0.87214939],
           [ 0.67129526,  0.2410077 ,  0.37928273, -0.43982009],
           [ 0.47559093, -0.050917  , -0.10229582,  1.58122926],
           [ 0.83486166, -1.27310522,  0.17164926,  0.77951888]])
    
    In [21]: np.where(arr > 0,2,-2)
    Out[21]:
    array([[ 2,  2,  2,  2],
           [ 2,  2,  2, -2],
           [ 2, -2, -2,  2],
           [ 2, -2,  2,  2]])
    
    In [22]: np.where(arr > 0,2,arr)  # 负数还是arr
    Out[22]:
    array([[ 2.        ,  2.        ,  2.        ,  2.        ],
          [ 2.        ,  2.        ,  2.        , -0.43982009],
          [ 2.        , -0.050917  , -0.10229582,  2.        ],
          [ 2.        , -1.27310522,  2.        ,  2.        ]])
    

    3.2 数学和统计方法

    这些方法一般可以作为实例方法调用,也可以当做Numpy函数使用。

    In [23]: arr = np.random.randn(5,4)
    
    In [24]: arr.mean()
    Out[24]: -0.024836906150552153
    
    In [25]: np.mean(arr)
    Out[25]: -0.024836906150552153
    

    基本数组统计方法如下:

    In [26]: arr
    Out[26]:
    array([[-0.03065448,  0.91344557, -0.77812406, -1.608862  ],
           [ 1.58463814,  0.98126805,  1.06389757, -1.17451329],
           [ 1.48408281,  0.02386196, -0.80217916,  0.29413806],
           [ 0.11536984,  1.73736452,  0.93596778,  0.26898712],
           [-2.05527855,  0.49837502, -2.56571303, -1.38280997]])
    
    In [27]: arr.sum()
    Out[27]: -0.49673812301104303
    
    In [28]: arr.sum(axis=0)
    Out[28]: array([ 1.09815775,  4.15431511, -2.14615091, -3.60306008])
    
    In [29]: arr.sum(axis=1)
    Out[29]: array([-1.50419497,  2.45529046,  0.99990367,  3.05768925, -5.50542653]
    )
    
    In [30]: arr.mean()
    Out[30]: -0.024836906150552153
    
    In [31]: arr.mean(axis=1)
    Out[31]: array([-0.37604874,  0.61382262,  0.24997592,  0.76442231, -1.37635663]
    )
    
    In [32]: arr.std()
    Out[32]: 1.2223549632355621
    
    In [33]: arr.var()
    Out[33]: 1.4941516561466126
    
    In [34]: arr.min()
    Out[34]: -2.565713031578829
    
    In [35]: arr.max()
    Out[35]: 1.7373645152425918
    
    In [36]: arr.argmin()
    Out[36]: 18
    
    In [37]: arr.cumsum()
    Out[37]:
    array([-0.03065448,  0.88279109,  0.10466703, -1.50419497,  0.08044316,
            1.06171121,  2.12560878,  0.95109549,  2.4351783 ,  2.45904026,
            1.6568611 ,  1.95099916,  2.066369  ,  3.80373352,  4.73970129,
            5.00868841,  2.95340986,  3.45178488,  0.88607184, -0.49673812])
    
    In [38]: arr.cumprod()
    Out[38]:
    array([ -3.06544789e-02,  -2.80011979e-02,   2.17884059e-02,
            -3.50545383e-02,  -5.55487582e-02,  -5.45082216e-02,
            -5.79911645e-02,   6.81113935e-02,   1.01082948e-01,
             2.41203713e-03,  -1.93488591e-03,  -5.69123592e-04,
            -6.56596961e-05,  -1.14074826e-04,  -1.06770361e-04,
            -2.87198518e-05,   5.90272954e-05,   2.94177294e-05,
            -7.54774516e-05,   1.04370972e-04])
    
    

    3.3 用于布尔型数组的方法

    布尔值是True和False,同时也是1和0。我们可以使用sum来统计True值得计数。

    In [39]: arr
    Out[39]:
    array([[-0.03065448,  0.91344557, -0.77812406, -1.608862  ],
           [ 1.58463814,  0.98126805,  1.06389757, -1.17451329],
           [ 1.48408281,  0.02386196, -0.80217916,  0.29413806],
           [ 0.11536984,  1.73736452,  0.93596778,  0.26898712],
           [-2.05527855,  0.49837502, -2.56571303, -1.38280997]])
    
    In [40]: (arr>0).sum()
    Out[40]: 12
    
    In [41]: arr>0
    Out[41]:
    array([[False,  True, False, False],
           [ True,  True,  True, False],
           [ True,  True, False,  True],
           [ True,  True,  True,  True],
           [False,  True, False, False]], dtype=bool)
    

    还有ang和all两个方法,可以用于布尔型数组,也可以用于非布尔型。在用于非布尔型数组时,所有非0元素都被当做True。

    In [46]: bools = arr > 0   #将arr>0这个bool型数组赋值
    
    In [47]: bools
    Out[47]:
    array([[False,  True, False, False],
           [ True,  True,  True, False],
           [ True,  True, False,  True],
           [ True,  True,  True,  True],
           [False,  True, False, False]], dtype=bool)
    
    In [48]: bools.any()
    Out[48]: True
    
    In [49]: bools.all()
    Out[49]: False
    
    In [50]: arr.any()  #非0值将当成True处理。
    Out[50]: True
    

    3.4 排序

    Numpy数组可以通过sort方法就地排序。

    In [51]: arr
    Out[51]:
    array([[-0.03065448,  0.91344557, -0.77812406, -1.608862  ],
           [ 1.58463814,  0.98126805,  1.06389757, -1.17451329],
           [ 1.48408281,  0.02386196, -0.80217916,  0.29413806],
           [ 0.11536984,  1.73736452,  0.93596778,  0.26898712],
           [-2.05527855,  0.49837502, -2.56571303, -1.38280997]])
    
    In [52]: arr.sort()
    
    In [53]: arr
    Out[53]:
    array([[-1.608862  , -0.77812406, -0.03065448,  0.91344557],
           [-1.17451329,  0.98126805,  1.06389757,  1.58463814],
           [-0.80217916,  0.02386196,  0.29413806,  1.48408281],
           [ 0.11536984,  0.26898712,  0.93596778,  1.73736452],
           [-2.56571303, -2.05527855, -1.38280997,  0.49837502]])
    
    In [54]: arr.sort(axis=0)
    
    In [55]: arr
    Out[55]:
    array([[-2.56571303, -2.05527855, -1.38280997,  0.49837502],
           [-1.608862  , -0.77812406, -0.03065448,  0.91344557],
           [-1.17451329,  0.02386196,  0.29413806,  1.48408281],
           [-0.80217916,  0.26898712,  0.93596778,  1.58463814],
           [ 0.11536984,  0.98126805,  1.06389757,  1.73736452]])
    
    In [56]: arr.sort(1)
    
    In [57]: arr
    Out[57]:
    array([[-2.56571303, -2.05527855, -1.38280997,  0.49837502],
           [-1.608862  , -0.77812406, -0.03065448,  0.91344557],
           [-1.17451329,  0.02386196,  0.29413806,  1.48408281],
           [-0.80217916,  0.26898712,  0.93596778,  1.58463814],
           [ 0.11536984,  0.98126805,  1.06389757,  1.73736452]])
    

    举个例子,求一个数组百分之5的分位数。

    In [62]: arr = np.random.randn(1000)
    
    In [63]: arr.sort()
    
    In [64]: arr[int(0.05 * len(arr))]
    Out[64]: -1.6307748333138019
    
    In [67]: arr[50]
    Out[67]: -1.6307748333138019
    

    3.5 唯一化(去重)以及数组的集合运算

    np.unique方法为数组去重,并排序。

    In [68]: names = np.array(["Bob","Joe","Will","Bob","Will","Joe","Joe"])
    
    In [69]: np.unique(names)
    Out[69]:
    array(['Bob', 'Joe', 'Will'],
          dtype='<U4')
    # 该方法类似于纯python中的如下:
    In [70]: sorted(set(names))
    Out[70]: ['Bob', 'Joe', 'Will']
    

    其他集合运算:

    In [71]: x = np.arange(1,101)
    
    In [72]: y = np.arange(51,151)
    
    In [73]: x
    Out[73]:
    array([  1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,  13,
            14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,  26,
            27,  28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,  39,
            40,  41,  42,  43,  44,  45,  46,  47,  48,  49,  50,  51,  52,
            53,  54,  55,  56,  57,  58,  59,  60,  61,  62,  63,  64,  65,
            66,  67,  68,  69,  70,  71,  72,  73,  74,  75,  76,  77,  78,
            79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,  90,  91,
            92,  93,  94,  95,  96,  97,  98,  99, 100])
    
    In [74]: y
    Out[74]:
    array([ 51,  52,  53,  54,  55,  56,  57,  58,  59,  60,  61,  62,  63,
            64,  65,  66,  67,  68,  69,  70,  71,  72,  73,  74,  75,  76,
            77,  78,  79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,
            90,  91,  92,  93,  94,  95,  96,  97,  98,  99, 100, 101, 102,
           103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115,
           116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128,
           129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141,
           142, 143, 144, 145, 146, 147, 148, 149, 150])
    
    In [75]: np.intersect1d(x,y)
    Out[75]:
    array([ 51,  52,  53,  54,  55,  56,  57,  58,  59,  60,  61,  62,  63,
            64,  65,  66,  67,  68,  69,  70,  71,  72,  73,  74,  75,  76,
            77,  78,  79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,
            90,  91,  92,  93,  94,  95,  96,  97,  98,  99, 100])
    
    In [77]: np.union1d(x,y)
    Out[77]:
    array([  1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,  13,
            14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,  26,
            27,  28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,  39,
            40,  41,  42,  43,  44,  45,  46,  47,  48,  49,  50,  51,  52,
            53,  54,  55,  56,  57,  58,  59,  60,  61,  62,  63,  64,  65,
            66,  67,  68,  69,  70,  71,  72,  73,  74,  75,  76,  77,  78,
            79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,  90,  91,
            92,  93,  94,  95,  96,  97,  98,  99, 100, 101, 102, 103, 104,
           105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117,
           118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130,
           131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143,
           144, 145, 146, 147, 148, 149, 150])
    
    In [78]: np.in1d(x,y)
    Out[78]:
    array([False, False, False, False, False, False, False, False, False,
           False, False, False, False, False, False, False, False, False,
           False, False, False, False, False, False, False, False, False,
           False, False, False, False, False, False, False, False, False,
           False, False, False, False, False, False, False, False, False,
           False, False, False, False, False,  True,  True,  True,  True,
            True,  True,  True,  True,  True,  True,  True,  True,  True,
            True,  True,  True,  True,  True,  True,  True,  True,  True,
            True,  True,  True,  True,  True,  True,  True,  True,  True,
            True,  True,  True,  True,  True,  True,  True,  True,  True,
            True,  True,  True,  True,  True,  True,  True,  True,  True,  True], dt
    ype=bool)
    
    In [79]: np.setdiff1d(x,y)
    Out[79]:
    array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
           18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
           35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50])
    
    In [80]: np.setxor1d(x,y)
    Out[80]:
    array([  1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,  13,
            14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,  26,
            27,  28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,  39,
            40,  41,  42,  43,  44,  45,  46,  47,  48,  49,  50, 101, 102,
           103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115,
           116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128,
           129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141,
           142, 143, 144, 145, 146, 147, 148, 149, 150])
    

    4.文件处理

    Numpy可以读写文本数据或二进制数据。后续有pandas来处理文本,因此本部分简单介绍。

    4.1 以二进制方式保存和读取numpy数组

    单个数组,保存时会自动添加后缀名.npy

    In [86]: arr = np.arange(10)
    
    In [88]: np.save("some_array", arr)
    
    In [90]: np.load("some_array.npy")
    Out[90]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    

    多个数组,可以使用压缩方式存储,后缀名.npz

    In [91]: arr
    Out[91]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    
    In [92]: arr2 = np.arange(20)
    
    In [93]: np.savez("array_archive.npz",a=arr,b=arr2)
    
    In [94]: arch = np.load("array_archive.npz")
    
    In [95]: arch
    Out[95]: <numpy.lib.npyio.NpzFile at 0x7084f98>
    
    In [96]: arch['b']
    Out[96]:
    array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
           17, 18, 19])
    

    4.2 存取文本文件

    使用np.savetxtnp.loadtxt两个方法来实现。后面会主要介绍pandas中的read_csv和read_table函数,这里不详细介绍。

    In [99]: arr  = np.random.randn(5,5)
    
    In [102]: np.savetxt("arr.txt",arr,delimiter=",")
    
    In [103]: np.loadtxt("arr.txt",delimiter=",")
    Out[103]:
    array([[ 0.45439906, -0.11067033,  1.67561654,  0.14142381,  0.1016269 ],
           [-1.09070259,  0.41627682, -0.81896911, -0.14980666, -1.06391152],
           [-0.88333647,  0.28268258,  0.69605952,  0.36348569, -0.53223699],
           [-0.50561387, -0.65916355,  1.40181374,  1.17810701,  1.31155551],
           [ 0.060254  , -1.02915195, -0.59382843,  0.49100178, -0.9541697 ]])
    
    In [104]: arr
    Out[104]:
    array([[ 0.45439906, -0.11067033,  1.67561654,  0.14142381,  0.1016269 ],
           [-1.09070259,  0.41627682, -0.81896911, -0.14980666, -1.06391152],
           [-0.88333647,  0.28268258,  0.69605952,  0.36348569, -0.53223699],
           [-0.50561387, -0.65916355,  1.40181374,  1.17810701,  1.31155551],
           [ 0.060254  , -1.02915195, -0.59382843,  0.49100178, -0.9541697 ]])
    

    待续。。。

  • 相关阅读:
    jq中的ajax
    浅谈ajax的优点与缺点
    jq模拟操作
    Spring注解使用和与配置文件的关系
    Spring中@Autowired注解、@Resource注解的区别
    分页技术
    动态的把固定格式的json数据以菜单形式插入
    web.xml文件中context-param、listener、filter、servlet的执行顺序
    spring MVC controller中的方法跳转到另外controller中的某个method的方法
    spring mvc后台如何处理ajax的请求,并返回json
  • 原文地址:https://www.cnblogs.com/felo/p/6357542.html
Copyright © 2011-2022 走看看