
NumPy Study Notes

NumPy is a fundamental library for scientific computing with Python. It includes:

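a powerful N-dimensional array object (ndarray);
tools for integrating C/C++ and Fortran code;
useful linear algebra, Fourier transform, and random number generation functions.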

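Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data types can be defined, which allows NumPy to integrate seamlessly and quickly with a wide variety of databases.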

1. Importing the library

import numpy  # or
import numpy as np

2. Basic operations

2.1. Sum: .sum()
2.2. Maximum: .max()
2.3. Minimum: .min()
2.4. Mean: .mean()

import numpy as np
test1 = np.array([[5, 10, 15],
            [20, 25, 30],
            [35, 40, 45]])
test1.sum()
# output: 225
test1.max()
# output: 45
test1.min()
# output: 5
test1.mean()
# output: 25.0

2.5. Row sums of a matrix: .sum(axis=1)

test1 = np.array([[5, 10, 15],
            [20, 25, 30],
            [35, 40, 45]])
test1.sum(axis=1)
# output: array([ 30,  75, 120])

2.6. Column sums of a matrix: .sum(axis=0)

test1 = np.array([[5, 10, 15],
            [20, 25, 30],
            [35, 40, 45]])
test1.sum(axis=0)
# output: array([60, 75, 90])
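The axis argument works the same way for the other aggregation methods from 2.1 to 2.4. A minimal sketch reusing the same 3×3 matrix: axis=0 collapses the rows (one result per column), and axis=1 collapses the columns (one result per row).

```python
import numpy as np

test1 = np.array([[5, 10, 15],
                  [20, 25, 30],
                  [35, 40, 45]])

col_max = test1.max(axis=0)    # largest value in each column
row_min = test1.min(axis=1)    # smallest value in each row
row_mean = test1.mean(axis=1)  # average of each row

print(col_max)   # [35 40 45]
print(row_min)   # [ 5 20 35]
print(row_mean)  # [10. 25. 40.]
```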

2.7. Matrix multiplication

import numpy as np
a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])
print(a * b)         # element-wise product
print(a.dot(b))      # matrix product
print(np.dot(a, b))  # matrix product, same as above
# output:
# [[ 5 12]
#  [21 32]]
# [[19 22]
#  [43 50]]
# [[19 22]
#  [43 50]]
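In Python 3.5 and later, the @ operator performs the same matrix multiplication as np.dot for 2-D arrays. A short check using the same a and b, keeping the element-wise product for contrast:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

product = a @ b        # matrix product, same as a.dot(b) for 2-D arrays
elementwise = a * b    # element-by-element product, not a matrix product

print(product)                                # [[19 22]
                                              #  [43 50]]
print(np.array_equal(product, np.dot(a, b)))  # True
```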

2.8. Squaring each element: a**2

a = np.arange(4)
print(a)
print(a**2)
# output:
# [0 1 2 3]
# [0 1 4 9]

2.9. Raising e to the power of each element: np.exp(test)
Taking the square root of each element: np.sqrt(test)

import numpy as np
test = np.arange(3)
print(test)
print(np.exp(test))   # e raised to the power of each element
print(np.sqrt(test))  # square root of each element
# output:
# [0 1 2]
# [1.         2.71828183 7.3890561 ]
# [0.         1.         1.41421356]

2.10. Rounding down: np.floor()

import numpy as np
test = np.floor(10*np.random.random((3, 4)))  # random values in [0, 10), rounded down
print(test)
# output (random, differs on every run):
# [[ 3.  8.  7.  0.]
#  [ 5.  9.  8.  2.]
#  [ 3.  0.  9.  0.]]
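The matrix above is random, so each run prints different numbers. If reproducible output is needed, one option is a seeded generator from np.random.default_rng (available since NumPy 1.17); the seed 42 below is an arbitrary choice, and this generator produces a different stream from np.random.random, so its values will not match the ones shown above. The last line shows that floor rounds toward negative infinity rather than truncating.

```python
import numpy as np

rng = np.random.default_rng(42)              # arbitrary seed: same numbers on every run
seeded = np.floor(10 * rng.random((3, 4)))   # random floats in [0, 10), rounded down
print(seeded)

# floor rounds toward negative infinity, which differs from truncation for negatives
print(np.floor(np.array([2.7, -2.7])))       # [ 2. -3.]
```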

2.11. Flattening an array: .ravel()

# convert a 2-D array with n rows and m columns into a 1-D array
test.ravel()
# output: array([ 3.,  8.,  7.,  0.,  5.,  9.,  8.,  2.,  3.,  0.,  9.,  0.])
# uses the data from above; the same applies below
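ravel returns a view of the original data whenever it can, while flatten always returns an independent copy. A minimal sketch using np.shares_memory to tell them apart; the matrix m is a hypothetical stand-in, so the test data above is left untouched.

```python
import numpy as np

m = np.arange(12.0).reshape(3, 4)   # hypothetical 3 x 4 float matrix

r = m.ravel()      # view when possible: shares the same buffer as m
f = m.flatten()    # always a new copy

print(np.shares_memory(m, r))  # True
print(np.shares_memory(m, f))  # False

r[0] = 100.0       # writing through the view changes m as well
print(m[0, 0])     # 100.0
```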

2.12. Transposing a matrix: .T

test.shape = (6, 2)
print(test)
# output:
# [[ 3.  8.]
#  [ 7.  0.]
#  [ 5.  9.]
#  [ 8.  2.]
#  [ 3.  0.]
#  [ 9.  0.]]
test.T  # transpose of test
# output: array([[ 3.,  7.,  5.,  8.,  3.,  9.],
#                [ 8.,  0.,  9.,  2.,  0.,  0.]])

2.13. Stacking matrices vertically (adding rows): np.vstack((a, b))
Stacking matrices horizontally (adding columns): np.hstack((a, b))

import numpy as np
a = np.floor(10*np.random.random((2, 2)))
b = np.floor(10*np.random.random((2, 2)))
print(a)
print('---')
print(b)
print('---')
print(np.vstack((a, b)))  # stack row-wise, i.e. vertically
print('---')
print(np.hstack((a, b)))  # stack column-wise, i.e. horizontally
# output (random):
# [[ 5.  3.]
#  [ 8.  0.]]
# ---
# [[ 3.  0.]
#  [ 6.  3.]]
# ---
# [[ 5.  3.]
#  [ 8.  0.]
#  [ 3.  0.]
#  [ 6.  3.]]
# ---
# [[ 5.  3.  3.  0.]
#  [ 8.  0.  6.  3.]]
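Both functions are special cases of np.concatenate: for 2-D arrays, vstack joins along axis 0 and hstack along axis 1. A small sketch with fixed values (copied from the sample output above instead of drawn at random) so the results are easy to check:

```python
import numpy as np

a = np.array([[5., 3.],
              [8., 0.]])
b = np.array([[3., 0.],
              [6., 3.]])

v = np.concatenate((a, b), axis=0)  # join along the rows: result is 4 x 2
h = np.concatenate((a, b), axis=1)  # join along the columns: result is 2 x 4

print(np.array_equal(v, np.vstack((a, b))))  # True
print(np.array_equal(h, np.hstack((a, b))))  # True
print(v.shape, h.shape)                      # (4, 2) (2, 4)
```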

2.14. Splitting a matrix by columns: np.hsplit(a, 3) and np.hsplit(a, (3, 4))

import numpy as np
a = np.floor(10*np.random.random((2, 12)))
print(a)
# output (random):
# [[ 6.  7.  5.  7.  9.  1.  2.  3.  1.  9.  5.  7.]
#  [ 6.  5.  2.  0.  1.  7.  8.  2.  7.  0.  5.  9.]]
print(np.hsplit(a, 3))  # split along the columns (horizontally); a is the matrix to split, 3 is the number of equal parts
print('---')
print(np.hsplit(a, (3, 4)))  # (3, 4) cuts before column index 3 (the 4th column) and before column index 4 (the 5th column)
# output:
# [array([[ 6.,  7.,  5.,  7.],
#        [ 6.,  5.,  2.,  0.]]), array([[ 9.,  1.,  2.,  3.],
#        [ 1.,  7.,  8.,  2.]]), array([[ 1.,  9.,  5.,  7.],
#        [ 7.,  0.,  5.,  9.]])]
# ---
# [array([[ 6.,  7.,  5.],
#        [ 6.,  5.,  2.]]), array([[ 7.],
#        [ 0.]]), array([[ 9.,  1.,  2.,  3.,  1.,  9.,  5.,  7.],
#        [ 1.,  7.,  8.,  2.,  7.,  0.,  5.,  9.]])]
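For a 2-D array, np.hsplit(a, ...) behaves like np.split(a, ..., axis=1). The sketch below uses a deterministic 2×12 array instead of random data, so the shapes of the pieces are predictable:

```python
import numpy as np

a = np.arange(24).reshape(2, 12)           # 2 rows, 12 columns

equal_parts = np.split(a, 3, axis=1)       # three 2 x 4 pieces
cut_parts = np.split(a, (3, 4), axis=1)    # columns [0:3], [3:4] and [4:]

print([p.shape for p in equal_parts])  # [(2, 4), (2, 4), (2, 4)]
print([p.shape for p in cut_parts])    # [(2, 3), (2, 1), (2, 8)]
# hsplit with the same cut points gives identical pieces
print(all(np.array_equal(x, y) for x, y in zip(cut_parts, np.hsplit(a, (3, 4)))))  # True
```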

2.15. Splitting a matrix by rows: np.vsplit(a, 3) and np.vsplit(a, (3, 4))

import numpy as np
a = np.floor(10*np.random.random((12, 2)))
print(a)
# output (random):
# [[ 5.  4.]
#  [ 8.  7.]
#  [ 3.  1.]
#  [ 6.  0.]
#  [ 4.  4.]
#  [ 4.  5.]
#  [ 2.  4.]
#  [ 7.  3.]
#  [ 1.  6.]
#  [ 6.  9.]
#  [ 2.  1.]
#  [ 3.  0.]]
print(np.vsplit(a, 3))  # split along the rows (vertically); a is the matrix to split, 3 is the number of equal parts
print('---')
print(np.vsplit(a, (3, 4)))  # (3, 4) cuts before row index 3 (the 4th row) and before row index 4 (the 5th row)
# output:
# [array([[ 5.,  4.],
#        [ 8.,  7.],
#        [ 3.,  1.],
#        [ 6.,  0.]]), array([[ 4.,  4.],
#        [ 4.,  5.],
#        [ 2.,  4.],
#        [ 7.,  3.]]), array([[ 1.,  6.],
#        [ 6.,  9.],
#        [ 2.,  1.],
#        [ 3.,  0.]])]
# ---
# [array([[ 5.,  4.],
#        [ 8.,  7.],
#        [ 3.,  1.]]), array([[ 6.,  0.]]), array([[ 4.,  4.],
#        [ 4.,  5.],
#        [ 2.,  4.],
#        [ 7.,  3.],
#        [ 1.,  6.],
#        [ 6.,  9.],
#        [ 2.,  1.],
#        [ 3.,  0.]])]
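Splitting into equal parts only works when the axis length is divisible by the number of parts: asking for 5 parts from the 12-row array raises a ValueError. If uneven pieces are acceptable, np.array_split can be used instead; it makes the first pieces one row longer. A sketch with a deterministic 12×2 array:

```python
import numpy as np

a = np.arange(24).reshape(12, 2)      # 12 rows, 2 columns

try:
    np.vsplit(a, 5)                   # 12 rows cannot be divided into 5 equal parts
except ValueError as err:
    print("ValueError:", err)

parts = np.array_split(a, 5, axis=0)  # uneven split along the rows is allowed
print([p.shape[0] for p in parts])    # [3, 3, 2, 2, 2]
```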

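2.16. Finding and modifying specific elements of a matrix
For example, in the code below x_data is a matrix from my own code whose missing values are marked with ?. Before processing the data, the ? entries need to be replaced, for example with 0: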

x_data[x_data == '?'] = 0
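A self-contained version of the same idea, assuming (hypothetically) that the data was read as strings with '?' marking missing values. The comparison x_data == '?' builds a Boolean mask, and assigning through the mask replaces only the matching elements. Because the array holds strings, the 0 is stored as the string '0', so a final astype(float) converts everything to numbers.

```python
import numpy as np

# hypothetical string data with '?' marking missing values
x_data = np.array([['1.5', '?', '3'],
                   ['?', '2', '4.25']])

mask = x_data == '?'          # True where a value is missing
x_data[mask] = 0              # stored as '0' because the array dtype is string
x_num = x_data.astype(float)  # now a numeric float64 array

print(mask.sum())             # 2
print(x_num)
```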

3. Creating arrays: np.array

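An array has to be created before any other operation can be applied to it. Arrays are created by passing a Python sequence to the array function; passing a nested sequence creates a multi-dimensional array (as with c):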

import numpy as np
a = np.array([1, 2, 3, 4])
b = np.array((5, 6, 7, 8))
c = np.array([[1, 2, 3, 4], [4, 5, 6, 7], [7, 8, 9, 10]])
print(a)
print('---')
print(b)
print('---')
print(c)
# output:
# [1 2 3 4]
# ---
# [5 6 7 8]
# ---
# [[ 1  2  3  4]
#  [ 4  5  6  7]
#  [ 7  8  9 10]]

4. Checking the data type: .dtype

# continuing with the data above
print(c.dtype)
# output: int64 on most platforms (int32 on Windows with NumPy versions before 2.0)

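About data types: the elements of a Python list can have different types, whereas a NumPy array (like a pandas Series) can only store elements of a single type. This makes more efficient use of memory and speeds up computation.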
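One way to see this restriction in action: when a list mixes types, np.array converts every element to one common type. A minimal sketch:

```python
import numpy as np

ints = np.array([1, 2, 3])           # all integers -> integer dtype
mixed_num = np.array([1, 2.5, 3])    # int and float -> everything becomes float64
mixed_str = np.array([1, 2.5, 'a'])  # number and string -> everything becomes a string

print(mixed_num.dtype)       # float64
print(mixed_str)             # ['1' '2.5' 'a']
print(mixed_str.dtype.kind)  # U  (Unicode string)
```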

4.1. Specifying the element type at creation

import numpy as np
a = np.array([[1, 2, 3, 4], [4, 5, 6, 7], [7, 8, 9, 10]])
b = np.array([[1, 2, 3, 4], [4, 5, 6, 7], [7, 8, 9, 10]], dtype='str')
print(a)
print('---')
print(b)
# output:
# [[ 1  2  3  4]
#  [ 4  5  6  7]
#  [ 7  8  9 10]]
# ---
# [['1' '2' '3' '4']
#  ['4' '5' '6' '7']
#  ['7' '8' '9' '10']]

4.2. Converting the data type: .astype

# continuing with the data above
b = b.astype(int)
print(b)
# output:
# [[ 1  2  3  4]
#  [ 4  5  6  7]
#  [ 7  8  9 10]]
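astype always returns a new array and leaves the original unchanged, which is why the result is assigned back to b above. Converting floats to integers truncates toward zero rather than rounding, as this short sketch shows:

```python
import numpy as np

f = np.array([1.7, -1.7, 2.5])
i = f.astype(int)      # new integer array; f itself is unchanged

print(i)               # [ 1 -1  2]
print(f)               # [ 1.7 -1.7  2.5]
print(i is f)          # False
```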

4.3. Data types of array elements

bool -- True, False
int -- int8, int16, int32, int64
float -- float16, float32, float64
string -- bytes_ (byte strings), str_ (Unicode strings)

5. Checking the size of a matrix: .shape

import numpy as np
a = np.array([1, 2, 3, 4])
b = np.array([[1, 2, 3, 4], [4, 5, 6, 7], [7, 8, 9, 10]])
print(a.shape)
print('---')
print(b.shape)
# output:
# (4,)
# ---
# (3, 4)

(4,): shape has one element, so a is a 1-D array containing 4 elements.
(3, 4): shape has two elements, so b is a 2-D array with 3 rows and 4 columns.

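5.1. Changing the shape attribute changes the length of each axis while keeping the number of elements the same. The example below changes the shape of b from (3, 4) to (4, 3). This is not a transpose: only the size of each axis changes, and the positions of the elements in memory stay the same: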

b.shape = 4, 3
print (b)
# output:
# [[ 1  2  3]
#  [ 4  4  5]
#  [ 6  7  7]
#  [ 8  9 10]]

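5.2. When the length of an axis is given as -1, it is computed automatically from the total number of elements. The code below changes the shape of b to (2, 6):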

b.shape = 2, -1
print (b)
# output:
# [[ 1  2  3  4  4  5]
#  [ 6  7  7  8  9 10]]

5.3. The reshape method creates a new array with the requested dimensions, while the shape of the original array stays unchanged:

a = np.array((1, 2, 3, 4))
b = a.reshape((2, 2))
b
# output: array([[1, 2],
#                [3, 4]])

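Here a and b share the same data storage, so modifying an element of either array also changes the other:

a[2] = 100  # change the 3rd element of a to 100; the 3rd element of b changes too
b
# output: array([[  1,   2],
#                [100,   4]])

6. Copying (1): assignment with =

Plain assignment makes no copy at all: b = a only binds a second name to the same array object.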

import numpy as np
a = np.arange(12)
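b = a  # plain assignment: b is just another name for the same array

print(a)
print(b)
print(b is a)  # is b the same object as a?
# output:
# [ 0  1  2  3  4  5  6  7  8  9 10 11]
# [ 0  1  2  3  4  5  6  7  8  9 10 11]
# True

b.shape = 3, 4
print(a.shape)  # changing b's shape also changes a
# output: (3, 4)

print(id(a))
print(id(b))
# output (the exact number differs between runs, but both lines are identical):
# 2239367199840
# 2239367199840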

7. Copying (2), shallow copy: .view()

# The view method creates a new array object that looks at the same data.

import numpy as np
a = np.arange(12)
b = a.view()  # b is a newly created array object, but it shares data with a

b is a  # is b the same object as a?
# output: False
print(b)
# output: [ 0  1  2  3  4  5  6  7  8  9 10 11]
b.shape = 2, 6  # changing b's shape does not affect a's shape
print(a.shape)
print(b)
# output:
# (12,)
# [[ 0  1  2  3  4  5]
#  [ 6  7  8  9 10 11]]
b[0, 4] = 1234  # set row 1, column 5 of b to 1234; the corresponding element of a changes too
print(b)
# output:
# [[   0    1    2    3 1234    5]
#  [   6    7    8    9   10   11]]
print(a)
# output: [   0    1    2    3 1234    5    6    7    8    9   10   11]

8. Copying (3), deep copy: .copy()

# The copy method makes a complete copy of the array and its data.

import numpy as np
a = np.arange(12)
a.shape = 3, 4
a[1, 0] = 1234

c = a.copy()
c is a
# output: False
c[0, 0] = 9999  # changing an element of c does not affect a
print(c)
print(a)
# output:
# [[9999    1    2    3]
#  [1234    5    6    7]
#  [   8    9   10   11]]
# [[   0    1    2    3]
#  [1234    5    6    7]
#  [   8    9   10   11]]
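The three kinds of copying can be told apart with np.shares_memory, which reports whether two arrays use overlapping memory, and with the base attribute, which points to the array whose data a view borrows. A short side-by-side comparison:

```python
import numpy as np

a = np.arange(12)
same = a          # assignment: the very same object
v = a.view()      # new array object, same data
c = a.copy()      # new object with its own data

print(same is a, np.shares_memory(a, same))  # True True
print(v is a, np.shares_memory(a, v))        # False True
print(c is a, np.shares_memory(a, c))        # False False
print(v.base is a)                           # True: v borrows a's data
```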

9. Checking the number of dimensions: .ndim

import numpy as np
a = np.array([[5, 10, 15],
       [20, 25, 30],
       [35, 40, 45]])
a.ndim
# output: 2

10. Checking the number of elements: .size

import numpy as np
a = np.array([[5, 10, 15],
       [20, 25, 30],
       [35, 40, 45]])
a.size
# output: 9
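Both ndim and size can be derived from shape: ndim is the number of entries in shape, and size is the product of those entries. A quick check with the same matrix:

```python
import numpy as np

a = np.array([[5, 10, 15],
              [20, 25, 30],
              [35, 40, 45]])

print(a.shape)                     # (3, 3)
print(a.ndim == len(a.shape))      # True
print(a.size == np.prod(a.shape))  # True
```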

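11. Creating a matrix of zeros: np.zeros

np.zeros((3, 4))  # create a 3 x 4 matrix of zeros
np.zeros((3, 4), dtype=str)  # the data type can be specified at creation (here: empty strings; np.str no longer exists in recent NumPy)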

12. Creating a matrix of ones: np.ones

np.ones((3, 4))  # create a 3 x 4 matrix of ones
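Without a dtype argument, np.zeros and np.ones create float64 arrays, which is why their elements print as 0. and 1.; passing dtype=int gives integers instead. A short sketch:

```python
import numpy as np

z = np.zeros((3, 4))             # default dtype is float64
o = np.ones((2, 3), dtype=int)   # explicit integer dtype

print(z.dtype)   # float64
print(o)         # [[1 1 1]
                 #  [1 1 1]]
```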

13. Creating evenly spaced values within an interval: np.arange

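np.arange(10, 30, 5)  # start at 10 and step by 5, stopping before 30
# output: array([10, 15, 20, 25])
# the dimensions can be changed via the shape attribute, as described above

np.arange(0, 2, 0.3)  # start at 0 and step by 0.3, stopping before 2
# output: array([0. , 0.3, 0.6, 0.9, 1.2, 1.5, 1.8])

np.arange(12).reshape(3, 4)  # 12 elements counting up from 0, reshaped to 3 rows and 4 columns
# output: array([[ 0,  1,  2,  3],
#                [ 4,  5,  6,  7],
#                [ 8,  9, 10, 11]])

np.random.random((2, 3))  # 2 x 3 matrix of random values in [0, 1)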

14. Taking a fixed number of values in an interval: np.linspace

from numpy import pi
np.linspace(0, 2*pi, 100)  # 100 evenly spaced values from 0 to 2*pi, both ends included
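Unlike arange, linspace includes the end value by default (endpoint=True). The NumPy documentation recommends it over arange for non-integer steps, because the number of elements is fixed up front instead of depending on floating-point rounding of the step. A small sketch:

```python
import numpy as np

x = np.linspace(0, 1, 5)                  # 5 values, both ends included
y = np.linspace(0, 1, 5, endpoint=False)  # 5 values, end value excluded

print(x)        # [0.   0.25 0.5  0.75 1.  ]
print(len(y))   # 5
print(y)
```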

15. Using == to check whether an array or matrix contains a value

import numpy as np

mytest = np.array([1,2,3,4])
print(mytest == 2)
# output: [False  True False False]
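The comparison returns one Boolean per element. To answer the yes/no question of whether the value occurs at all, the mask can be reduced with .any(); the in operator does the same check for NumPy arrays, and np.where returns the positions of the matches.

```python
import numpy as np

mytest = np.array([1, 2, 3, 4])

print((mytest == 2).any())       # True: at least one element equals 2
print(5 in mytest)               # False: no element equals 5
print(np.where(mytest == 2)[0])  # [1]: indices of every match
```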



Original article: https://www.cnblogs.com/WayneZeng/p/7878989.html