感知机(转)

zoukankan html css js c++ java

感知机(转)
原文：http://www.hankcs.com/ml/the-perceptron.html

《统计学习方法》系列笔记的第一篇，对应原著第二章。大量引用原著讲解，加入了自己的理解。对书中算法采用Python实现，并用Matplotlib可视化了动画出来，应该算是很硬派了。一套干货下来，很是辛苦，要是能坚持下去就好。

概念

感知机是二分类模型，输入实例的特征向量，输出实例的±类别。

感知机模型

定义

假设输入空间是，输出空间是，x和y分属这两个空间，那么由输入空间到输出空间的如下函数：

称为感知机。其中，w和b称为感知机模型参数，叫做权值或权值向量，叫做偏置，w·x表示向量w和x的内积。sign是一个函数：

感知机的几何解释是，线性方程

将特征空间划分为正负两个部分：

这个平面（2维时退化为直线）称为分离超平面。

感知机学习策略

数据集的线性可分性

定义

给定数据集

其中如果存在某个超平面S

能够完全正确地将正负实例点全部分割开来，则称T线性可分，否则称T线性不可分。

感知机学习策略

假定数据集线性可分，我们希望找到一个合理的损失函数。

一个朴素的想法是采用误分类点的总数，但是这样的损失函数不是参数w，b的连续可导函数，不可导自然不能把握函数的变化，也就不易优化（不知道什么时候该终止训练，或终止的时机不是最优的）。

另一个想法是选择所有误分类点到超平面S的总距离。为此，先定义点x₀到平面S的距离：

分母是w的L₂范数，所谓L₂范数，指的是向量各元素的平方和然后求平方根（长度）。这个式子很好理解，回忆中学学过的点到平面的距离：

此处的点到超平面S的距离的几何意义就是上述距离在多维空间的推广。

又因为，如果点i被误分类，一定有

成立，所以我们去掉了绝对值符号，得到误分类点到超平面S的距离公式：

假设所有误分类点构成集合M，那么所有误分类点到超平面S的总距离为

分母作用不大，反正一定是正的，不考虑分母，就得到了感知机学习的损失函数：

感知机学习算法

原始形式

感知机学习算法是对以下最优化问题的算法：

感知机学习算法是误分类驱动的，先随机选取一个超平面，然后用梯度下降法不断极小化上述损失函数。损失函数的梯度由：

给出。所谓梯度，是一个向量，指向的是标量场增长最快的方向，长度是最大变化率。所谓标量场，指的是空间中任意一个点的属性都可以用一个标量表示的场（个人理解该标量为函数的输出）。

随机选一个误分类点i，对参数w，b进行更新：

上式是学习率。损失函数的参数加上梯度上升的反方向，于是就梯度下降了。所以，上述迭代可以使损失函数不断减小，直到为0。于是得到了原始形式的感知机学习算法：

对于此算法，使用下面的例子作为测试数据：

给出Python实现和可视化代码如下：

感知机算法代码

终于到了最激动人心的时刻了，有了上述知识，就可以完美地可视化这个简单的算法：
1. # -*- coding:utf-8 -*-
2. # Filename: train2.1.py
3. # Author：hankcs
4. # Date: 2015/1/30 16:29
5. import copy
6. from matplotlib import pyplot as plt
7. from matplotlib import animation
9. training_set = [[(3, 3), 1], [(4, 3), 1], [(1, 1), -1]]
10. w = [0, 0]
11. b = 0
12. history = []
15. def update(item):
16. """
17. update parameters using stochastic gradient descent
18. :param item: an item which is classified into wrong class
19. :return: nothing
20. """
21. global w, b, history
22. w[0] += 1 * item[1] * item[0][0]
23. w[1] += 1 * item[1] * item[0][1]
24. b += 1 * item[1]
25. print w, b
26. history.append([copy.copy(w), b])
27. # you can uncomment this line to check the process of stochastic gradient descent
30. def cal(item):
31. """
32. calculate the functional distance between 'item' an the dicision surface. output yi(w*xi+b).
33. :param item:
34. :return:
35. """
36. res = 0
37. for i in range(len(item[0])):
38. res += item[0][i] * w[i]
39. res += b
40. res *= item[1]
41. return res
44. def check():
45. """
46. check if the hyperplane can classify the examples correctly
47. :return: true if it can
48. """
49. flag = False
50. for item in training_set:
51. if cal(item) <= 0:
52. flag = True
53. update(item)
54. # draw a graph to show the process
55. if not flag:
56. print "RESULT: w: " + str(w) + " b: " + str(b)
57. return flag
60. if __name__ == "__main__":
61. for i in range(1000):
62. if not check(): break
64. # first set up the figure, the axis, and the plot element we want to animate
65. fig = plt.figure()
66. ax = plt.axes(xlim=(0, 2), ylim=(-2, 2))
67. line, = ax.plot([], [], 'g', lw=2)
68. label = ax.text([], [], '')
70. # initialization function: plot the background of each frame
71. def init():
72. line.set_data([], [])
73. x, y, x_, y_ = [], [], [], []
74. for p in training_set:
75. if p[1] > 0:
76. x.append(p[0][0])
77. y.append(p[0][1])
78. else:
79. x_.append(p[0][0])
80. y_.append(p[0][1])
82. plt.plot(x, y, 'bo', x_, y_, 'rx')
83. plt.axis([-6, 6, -6, 6])
84. plt.grid(True)
85. plt.xlabel('x')
86. plt.ylabel('y')
87. plt.title('Perceptron Algorithm (www.hankcs.com)')
88. return line, label
91. # animation function. this is called sequentially
92. def animate(i):
93. global history, ax, line, label
95. w = history[i][0]
96. b = history[i][1]
97. if w[1] == 0: return line, label
98. x1 = -7
99. y1 = -(b + w[0] * x1) / w[1]
100. x2 = 7
101. y2 = -(b + w[0] * x2) / w[1]
102. line.set_data([x1, x2], [y1, y2])
103. x1 = 0
104. y1 = -(b + w[0] * x1) / w[1]
105. label.set_text(history[i])
106. label.set_position([x1, y1])
107. return line, label
109. # call the animator. blit=true means only re-draw the parts that have changed.
110. print history
111. anim = animation.FuncAnimation(fig, animate, init_func=init, frames=len(history), interval=1000, repeat=True,
112. blit=True)
113. plt.show()
114. anim.save('perceptron.gif', fps=2, writer='imagemagick')
可视化

可见超平面被误分类点所吸引，朝着它移动，使得两者距离逐步减小，直到正确分类为止。通过这个动画，是不是对感知机的梯度下降算法有了更直观的感悟呢？

算法的收敛性

记输入向量加进常数1的拓充形式，其最大长度为，记感知机的参数向量，设满足条件的超平面可以将数据集完全正确地分类，定义最小值伽马：

则误分类次数k满足：

证明请参考《统计学习方法》P31。

感知机学习算法的对偶形式

对偶指的是，将w和b表示为测试数据i的线性组合形式，通过求解系数得到w和b。具体说来，如果对误分类点i逐步修改wb修改了n次，则w，b关于i的增量分别为，这里，则最终求解到的参数分别表示为：

于是有算法2.2：

感知机对偶算法代码

涉及到比较多的矩阵计算，于是用NumPy比较多：
1. # -*- coding:utf-8 -*-
2. # Filename: train2.2.py
3. # Author：hankcs
4. # Date: 2015/1/31 15:15
5. import numpy as np
6. from matplotlib import pyplot as plt
7. from matplotlib import animation
9. # An example in that book, the training set and parameters' sizes are fixed
10. training_set = np.array([[[3, 3], 1], [[4, 3], 1], [[1, 1], -1]])
12. a = np.zeros(len(training_set), np.float)
13. b = 0.0
14. Gram = None
15. y = np.array(training_set[:, 1])
16. x = np.empty((len(training_set), 2), np.float)
17. for i in range(len(training_set)):
18. x[i] = training_set[i][0]
19. history = []
21. def cal_gram():
22. """
23. calculate the Gram matrix
24. :return:
25. """
26. g = np.empty((len(training_set), len(training_set)), np.int)
27. for i in range(len(training_set)):
28. for j in range(len(training_set)):
29. g[i][j] = np.dot(training_set[i][0], training_set[j][0])
30. return g
33. def update(i):
34. """
35. update parameters using stochastic gradient descent
36. :param i:
37. :return:
38. """
39. global a, b
40. a[i] += 1
41. b = b + y[i]
42. history.append([np.dot(a * y, x), b])
43. # print a, b # you can uncomment this line to check the process of stochastic gradient descent
46. # calculate the judge condition
47. def cal(i):
48. global a, b, x, y
50. res = np.dot(a * y, Gram[i])
51. res = (res + b) * y[i]
52. return res
55. # check if the hyperplane can classify the examples correctly
56. def check():
57. global a, b, x, y
58. flag = False
59. for i in range(len(training_set)):
60. if cal(i) <= 0:
61. flag = True
62. update(i)
63. if not flag:
65. w = np.dot(a * y, x)
66. print "RESULT: w: " + str(w) + " b: " + str(b)
67. return False
68. return True
71. if __name__ == "__main__":
72. Gram = cal_gram() # initialize the Gram matrix
73. for i in range(1000):
74. if not check(): break
76. # draw an animation to show how it works, the data comes from history
77. # first set up the figure, the axis, and the plot element we want to animate
78. fig = plt.figure()
79. ax = plt.axes(xlim=(0, 2), ylim=(-2, 2))
80. line, = ax.plot([], [], 'g', lw=2)
81. label = ax.text([], [], '')
83. # initialization function: plot the background of each frame
84. def init():
85. line.set_data([], [])
86. x, y, x_, y_ = [], [], [], []
87. for p in training_set:
88. if p[1] > 0:
89. x.append(p[0][0])
90. y.append(p[0][1])
91. else:
92. x_.append(p[0][0])
93. y_.append(p[0][1])
95. plt.plot(x, y, 'bo', x_, y_, 'rx')
96. plt.axis([-6, 6, -6, 6])
97. plt.grid(True)
98. plt.xlabel('x')
99. plt.ylabel('y')
100. plt.title('Perceptron Algorithm 2 (www.hankcs.com)')
101. return line, label
104. # animation function. this is called sequentially
105. def animate(i):
106. global history, ax, line, label
108. w = history[i][0]
109. b = history[i][1]
110. if w[1] == 0: return line, label
111. x1 = -7.0
112. y1 = -(b + w[0] * x1) / w[1]
113. x2 = 7.0
114. y2 = -(b + w[0] * x2) / w[1]
115. line.set_data([x1, x2], [y1, y2])
116. x1 = 0.0
117. y1 = -(b + w[0] * x1) / w[1]
118. label.set_text(str(history[i][0]) + ' ' + str(b))
119. label.set_position([x1, y1])
120. return line, label
122. # call the animator. blit=true means only re-draw the parts that have changed.
123. anim = animation.FuncAnimation(fig, animate, init_func=init, frames=len(history), interval=1000, repeat=True,
124. blit=True)
125. plt.show()
126. # anim.save('perceptron2.gif', fps=2, writer='imagemagick')
可视化

与算法1的结果相同，我们也可以将数据集改一下：
1. training_set = np.array([[[3, 3], 1], [[4, 3], 1], [[1, 1], -1], [[5, 2], -1]])
会得到一个复杂一些的结果：

读后感

通过最简单的模型，学习到ML中的常用概念和常见流程。

另外本文只是个人笔记，服务于个人备忘用，对质量和后续不做保证。还是那句话，博客只做补充，要入门，还是得看经典著作。

Reference

文中部分代码参考了OldPanda的实现。
查看全文

相关阅读:
ASP.NET Core – Web API Versioning
ASP.NET Core – Handle Error on Web API
ASP.NET Core – MVC vs Razor Page
EF Core – Library use EF
ASP.NET Core – Filter
GeoServer地图开发解决方案
 参透人生的一道计算题
 iscroll4 精讲
 各种分割线Html代码
 iBatis入门教程

原文地址：https://www.cnblogs.com/wyuzl/p/7640011.html

概念

定义

感知机学习策略

定义

感知机学习算法

原始形式

感知机算法代码

可视化

算法的收敛性

感知机学习算法的对偶形式

感知机对偶算法代码

可视化

读后感

Reference