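Machine Learning (1): Gradient Descent
Preface: I have recently been spending my spare moments studying Andrew Ng's machine learning course, which is how these notes came about.
Gradient descent is an optimization method that is commonly used to fit linear regression models; it is not itself a kind of regression. We start with the classic housing-price example: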
Area (ft²) | Number of rooms | Price ($1000) |
2104 | 3 | 400 |
1600 | 3 | 330 |
2400 | 3 | 369 |
1416 | 2 | 232 |
3000 | 4 | 540 |
... | ... | ... |
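I. Batch Gradient Descent
Batch gradient descent updates the parameters using the error summed over all training samples, and the iteration exits once it has converged. Convergence means that the value no longer changes between two successive iterations. In practice a small threshold is chosen, and the iteration ends when the difference between two successive iterations falls below that threshold.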
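For reference, the batch update rule can be written out as below. This is a reconstruction that assumes the linear hypothesis and the half-sum-of-squares cost used by the scripts further down; the symbols m (number of samples) and α (learning rate) follow those scripts, and the sign convention (y minus prediction, then add) matches their `diff` variable.

```latex
h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2, \qquad
J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( y^{(i)} - h_\theta(x^{(i)}) \right)^2

\theta_j := \theta_j + \alpha \sum_{i=1}^{m} \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)}
\quad \text{(all } j \text{ updated together, once per pass over the data)}
```

The stopping test described above then amounts to checking whether |J(θ) after the pass − J(θ) before the pass| is smaller than the chosen threshold.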
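II. Stochastic Gradient Descent
Batch gradient descent has to traverse every sample point each time it computes the gradient. This is because the gradient is the derivative of J(θ), and J(θ) sums the error over all samples. The problem with this approach is scalability: when the number of samples is very large, each update becomes impractically expensive. Stochastic gradient descent was proposed to address this. In each update it only tries to reduce J(θ) for the current sample point and ignores all the others. This makes the algorithm fast, but the path to convergence is more erratic, and most of the time it only gets close to the local optimum without actually reaching it. It is therefore best suited to large training sets.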
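Written out, the per-sample update looks like the following. As before, this is a sketch assuming the linear hypothesis h_θ(x) = θ0·x0 + θ1·x1 + θ2·x2 and the learning rate α from the scripts in these notes; the only change from the batch form is that the sum over samples disappears.

```latex
\text{for } i = 1, \dots, m: \qquad
\theta_j := \theta_j + \alpha \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)}
\quad \text{(all } j \text{, immediately after seeing sample } i \text{)}
```

Each update costs the same regardless of m, which is where the speed comes from; the price is that every step follows the gradient of a single sample's error rather than of the full J(θ).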
    #!/usr/bin/python
    # coding=utf-8
    '''
    Created on 2014-09-06

    @author: Ryan C. F.
    '''
    # Stochastic gradient descent

    # Training data set
    # each element in x represents (x0, x1, x2)
    x = [(1, 0., 3), (1, 1., 3), (1, 2., 3), (1, 3., 2), (1, 4., 4)]
    # y[i] is the output of y = theta0 * x[0] + theta1 * x[1] + theta2 * x[2]
    y = [95.364, 97.217205, 75.195834, 60.105519, 49.342380]

    epsilon = 0.0001
    # learning rate
    alpha = 0.01
    diff = [0, 0]
    error1 = 0
    error0 = 0
    m = len(x)

    # init the parameters to zero
    theta0 = 0
    theta1 = 0
    theta2 = 0

    while True:

        # calculate the parameters: every sample updates theta immediately
        for i in range(m):

            diff[0] = y[i] - (theta0 + theta1 * x[i][1] + theta2 * x[i][2])

            theta0 = theta0 + alpha * diff[0] * x[i][0]
            theta1 = theta1 + alpha * diff[0] * x[i][1]
            theta2 = theta2 + alpha * diff[0] * x[i][2]

        # calculate the cost function over all samples
        # (index with lp; the original indexed with the stale loop variable i)
        error1 = 0
        for lp in range(len(x)):
            error1 += (y[lp] - (theta0 + theta1 * x[lp][1] + theta2 * x[lp][2])) ** 2 / 2

        if abs(error1 - error0) < epsilon:
            break
        else:
            error0 = error1

        print(' theta0 : %f, theta1 : %f, theta2 : %f, error1 : %f' % (theta0, theta1, theta2, error1))

    print('Done: theta0 : %f, theta1 : %f, theta2 : %f' % (theta0, theta1, theta2))
    #!/usr/bin/python
    # coding=utf-8
    '''
    Created on 2014-09-06

    @author: Ryan C. F.
    '''
    # Batch gradient descent

    # Training data set
    # each element in x represents (x0, x1, x2)
    x = [(1, 0., 3), (1, 1., 3), (1, 2., 3), (1, 3., 2), (1, 4., 4)]
    # y[i] is the output of y = theta0 * x[0] + theta1 * x[1] + theta2 * x[2]
    y = [95.364, 97.217205, 75.195834, 60.105519, 49.342380]

    epsilon = 0.000001
    # learning rate
    alpha = 0.001
    diff = [0, 0]
    error1 = 0
    error0 = 0
    m = len(x)

    # init the parameters to zero
    theta0 = 0
    theta1 = 0
    theta2 = 0

    while True:

        # calculate the parameters
        # reset the accumulators at the start of every pass
        sum0 = 0
        sum1 = 0
        sum2 = 0
        for i in range(m):
            # begin batch gradient descent
            diff[0] = y[i] - (theta0 + theta1 * x[i][1] + theta2 * x[i][2])
            sum0 = sum0 + alpha * diff[0] * x[i][0]
            sum1 = sum1 + alpha * diff[0] * x[i][1]
            sum2 = sum2 + alpha * diff[0] * x[i][2]
            # end batch gradient descent
        # update theta once, after all samples have been visited
        theta0 = theta0 + sum0
        theta1 = theta1 + sum1
        theta2 = theta2 + sum2

        # calculate the cost function over all samples
        # (index with lp; the original indexed with the stale loop variable i)
        error1 = 0
        for lp in range(len(x)):
            error1 += (y[lp] - (theta0 + theta1 * x[lp][1] + theta2 * x[lp][2])) ** 2 / 2

        if abs(error1 - error0) < epsilon:
            break
        else:
            error0 = error1

        print(' theta0 : %f, theta1 : %f, theta2 : %f, error1 : %f' % (theta0, theta1, theta2, error1))

    print('Done: theta0 : %f, theta1 : %f, theta2 : %f' % (theta0, theta1, theta2))
1. Stochastic gradient descent updates all the theta parameters once for every new sample it visits.

    for i in range(m):

        diff[0] = y[i] - (theta0 + theta1 * x[i][1] + theta2 * x[i][2])

        theta0 = theta0 + alpha * diff[0] * x[i][0]
        theta1 = theta1 + alpha * diff[0] * x[i][1]
        theta2 = theta2 + alpha * diff[0] * x[i][2]

2. Batch gradient descent updates the theta parameters only once per pass, after it has gone through all the samples.

    # calculate the parameters
    sum0 = 0
    sum1 = 0
    sum2 = 0
    for i in range(m):
        # begin batch gradient descent
        diff[0] = y[i] - (theta0 + theta1 * x[i][1] + theta2 * x[i][2])
        sum0 = sum0 + alpha * diff[0] * x[i][0]
        sum1 = sum1 + alpha * diff[0] * x[i][1]
        sum2 = sum2 + alpha * diff[0] * x[i][2]
        # end batch gradient descent
    # theta stays frozen during the loop and is updated once afterwards
    theta0 = theta0 + sum0
    theta1 = theta1 + sum1
    theta2 = theta2 + sum2
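To make the contrast between the two points concrete, here is a minimal sketch that puts both update styles behind the same interface and runs them on the training data from these notes. The function names (`predict`, `cost`, `batch_pass`, `sgd_pass`, `fit`) and the `max_passes` safety cap are my own assumptions for illustration and are not part of the original scripts; the data, learning rates and thresholds are copied from them.

```python
# Training data copied from the notes: each x is (x0, x1, x2) with x0 = 1.
X = [(1, 0., 3), (1, 1., 3), (1, 2., 3), (1, 3., 2), (1, 4., 4)]
Y = [95.364, 97.217205, 75.195834, 60.105519, 49.342380]


def predict(theta, xi):
    """Linear hypothesis: theta0*x0 + theta1*x1 + theta2*x2."""
    return sum(t * v for t, v in zip(theta, xi))


def cost(theta, xs, ys):
    """Half the sum of squared errors over all samples."""
    return sum((y - predict(theta, x)) ** 2 for x, y in zip(xs, ys)) / 2


def batch_pass(theta, xs, ys, alpha):
    """One batch update: accumulate over every sample, then update theta once."""
    grad = [0.0] * len(theta)
    for x, y in zip(xs, ys):
        diff = y - predict(theta, x)  # theta is frozen during the whole pass
        for j, v in enumerate(x):
            grad[j] += diff * v
    return [t + alpha * g for t, g in zip(theta, grad)]


def sgd_pass(theta, xs, ys, alpha):
    """One stochastic pass: update theta immediately after each sample."""
    theta = list(theta)
    for x, y in zip(xs, ys):
        diff = y - predict(theta, x)  # sees the updates from earlier samples
        for j, v in enumerate(x):
            theta[j] += alpha * diff * v
    return theta


def fit(step, xs, ys, alpha, epsilon, max_passes=200000):
    """Repeat `step` until the cost changes by less than epsilon.

    max_passes is a hypothetical guard so the loop cannot run forever
    if the learning rate is too large for the cost to settle.
    """
    theta = [0.0] * len(xs[0])
    prev = cost(theta, xs, ys)
    for passes in range(1, max_passes + 1):
        theta = step(theta, xs, ys, alpha)
        cur = cost(theta, xs, ys)
        if abs(cur - prev) < epsilon:
            break
        prev = cur
    return theta, cur, passes


if __name__ == "__main__":
    runs = (("batch", batch_pass, 0.001, 1e-6), ("sgd", sgd_pass, 0.01, 1e-4))
    for name, step, alpha, epsilon in runs:
        theta, final_cost, passes = fit(step, X, Y, alpha, epsilon)
        print(name, theta, final_cost, passes)
```

The only difference between `batch_pass` and `sgd_pass` is where the assignment to theta happens: outside the sample loop in the first, inside it in the second. Everything else, including the stopping test in `fit`, is shared.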