Linear Model
- Suppose we already have the data x_data = [1.0, 2.0, 3.0] and y_data = [2.0, 4.0, 6.0].
- The linear model is
\[
\hat{y} = x \cdot \omega
\]
- The loss function (mean squared error, MSE) is
\[
cost(\omega) = \frac{1}{N}\sum_{n=1}^{N}(\hat{y}_n - y_n)^{2}
\]
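For illustration, plugging the three data points above into this cost with \(\omega = 1\) (the initial weight used in the code below) gives
\[
cost(1) = \frac{(1-2)^{2} + (2-4)^{2} + (3-6)^{2}}{3} = \frac{14}{3} \approx 4.67
\]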
Gradient Descent Algorithm
- Given an initial value of \(\omega\), the gradient is
\[
\frac{\partial cost}{\partial \omega}
\]
- Expanding the gradient formula:
\[
\frac{\partial cost}{\partial \omega} = \frac{\partial}{\partial \omega}\frac{1}{N}\sum_{n=1}^{N}(x_n \cdot \omega - y_n)^{2}
\]
\[
= \frac{1}{N}\sum_{n=1}^{N}\frac{\partial}{\partial \omega}(x_n \cdot \omega - y_n)^{2}
\]
\[
= \frac{1}{N}\sum_{n=1}^{N} 2 \cdot x_n (x_n \cdot \omega - y_n)
\]
With this, the gradient function `gradient` used in the code below can be written directly.
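As a quick check with the data above and \(\omega = 1\):
\[
\frac{\partial cost}{\partial \omega}\bigg|_{\omega = 1} = \frac{2 \cdot 1 \cdot (1-2) + 2 \cdot 2 \cdot (2-4) + 2 \cdot 3 \cdot (3-6)}{3} = -\frac{28}{3} \approx -9.33
\]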
- We want the gradient to keep shrinking, but we also do not want the weight \(\omega\) to jump too quickly, so we update \(\omega\) with
\[
\omega = \omega - \alpha \cdot gradient
\]
where \(\alpha\) is the learning rate, a manually chosen positive number. (A learning rate that is too small means \(\omega\) needs many more iterations to approach the optimum; a learning rate that is too large may overshoot the optimum and even diverge.)
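For example, a single update starting from \(\omega = 1\), with \(\alpha = 0.01\) (the value used in the code below) and the gradient computed above, is
\[
\omega = 1 - 0.01 \cdot \left(-\frac{28}{3}\right) \approx 1.093
\]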
- This experiment runs the update for 100 epochs; the full procedure is roughly as follows:
```python
import matplotlib.pyplot as plt

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

w = 1.0  # initial guess for the weight


def forward(x):
    return x * w


def cost(xs, ys):
    # mean squared error over the whole data set
    result = 0
    for x, y in zip(xs, ys):
        y_pred = forward(x)
        result += (y_pred - y) ** 2
    return result / len(xs)


def gradient(xs, ys):
    # d(cost)/dw averaged over all samples
    grad = 0
    for x, y in zip(xs, ys):
        grad += 2 * x * (x * w - y)
    return grad / len(xs)


print('Predict (before training)', 4, forward(4))

cost_list = []
epoch_list = []
for epoch in range(100):
    cost_val = cost(x_data, y_data)
    grad_val = gradient(x_data, y_data)
    w -= 0.01 * grad_val  # learning rate alpha = 0.01
    print('Epoch:', epoch, 'w=', w, 'loss=', cost_val)
    cost_list.append(cost_val)
    epoch_list.append(epoch)

plt.plot(epoch_list, cost_list)
plt.xlabel('epoch')
plt.ylabel('cost value')
plt.show()

print('predict (after training)', 4, forward(4))
```
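Because the cost here is a simple quadratic in \(\omega\), the optimum can also be computed in closed form, which gives a handy sanity check for the loop above (a minimal sketch, assuming the same x_data and y_data):

```python
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

# setting d(cost)/dw = 0 gives w* = sum(x*y) / sum(x*x)
w_star = sum(x * y for x, y in zip(x_data, y_data)) / sum(x * x for x in x_data)
print(w_star)  # 2.0
```

The gradient-descent loop should converge toward this value, so the post-training prediction for x = 4 approaches 8.0.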
Stochastic Gradient Descent (SGD)
With gradient descent, the computational cost of each parameter update is O(N), which grows linearly with the number of training samples N. When the training set is large, every iteration of gradient descent is therefore expensive. SGD reduces the per-iteration cost: in each iteration it randomly samples a single example and computes the gradient on that example only.
| | Gradient Descent | Stochastic Gradient Descent |
| --- | --- | --- |
| Weight update | \(\omega = \omega - \alpha\frac{\partial cost}{\partial \omega}\) | \(\omega = \omega - \alpha\frac{\partial loss}{\partial \omega}\) |
| Derivative of the loss | \(\frac{\partial cost}{\partial \omega} = \frac{1}{N}\sum_{n=1}^{N} 2 x_n (x_n \omega - y_n)\) | \(\frac{\partial loss_n}{\partial \omega} = 2 x_n (x_n \omega - y_n)\) |
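Note that the full script below sweeps the three samples in a fixed order within every epoch rather than drawing them at random; a minimal sketch of the "pick one example at random per update" variant described above (using Python's standard random module, with the same data and learning rate assumed) could look like this:

```python
import random

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]
w = 1.0  # initial weight

for step in range(300):
    # draw one training example uniformly at random
    i = random.randrange(len(x_data))
    x, y = x_data[i], y_data[i]
    grad = 2 * x * (x * w - y)  # d(loss_n)/dw for this single sample
    w -= 0.01 * grad            # same learning rate as the full script

print('w after random-sample SGD:', w)
```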
```python
# -*- coding: utf-8 -*-
"""
Created on Wed Aug 26 11:01:09 2020
@author: huxu
"""
import matplotlib.pyplot as plt

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

w = 1.0  # initial guess for the weight


def forward(x):
    return x * w


def loss(x, y):
    # squared error of a single sample
    y_pred = forward(x)
    return (y_pred - y) ** 2


def gradient(x, y):
    # d(loss)/dw for a single sample
    return 2 * x * (x * w - y)


print('Predict (before training)', 4, forward(4))

loss_list = []
epoch_list = []
for epoch in range(100):
    for x, y in zip(x_data, y_data):
        grad = gradient(x, y)
        w -= 0.01 * grad  # update w after every single sample
        print(' grad: ', x, y, grad)
        l = loss(x, y)
    # record the loss of the last sample seen in this epoch
    loss_list.append(l)
    epoch_list.append(epoch)

plt.plot(epoch_list, loss_list)
plt.xlabel('epoch')
plt.ylabel('loss value')
plt.show()

print('progress: ', epoch, 'w= ', w, 'loss=', l)
print('predict (after training)', 4, forward(4))
```
Reference
[1] https://www.bilibili.com/video/BV1Y7411d7Ys?p=3
[2] Dive-into-DL-PyTorch