Requirements
Hand-write a single-layer neural network with the numpy library in Python (strictly speaking, logistic regression for a binary classification problem), using GD (Gradient Descent) as the optimizer.
Gradient Descent
See the Watermelon Book (Zhou Zhihua's Machine Learning).
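For quick reference (the book gives the full treatment): every iteration forwards the whole training set and then moves each parameter a small step against the gradient of the loss, i.e. w ← w − lr · ∂L/∂w and b ← b − lr · ∂L/∂b, where lr is the learning rate.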
Stochastic Gradient Descent
Essentially the same as Gradient Descent, except that each update randomly picks a single sample, forwards it, and then optimizes on that one sample's gradient.
Key points of the code
Get familiar with the relevant numpy function interfaces (a short demo follows the list)
np.random.randn() # draws samples from the standard Gaussian distribution
np.random.randint() # draws random integers
np.around() # rounds element-wise, the numpy counterpart of round()
np.dot() # vector/matrix (inner) product; element-wise multiplication uses *
np.sum() # sums elements; works on vectors (and along a given axis)
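A minimal demo of these calls, with shapes chosen purely for illustration:

import numpy as np

x = np.random.randn(5, 4)         # 5x4 matrix of standard-normal samples
k = np.random.randint(0, 10, 3)   # 3 random integers in [0, 10)
w = np.random.randn(4)            # weight vector of length 4
z = np.dot(x, w)                  # matrix-vector product, shape (5,)
p = 1 / (1 + np.exp(-z))          # element-wise sigmoid
y = np.around(p)                  # round probabilities to 0/1 labels
n_pos = np.sum(y)                 # how many samples were rounded to 1
elementwise = x * x               # element-wise multiplication uses *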
Training flow
1. data preparation
2. forward the data through the model
3. compute loss # strictly, when writing this by hand you don't need the loss value itself; its derivative is enough
4. optimize # apply the chosen optimization algorithm to update the parameters
5. predict # evaluate on test data
(if the data is split into mini-batches, there is one more preparation step; see the sketch right after the full code below)
import numpy as np

# generate x in [100, 4], which follows the Gaussian distribution
# y = round(sigmoid(x*w+b)), w = [0.3, 0.6, -0.7, -1.3], b = 0.2
# Given x, y, use GD(SGD); get [w,b]
# two class logistic regression

def sigmoid(x, derivative=False):
    sigm = 1 / (1 + np.exp(-x))
    if derivative:
        return sigm * (1 - sigm)
    return sigm

def generate_data(x):
    w = np.array([0.3, 0.6, -0.7, -1.3])
    b = np.array(0.2)
    y = np.around(sigmoid(np.dot(x, w) + b))
    return y

def forward(x, w, b):
    return sigmoid(np.dot(x, w) + b)

def compute_loss(y_, y):
    m = y_.shape[0]
    assert m == y.shape[0]
    return -1/m * np.sum(y * np.log(y_) + (1 - y) * np.log(1 - y_))

def GD_optimizer(w, b, x, y, y_, lr=0.01):
    # backpropagate
    m = y.shape[0]
    dw = -1/m * np.dot((y - y_), x)
    db = -1/m * np.sum(y - y_)
    # renew parameter
    w = w - lr * dw
    b = b - lr * db
    return w, b

def predict(w, b):
    x_test = np.random.randn(200, 4)
    y_test = generate_data(x_test)
    y_ = np.around(forward(x_test, w, b))
    acc = np.sum(y_test == y_) / 200
    print("acc:", acc)

if __name__ == "__main__":
    # initialization
    x = np.random.randn(1000, 4)
    w = np.random.randn(4)
    b = np.zeros((1))
    y = generate_data(x)
    iter_num = 5001
    # train
    for i in range(iter_num):
        # forward
        y_ = forward(x, w, b)
        # compute loss
        loss = compute_loss(y_, y)
        # renew parameter
        w, b = GD_optimizer(w, b, x, y, y_)
        # print info
        if i % 100 == 0:
            print("loss after iteration {}: {}".format(i, loss))
            predict(w, b)
    print("w:", w)
    print("b:", b)
    predict(w, b)
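If mini-batches are wanted, a minimal sketch of the extra preparation step could look like the helper below (batch_size and the helper's name are my own choices, not part of the notes):

import numpy as np

def iterate_minibatches(x, y, batch_size=32):
    # shuffle indices once per pass, then yield consecutive slices of x and y
    idx = np.random.permutation(x.shape[0])
    for start in range(0, x.shape[0], batch_size):
        batch = idx[start:start + batch_size]
        yield x[batch], y[batch]

# inside the training loop each batch is forwarded and optimized on its own:
# for x_b, y_b in iterate_minibatches(x, y):
#     y_b_ = forward(x_b, w, b)
#     w, b = GD_optimizer(w, b, x_b, y_b, y_b_)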
Other variants
Change to softmax multi-class classification
Change the loss function to the softmax cross-entropy loss; the derivative computation inside the optimize step also needs small changes, as sketched below.
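A rough sketch of those changes, assuming the labels are stored one-hot with shape [m, C] and W has shape [4, C] (these shapes and function names are my assumptions, not fixed by the notes):

import numpy as np

def softmax(z):
    # subtract the row-wise max for numerical stability
    z = z - np.max(z, axis=1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=1, keepdims=True)

def compute_loss_softmax(p, y_onehot):
    m = y_onehot.shape[0]
    return -1 / m * np.sum(y_onehot * np.log(p))

def GD_optimizer_softmax(W, b, x, y_onehot, p, lr=0.01):
    # the gradient of softmax cross-entropy w.r.t. the logits is simply (p - y)
    m = y_onehot.shape[0]
    dZ = (p - y_onehot) / m     # [m, C]
    dW = np.dot(x.T, dZ)        # [4, C]
    db = np.sum(dZ, axis=0)     # [C]
    return W - lr * dW, b - lr * db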
Change to an SGD (stochastic gradient descent) optimizer
At the start of every epoch, shuffle the data with np.random.shuffle(), then forward one sample at a time and optimize on it; in effect the loss is not averaged over the whole set but comes from a single sample (see the sketch below).
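A minimal sketch of one SGD epoch, reusing forward and GD_optimizer from the full code above (shuffling an index array with np.random.shuffle keeps x and y aligned):

# one SGD epoch: shuffle, then update on one sample at a time
idx = np.arange(x.shape[0])
np.random.shuffle(idx)
for j in idx:
    x_j = x[j:j+1]          # keep a 2-D shape of [1, 4]
    y_j = y[j:j+1]
    y_j_ = forward(x_j, w, b)
    w, b = GD_optimizer(w, b, x_j, y_j, y_j_, lr=0.01)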
Change to a multi-layer neural network
There are more parameters, and the derivative computations become somewhat more involved; a two-layer sketch follows.
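Purely as an illustration (hidden-layer size, variable names and the function split are my own choices), a two-layer forward/backward pair might look like this:

import numpy as np

def forward_2layer(x, W1, b1, W2, b2):
    z1 = np.dot(x, W1) + b1          # [m, hidden]
    a1 = 1 / (1 + np.exp(-z1))       # hidden-layer sigmoid activations
    z2 = np.dot(a1, W2) + b2         # [m]
    y_ = 1 / (1 + np.exp(-z2))       # output probabilities
    return a1, y_

def backward_2layer(x, y, a1, y_, W2):
    m = y.shape[0]
    dz2 = (y_ - y) / m               # gradient at the output pre-activation
    dW2 = np.dot(a1.T, dz2)          # [hidden]
    db2 = np.sum(dz2)
    da1 = np.outer(dz2, W2)          # [m, hidden]
    dz1 = da1 * a1 * (1 - a1)        # chain rule through the hidden sigmoid
    dW1 = np.dot(x.T, dz1)           # [4, hidden]
    db1 = np.sum(dz1, axis=0)        # [hidden]
    return dW1, db1, dW2, db2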
Derivative of the softmax classification loss:
1. https://blog.csdn.net/wangyangzhizhou/article/details/75088106
2. https://blog.csdn.net/qian99/article/details/78046329
Derivative of the logistic regression loss (essentially the same as the softmax case):
https://www.cnblogs.com/zhongmiaozhimen/p/6155093.html
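For reference, a compact version of the derivation behind dw and db in GD_optimizer, using the same notation as the code (where the hat denotes the sigmoid output):

$$L = -\tfrac{1}{m}\sum_i \big[\, y_i \log \hat y_i + (1-y_i)\log(1-\hat y_i) \,\big]$$
$$\frac{\partial L}{\partial z_i} = \tfrac{1}{m}(\hat y_i - y_i), \qquad z_i = x_i w + b \quad (\text{the sigmoid and log derivatives cancel})$$
$$\frac{\partial L}{\partial w} = \tfrac{1}{m} X^\top(\hat y - y) = -\tfrac{1}{m}(y - \hat y)\,X = dw, \qquad \frac{\partial L}{\partial b} = \tfrac{1}{m}\sum_i(\hat y_i - y_i) = db$$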