Introduction to Deep Learning
Deep learning is a machine learning technique that employs deep neural networks; its breakthroughs came from the accumulation of many small technical improvements. Deeper networks used to deliver worse performance simply because they could not be trained effectively. When training a deep neural network, the back-propagation algorithm faces three main problems: the vanishing gradient, overfitting, and computational load.
(1) Vanishing gradient
When training with back-propagation, the vanishing gradient arises when the output error fails to reach the nodes farther from the output layer, so their weights are barely updated. The typical remedy is to use the Rectified Linear Unit (ReLU) as the activation function.
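As a quick numerical aside (a Python sketch of mine, not part of the chapter's MATLAB code): back-propagation multiplies the activation derivative of every layer along the path, and sigmoid's derivative never exceeds 0.25, so the error signal shrinks geometrically with depth, while an active ReLU unit has derivative 1 and passes it through unchanged.

```python
# Best-case attenuation of the back-propagated error through 10 layers.
# Sigmoid's derivative is at most 0.25; an active ReLU unit's derivative is 1.
depth = 10
sigmoid_factor = 0.25 ** depth   # upper bound for a 10-layer sigmoid stack
relu_factor = 1.0 ** depth       # 10 active ReLU units in a row

print(sigmoid_factor)            # on the order of 1e-6
print(relu_factor)               # 1.0
```

Even in this best case, ten sigmoid layers attenuate the signal by about a factor of a million, which is why the deeper weights barely move.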
(2) Overfitting
A deep neural network is especially prone to overfitting because its additional hidden layers and weights make the model more complex. The most representative remedy is dropout: at each training step, only a randomly selected subset of nodes is trained rather than the whole network. Suitable dropout ratios are about 50% for the hidden layers and 25% for the input layer. Another remedy is to add a regularization term to the cost function.
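The regularization remedy can be sketched as follows (my own Python illustration, with made-up values; the constant `lam` is the regularization strength): adding an L2 term lam/2 * sum(W**2) to the cost contributes a gradient of lam*W, so every update also shrinks the weights toward zero, discouraging an overly complex model.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 3))   # a toy weight matrix
alpha, lam = 0.1, 0.5             # learning rate and regularization strength

# With the data gradient set to zero, only the L2 term's gradient lam*W acts,
# so one step multiplies W by (1 - alpha*lam) < 1: the weights decay.
W_decayed = W - alpha * lam * W
print(np.linalg.norm(W_decayed) / np.linalg.norm(W))  # 0.95 = 1 - alpha*lam
```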
(3) Computational load
This is addressed with hardware such as GPUs and with techniques such as batch normalization.
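Batch normalization is only mentioned in passing above; as a hedged sketch (names and values are mine, not from the text), its forward step normalizes each feature over the mini-batch and then applies a learnable scale and shift, which keeps the pre-activation distributions stable and speeds up training:

```python
import numpy as np

def batch_norm(V, gamma=1.0, beta=0.0, eps=1e-5):
    # Normalize each feature (column) over the batch (rows), then rescale.
    mu = V.mean(axis=0)                  # per-feature batch mean
    var = V.var(axis=0)                  # per-feature batch variance
    V_hat = (V - mu) / np.sqrt(var + eps)
    return gamma * V_hat + beta

rng = np.random.default_rng(1)
V = rng.normal(5.0, 3.0, size=(32, 20))  # a batch of 32 pre-activation vectors
out = batch_norm(V)
# out now has roughly zero mean and unit variance in every feature
```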
1. ReLU example
The input data are five 5x5 matrices encoding the digits 1, 2, 3, 4, and 5. The network has 25 input nodes, 3 hidden layers of 20 nodes each, and 5 output nodes. The code is as follows:
function [W1, W2, W3, W4] = DeepReLU(W1, W2, W3, W4, X, D)
  alpha = 0.01;                      % learning rate
  N = 5;                             % number of training samples
  for k = 1:N
    x  = reshape(X(:, :, k), 25, 1); % flatten the 5x5 image into a column
    v1 = W1*x;                       % forward pass through the three
    y1 = ReLU(v1);                   % hidden layers and the output layer
    v2 = W2*y1;
    y2 = ReLU(v2);
    v3 = W3*y2;
    y3 = ReLU(v3);
    v  = W4*y3;
    y  = Softmax(v);

    d     = D(k, :)';                % one-hot target for sample k
    e     = d - y;                   % output error
    delta = e;                       % softmax + cross-entropy output delta

    e3     = W4'*delta;              % back-propagate the error
    delta3 = (v3 > 0).*e3;           % ReLU derivative: 1 where v > 0, else 0
    e2     = W3'*delta3;
    delta2 = (v2 > 0).*e2;
    e1     = W2'*delta2;
    delta1 = (v1 > 0).*e1;

    dW4 = alpha*delta*y3';           % SGD weight updates, one sample at a time
    W4  = W4 + dW4;
    dW3 = alpha*delta3*y2';
    W3  = W3 + dW3;
    dW2 = alpha*delta2*y1';
    W2  = W2 + dW2;
    dW1 = alpha*delta1*x';
    W1  = W1 + dW1;
  end
end
The ReLU function is defined as follows:
function y = ReLU(x)
  y = max(0, x);
end
The test code is as follows:
clear all

X = zeros(5, 5, 5);                  % five 5x5 images of the digits 1-5
X(:, :, 1) = [ 0 1 1 0 0;
               0 0 1 0 0;
               0 0 1 0 0;
               0 0 1 0 0;
               0 1 1 1 0
             ];
X(:, :, 2) = [ 1 1 1 1 0;
               0 0 0 0 1;
               0 1 1 1 0;
               1 0 0 0 0;
               1 1 1 1 1
             ];
X(:, :, 3) = [ 1 1 1 1 0;
               0 0 0 0 1;
               0 1 1 1 0;
               0 0 0 0 1;
               1 1 1 1 0
             ];
X(:, :, 4) = [ 0 0 0 1 0;
               0 0 1 1 0;
               0 1 0 1 0;
               1 1 1 1 1;
               0 0 0 1 0
             ];
X(:, :, 5) = [ 1 1 1 1 1;
               1 0 0 0 0;
               1 1 1 1 0;
               0 0 0 0 1;
               1 1 1 1 0
             ];

D = [ 1 0 0 0 0;                     % one-hot labels, one row per digit
      0 1 0 0 0;
      0 0 1 0 0;
      0 0 0 1 0;
      0 0 0 0 1
    ];

W1 = 2*rand(20, 25) - 1;             % weights drawn uniformly from [-1, 1]
W2 = 2*rand(20, 20) - 1;
W3 = 2*rand(20, 20) - 1;
W4 = 2*rand( 5, 20) - 1;

for epoch = 1:10000                  % train
  [W1, W2, W3, W4] = DeepReLU(W1, W2, W3, W4, X, D);
end
N = 5;                               % inference
for k = 1:N
  x  = reshape(X(:, :, k), 25, 1);
  v1 = W1*x;
  y1 = ReLU(v1);
  v2 = W2*y1;
  y2 = ReLU(v2);
  v3 = W3*y2;
  y3 = ReLU(v3);
  v  = W4*y3;
  y  = Softmax(v)                    % no semicolon: print each network output
end
This code occasionally fails to train and produces incorrect classifications: the ReLU function is more sensitive to the initial weights.
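One common remedy for this sensitivity, not used in the code above, is to scale the random initial weights by the layer's fan-in rather than drawing them uniformly from [-1, 1]. A hedged Python sketch of He initialization for the 25-input first layer:

```python
import numpy as np

# He initialization draws weights from N(0, 2/fan_in), which roughly preserves
# the variance of ReLU pre-activations from layer to layer, so training depends
# far less on the particular random draw. (This is a standard technique, not
# part of the chapter's code.)
rng = np.random.default_rng(2)
fan_in = 25
W1 = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(20, 25))
print(W1.std())   # close to sqrt(2/25), about 0.28
```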
2. Dropout example
The function is defined as follows:
function [W1, W2, W3, W4] = DeepDropout(W1, W2, W3, W4, X, D)
  alpha = 0.01;                      % learning rate
  N = 5;                             % number of training samples
  for k = 1:N
    x  = reshape(X(:, :, k), 25, 1);
    v1 = W1*x;
    y1 = Sigmoid(v1);
    y1 = y1 .* Dropout(y1, 0.2);     % drop 20% of the first hidden layer's nodes
    v2 = W2*y1;
    y2 = Sigmoid(v2);
    y2 = y2 .* Dropout(y2, 0.2);     % and likewise for the second
    v3 = W3*y2;
    y3 = Sigmoid(v3);
    y3 = y3 .* Dropout(y3, 0.2);     % and third hidden layers
    v  = W4*y3;
    y  = Softmax(v);

    d     = D(k, :)';                % one-hot target for sample k
    e     = d - y;
    delta = e;

    e3     = W4'*delta;              % back-propagate the error
    delta3 = y3.*(1-y3).*e3;         % sigmoid derivative: y.*(1-y)
    e2     = W3'*delta3;
    delta2 = y2.*(1-y2).*e2;
    e1     = W2'*delta2;
    delta1 = y1.*(1-y1).*e1;

    dW4 = alpha*delta*y3';           % SGD weight updates
    W4  = W4 + dW4;
    dW3 = alpha*delta3*y2';
    W3  = W3 + dW3;
    dW2 = alpha*delta2*y1';
    W2  = W2 + dW2;
    dW1 = alpha*delta1*x';
    W1  = W1 + dW1;
  end
end
The Dropout function is defined as follows:
function ym = Dropout(y, ratio)
  % y is the layer's output vector; ratio is the fraction of elements to drop
  [m, n] = size(y);
  ym  = zeros(m, n);
  num = round(m*n*(1-ratio));        % number of surviving elements
  idx = randperm(m*n, num);          % indices of the surviving elements of ym
  ym(idx) = 1 / (1-ratio);           % scale survivors to preserve the mean
end
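The scaling by 1/(1-ratio) is the key detail: it keeps the expected value of y .* mask equal to y, so no rescaling is needed at inference time. A Python sketch of the same idea (function and variable names are mine):

```python
import numpy as np

def dropout_mask(y, ratio, rng):
    # Keep a random (1 - ratio) fraction of elements; scale survivors by
    # 1/(1 - ratio) so the expected value of y * mask equals y.
    m = y.size
    num_keep = round(m * (1 - ratio))
    mask = np.zeros(m)
    idx = rng.permutation(m)[:num_keep]   # indices of surviving elements
    mask[idx] = 1.0 / (1.0 - ratio)
    return mask.reshape(y.shape)

rng = np.random.default_rng(3)
y = np.ones((20, 1))                      # a 20-node hidden layer output
mask = dropout_mask(y, 0.2, rng)          # drops 4 of 20 nodes
print((y * mask).mean())                  # 1.0: the mean is preserved exactly here
```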
The Sigmoid function is defined as:
function y = Sigmoid(x)
  y = 1 ./ (1 + exp(-x));
end
The test code is as follows:
clear all

X = zeros(5, 5, 5);                  % five 5x5 images of the digits 1-5
X(:, :, 1) = [ 0 1 1 0 0;
               0 0 1 0 0;
               0 0 1 0 0;
               0 0 1 0 0;
               0 1 1 1 0
             ];
X(:, :, 2) = [ 1 1 1 1 0;
               0 0 0 0 1;
               0 1 1 1 0;
               1 0 0 0 0;
               1 1 1 1 1
             ];
X(:, :, 3) = [ 1 1 1 1 0;
               0 0 0 0 1;
               0 1 1 1 0;
               0 0 0 0 1;
               1 1 1 1 0
             ];
X(:, :, 4) = [ 0 0 0 1 0;
               0 0 1 1 0;
               0 1 0 1 0;
               1 1 1 1 1;
               0 0 0 1 0
             ];
X(:, :, 5) = [ 1 1 1 1 1;
               1 0 0 0 0;
               1 1 1 1 0;
               0 0 0 0 1;
               1 1 1 1 0
             ];

D = [ 1 0 0 0 0;                     % one-hot labels, one row per digit
      0 1 0 0 0;
      0 0 1 0 0;
      0 0 0 1 0;
      0 0 0 0 1
    ];

W1 = 2*rand(20, 25) - 1;             % weights drawn uniformly from [-1, 1]
W2 = 2*rand(20, 20) - 1;
W3 = 2*rand(20, 20) - 1;
W4 = 2*rand( 5, 20) - 1;

for epoch = 1:20000                  % train
  [W1, W2, W3, W4] = DeepDropout(W1, W2, W3, W4, X, D);
end
N = 5;                               % inference (dropout is disabled here)
for k = 1:N
  x  = reshape(X(:, :, k), 25, 1);
  v1 = W1*x;
  y1 = Sigmoid(v1);
  v2 = W2*y1;
  y2 = Sigmoid(v2);
  v3 = W3*y2;
  y3 = Sigmoid(v3);
  v  = W4*y3;
  y  = Softmax(v)
end
The Softmax function used above is:
function y = Softmax(x)
  ex = exp(x);
  y = ex / sum(ex);
end
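For the small values in this example the MATLAB version above works fine; a Python sketch of the same function with one standard addition of mine, subtracting max(x) before exponentiating, which prevents overflow for large inputs without changing the result (softmax is shift-invariant):

```python
import numpy as np

def softmax(x):
    # Shift by max(x) first: exp of large values would overflow otherwise.
    ex = np.exp(x - np.max(x))
    return ex / ex.sum()

v = np.array([1000.0, 1001.0, 1002.0])   # naive exp(1000) overflows to inf
y = softmax(v)
print(y.sum())                           # 1.0, up to rounding
```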
The code finally outputs the correct classification results.