  • Deep Learning 1: Sparse Autoencoder

    I have been studying Stanford's course http://ufldl.stanford.edu/wiki/index.php/UFLDL_Tutorial for about a month. I only half understand the algorithms, and my Exercise solutions were mostly copied from other people's code, so I want to summarize the relevant material here.

    1. Autoencoders and Sparsity

    Sparsity: the sparsity parameter

    The average activation of hidden unit j over the training set is \hat\rho_j:

    \begin{align}
    \hat\rho_j = \frac{1}{m} \sum_{i=1}^m \left[ a^{(2)}_j(x^{(i)}) \right]
    \end{align}

    The constraint we would like to enforce is

    \begin{align}
    \hat\rho_j = \rho,
    \end{align}

    To achieve this, an extra penalty term is added to the cost function:

    \begin{align}
    \sum_{j=1}^{s_2} \rho \log \frac{\rho}{\hat\rho_j} + (1-\rho) \log \frac{1-\rho}{1-\hat\rho_j}.
    \end{align}

    This penalty term reaches its minimum when

    \begin{align}
    \hat\rho_j = \rho.
    \end{align}

    The cost function then becomes

    \begin{align}
    J_{\rm sparse}(W,b) = J(W,b) + \beta \sum_{j=1}^{s_2} {\rm KL}(\rho || \hat\rho_j),
    \end{align}
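    As a minimal MATLAB/Octave sketch (my own illustration, not code from the post), with the hidden activations of all training examples stored column-wise in a matrix a2 and assumed values for rho and beta:

        % Hypothetical sketch: sparsity penalty for a sparse autoencoder.
        % a2 is an s2-by-m matrix of hidden-layer activations (one column per example).
        rho  = 0.05;                      % target sparsity (assumed value)
        beta = 3;                         % weight of the sparsity penalty (assumed value)
        a2   = rand(25, 100);             % placeholder activations (s2 = 25, m = 100)

        rho_hat = mean(a2, 2);            % average activation of each hidden unit
        kl = sum(rho .* log(rho ./ rho_hat) + ...
                 (1 - rho) .* log((1 - rho) ./ (1 - rho_hat)));

        % J_sparse = J + beta * kl;       % added to the usual cost J(W,b)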

    To keep the implementation simple, the backpropagated error term for the hidden layer also gains an extra term:

    \begin{align}
    \delta^{(2)}_i = \left( \left( \sum_{j=1}^{s_2} W^{(2)}_{ji} \delta^{(3)}_j \right)
    + \beta \left( - \frac{\rho}{\hat\rho_i} + \frac{1-\rho}{1-\hat\rho_i} \right) \right) f'(z^{(2)}_i).
    \end{align}

    To obtain \hat\rho, a forward pass over all training examples is needed first; only then can a second forward pass combined with backpropagation be used to adjust the parameters. In other words, every training example goes through the forward step twice per iteration.
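    A vectorized sketch of this modified error term (my own illustration, with placeholder dimensions and sigmoid activations assumed):

        % Hypothetical sketch: hidden-layer error term delta2 with the sparsity term.
        rho = 0.05; beta = 3;             % target sparsity and penalty weight (assumed)
        m = 100;                          % number of training examples
        a2     = rand(25, m);             % hidden activations, s2-by-m
        W2     = randn(64, 25);           % weights from hidden layer to output layer
        delta3 = randn(64, m);            % output-layer error terms (placeholder)

        rho_hat = mean(a2, 2);
        sparsity_delta = -rho ./ rho_hat + (1 - rho) ./ (1 - rho_hat);   % s2-by-1
        delta2 = (W2' * delta3 + beta * repmat(sparsity_delta, 1, m)) ...
                 .* a2 .* (1 - a2);       % f'(z2) = a2 .* (1 - a2) for sigmoid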

    2. Backpropagation Algorithm

    Backpropagation simplifies the computation of these gradients.
    For a training set { (x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)}) }, the cost for a single example is

    \begin{align}
    J(W,b; x,y) = \frac{1}{2} \left\| h_{W,b}(x) - y \right\|^2.
    \end{align}

    This contains only the squared-error term.

    
    \begin{align}
    J(W,b) &= \left[ \frac{1}{m} \sum_{i=1}^m J(W,b;x^{(i)},y^{(i)}) \right]
              + \frac{\lambda}{2} \sum_{l=1}^{n_l-1} \; \sum_{i=1}^{s_l} \; \sum_{j=1}^{s_{l+1}} \left( W^{(l)}_{ji} \right)^2 \\
           &= \left[ \frac{1}{m} \sum_{i=1}^m \left( \frac{1}{2} \left\| h_{W,b}(x^{(i)}) - y^{(i)} \right\|^2 \right) \right]
              + \frac{\lambda}{2} \sum_{l=1}^{n_l-1} \; \sum_{i=1}^{s_l} \; \sum_{j=1}^{s_{l+1}} \left( W^{(l)}_{ji} \right)^2
    \end{align}

    The first part is the average squared error, and the second part is the regularization term, also called the weight decay term; together they form the overall cost function.
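    As a small sketch (not from the original post) of how this overall cost could be computed for a two-layer network, with h holding the network outputs column-wise and lambda as an assumed weight decay value:

        % Hypothetical sketch: overall cost = mean squared error + weight decay.
        m = 100; n = 64;
        y  = rand(n, m);                  % placeholder targets
        h  = rand(n, m);                  % placeholder network outputs h_{W,b}(x)
        W1 = randn(25, n); W2 = randn(n, 25);   % placeholder weight matrices
        lambda = 1e-4;                    % weight decay parameter (assumed value)

        squared_error = (1 / m) * sum(0.5 * sum((h - y).^2, 1));
        weight_decay  = (lambda / 2) * (sum(W1(:).^2) + sum(W2(:).^2));
        J = squared_error + weight_decay;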

    The parameters W and b are updated iteratively as

    \begin{align}
    W_{ij}^{(l)} &= W_{ij}^{(l)} - \alpha \frac{\partial}{\partial W_{ij}^{(l)}} J(W,b) \\
    b_{i}^{(l)} &= b_{i}^{(l)} - \alpha \frac{\partial}{\partial b_{i}^{(l)}} J(W,b)
    \end{align}

    where α is the learning rate.

    The backpropagation algorithm makes computing these partial derivatives much more efficient.

    Goal: run gradient descent for many iterations to obtain the optimized parameters.

    Each iteration computes the cost function and its gradient, then proceeds to the next iteration.

    After the forward pass, BP defines an error term for every node: 1. for the output layer, it is the derivative of the cost function with respect to the output; 2. for an intermediate layer, it is the next layer's error terms multiplied by the connecting weights, which propagates the error backwards through the network.

    The derivatives of the overall cost function with respect to W and b are

    
    \begin{align}
    \frac{\partial}{\partial W_{ij}^{(l)}} J(W,b) &=
      \left[ \frac{1}{m} \sum_{i=1}^m \frac{\partial}{\partial W_{ij}^{(l)}} J(W,b; x^{(i)}, y^{(i)}) \right] + \lambda W_{ij}^{(l)} \\
    \frac{\partial}{\partial b_{i}^{(l)}} J(W,b) &=
      \frac{1}{m} \sum_{i=1}^m \frac{\partial}{\partial b_{i}^{(l)}} J(W,b; x^{(i)}, y^{(i)})
    \end{align}

    First, run a feedforward pass on the training example to compute all the activations and the network output h_{W,b}(x).

    Next, for each node i in layer l, compute an error term \delta^{(l)}_i that measures how much that node was responsible for the error in the output. For the output layer n_l, \delta^{(n_l)}_i can be defined directly as the difference between the network's activation and the true target value; for a hidden layer, \delta^{(l)}_i is defined from a weighted combination of the error terms of the nodes that take a^{(l)}_i as input.

    The algorithm steps are as follows:

      • Perform a feedforward pass, computing the activations for layers L2, L3, and so on up to the output layer L_{n_l}.
      • For each output unit i in layer n_l (the output layer), set
        \begin{align}
        \delta^{(n_l)}_i = \frac{\partial}{\partial z^{(n_l)}_i} \;\; \frac{1}{2} \left\| y - h_{W,b}(x) \right\|^2 = - (y_i - a^{(n_l)}_i) \cdot f'(z^{(n_l)}_i)
        \end{align}
      • For l = n_l-1, n_l-2, n_l-3, \ldots, 2
        For each node i in layer l, set
        \begin{align}
        \delta^{(l)}_i = \left( \sum_{j=1}^{s_{l+1}} W^{(l)}_{ji} \delta^{(l+1)}_j \right) f'(z^{(l)}_i)
        \end{align}
      • Compute the desired partial derivatives, which are given as:
        \begin{align}
        \frac{\partial}{\partial W_{ij}^{(l)}} J(W,b; x, y) &= a^{(l)}_j \delta_i^{(l+1)} \\
        \frac{\partial}{\partial b_{i}^{(l)}} J(W,b; x, y) &= \delta_i^{(l+1)}.
        \end{align}

     For matrices, the vectorized form used in MATLAB is as follows (\bullet denotes the element-wise product):

      • Perform a feedforward pass, computing the activations for layers L_2, L_3, up to the output layer L_{n_l}, using the equations defining the forward propagation steps
      • For the output layer (layer n_l), set
        \begin{align}
        \delta^{(n_l)} = - (y - a^{(n_l)}) \bullet f'(z^{(n_l)})
        \end{align}
      • For l = n_l-1, n_l-2, n_l-3, \ldots, 2
        Set
        \begin{align}
        \delta^{(l)} = \left( (W^{(l)})^T \delta^{(l+1)} \right) \bullet f'(z^{(l)})
        \end{align}
      • Compute the desired partial derivatives:
        \begin{align}
        \nabla_{W^{(l)}} J(W,b;x,y) &= \delta^{(l+1)} (a^{(l)})^T, \\
        \nabla_{b^{(l)}} J(W,b;x,y) &= \delta^{(l+1)}.
        \end{align}

     Note: if f(z) is the sigmoid function, then f'(z^{(l)}_i) = a^{(l)}_i (1 - a^{(l)}_i).
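    As an illustration (my own sketch, not code from the original exercise), the vectorized steps above for a three-layer autoencoder with sigmoid units could look like this in MATLAB/Octave; the variable names (W1, b1, W2, b2) and dimensions are assumptions:

        % Hypothetical sketch: vectorized backprop for a 3-layer autoencoder
        % with sigmoid activations; x holds one example per column.
        m  = 100;                          % number of examples
        x  = rand(64, m);                  % placeholder input data (64 pixels)
        y  = x;                            % autoencoder: target equals input
        W1 = 0.01 * randn(25, 64); b1 = zeros(25, 1);
        W2 = 0.01 * randn(64, 25); b2 = zeros(64, 1);
        sigmoid = @(z) 1 ./ (1 + exp(-z));

        % Feedforward pass.
        z2 = W1 * x + repmat(b1, 1, m);  a2 = sigmoid(z2);
        z3 = W2 * a2 + repmat(b2, 1, m); a3 = sigmoid(z3);

        % Error terms (output layer, then hidden layer).
        delta3 = -(y - a3) .* a3 .* (1 - a3);
        delta2 = (W2' * delta3) .* a2 .* (1 - a2);

        % Partial derivatives, averaged over the m examples.
        W2grad = (delta3 * a2') / m;  b2grad = sum(delta3, 2) / m;
        W1grad = (delta2 * x')  / m;  b1grad = sum(delta2, 2) / m;

    In a sparse autoencoder, the sparsity term from Section 1 would be added inside the delta2 expression, and lambda * W would be added to the weight gradients when weight decay is used.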

    Building on this, the steps of the gradient descent algorithm are as follows (a MATLAB sketch follows the list):

      • Set \Delta W^{(l)} := 0, \Delta b^{(l)} := 0 (matrix/vector of zeros) for all l.
      • For i = 1 to m,
        1. Use backpropagation to compute \nabla_{W^{(l)}} J(W,b;x,y) and \nabla_{b^{(l)}} J(W,b;x,y).
        2. Set \Delta W^{(l)} := \Delta W^{(l)} + \nabla_{W^{(l)}} J(W,b;x,y).
        3. Set \Delta b^{(l)} := \Delta b^{(l)} + \nabla_{b^{(l)}} J(W,b;x,y).
      • Update the parameters:
        \begin{align}
        W^{(l)} &= W^{(l)} - \alpha \left[ \left( \frac{1}{m} \Delta W^{(l)} \right) + \lambda W^{(l)} \right] \\
        b^{(l)} &= b^{(l)} - \alpha \left[ \frac{1}{m} \Delta b^{(l)} \right]
        \end{align}
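    A self-contained MATLAB/Octave sketch of this batch update for a single sigmoid layer (illustrative only, with placeholder data; a full sparse autoencoder would accumulate gradients for both weight layers):

        % Hypothetical sketch: batch gradient descent, mirroring the
        % Delta-accumulation pseudocode above.
        m = 100;
        X = rand(64, m);  Y = X;                 % placeholder data, autoencoder targets
        W = 0.01 * randn(64, 64);  b = zeros(64, 1);
        alpha = 0.1;  lambda = 1e-4;             % assumed learning rate and weight decay
        sigmoid = @(z) 1 ./ (1 + exp(-z));

        for iter = 1:400                          % gradient descent iterations
            DeltaW = zeros(size(W));  Deltab = zeros(size(b));
            for i = 1:m                           % accumulate per-example gradients
                x = X(:, i);  y = Y(:, i);
                a = sigmoid(W * x + b);           % forward pass
                delta = -(y - a) .* a .* (1 - a); % output-layer error term
                DeltaW = DeltaW + delta * x';     % nabla_W J(W,b;x,y)
                Deltab = Deltab + delta;          % nabla_b J(W,b;x,y)
            end
            W = W - alpha * ((1 / m) * DeltaW + lambda * W);
            b = b - alpha * ((1 / m) * Deltab);
        end

    In practice the UFLDL exercises hand the cost and gradient to an L-BFGS optimizer instead of this plain loop, but the accumulation and update follow the same pattern.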

    3. Visualizing a Trained Autoencoder

    The trained autoencoder can be visualized with pixel intensity values.

    Given the activation of hidden unit i,

    \begin{align}
    a^{(2)}_i = f\left( \sum_{j=1}^{100} W^{(1)}_{ij} x_j + b^{(1)}_i \right),
    \end{align}

    subject to the constraint ||x||^2 = \sum_{i=1}^{100} x_i^2 \leq 1,

    the image that maximally activates hidden unit i is obtained by setting each pixel x_j (for all 100 pixels, j = 1, \ldots, 100) to

    \begin{align}
    x_j = \frac{W^{(1)}_{ij}}{\sqrt{\sum_{j=1}^{100} (W^{(1)}_{ij})^2}}.
    \end{align}
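    A possible MATLAB/Octave sketch of this visualization (my own illustration, assuming 10x10-pixel inputs so that each row of W1 reshapes into an image):

        % Hypothetical sketch: visualize each hidden unit as the normalized row
        % of W1, reshaped into a 10x10 image (100-pixel inputs, as in the formulas).
        W1 = randn(25, 100);                       % placeholder trained weights
        figure;
        for i = 1:size(W1, 1)
            img = W1(i, :) / sqrt(sum(W1(i, :).^2));   % x_j = W1(i,j) / ||W1(i,:)||
            subplot(5, 5, i);
            imagesc(reshape(img, 10, 10));
            colormap gray;
            axis image;  axis off;
        end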
