  • Why LSTM mitigates vanishing gradients

    \(c_{t}=c_{t-1} \otimes \sigma\left(W_{f} \cdot\left[H_{t-1}, X_{t}\right]\right) \oplus \tanh \left(W_{c} \cdot\left[H_{t-1}, X_{t}\right]\right) \otimes \sigma\left(W_{i} \cdot\left[H_{t-1}, X_{t}\right]\right)\)
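
As a minimal sketch of the update above, the scalar case can be written in plain Python. The helper names (`sigmoid`, `dot`, `cell_state_update`), the omission of bias terms, and the representation of \([H_{t-1}, X_{t}]\) as a two-element list are illustrative assumptions, not part of the original post.

```python
import math

def sigmoid(z):
    """Logistic function, the sigma in the update formula."""
    return 1.0 / (1.0 + math.exp(-z))

def dot(w, v):
    """Dot product of two equal-length lists."""
    return sum(wi * vi for wi, vi in zip(w, v))

def cell_state_update(c_prev, h_prev, x_t, w_f, w_c, w_i):
    """One scalar LSTM cell-state step (hypothetical helper, biases omitted).

    w_f, w_c, w_i are weight lists applied to the concatenation [h_prev, x_t].
    Returns the new cell state and the forget-gate value.
    """
    hx = [h_prev, x_t]
    f_t = sigmoid(dot(w_f, hx))    # forget gate: share of c_prev that is kept
    i_t = sigmoid(dot(w_i, hx))    # input gate: share of the candidate that is written
    g_t = math.tanh(dot(w_c, hx))  # candidate value
    c_t = c_prev * f_t + g_t * i_t
    return c_t, f_t
```

The forget-gate value is returned alongside the new cell state because it is the quantity that appears in the gradient product.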

    Backpropagation formula:

    \(\begin{aligned} \frac{\partial E_{k}}{\partial W} &= \frac{\partial E_{k}}{\partial H_{k}} \frac{\partial H_{k}}{\partial c_{k}} \frac{\partial c_{k}}{\partial c_{k-1}} \ldots \frac{\partial c_{2}}{\partial c_{1}} \frac{\partial c_{1}}{\partial W} \\ &= \frac{\partial E_{k}}{\partial H_{k}} \frac{\partial H_{k}}{\partial c_{k}}\left(\prod_{t=2}^{k} \frac{\partial c_{t}}{\partial c_{t-1}}\right) \frac{\partial c_{1}}{\partial W} \end{aligned}\)

    The part in parentheses is the repeated product, and each of its factors is:

    \(\frac{\partial c_{t}}{\partial c_{t-1}}=\sigma\left(W_{f} \cdot\left[H_{t-1}, X_{t}\right]\right)+\frac{d}{d c_{t-1}}\left(\tanh \left(W_{c} \cdot\left[H_{t-1}, X_{t}\right]\right) \otimes \sigma\left(W_{i} \cdot\left[H_{t-1}, X_{t}\right]\right)\right)\)

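    In other words, each factor in the product is a sum of two terms, the first of which is the value of the forget gate. The forget gate determines what fraction of the previous cell state is retained, and its value can be close to 1, so the forget gate can be treated as \(\sigma\left(W_{f} \cdot\left[H_{t-1}, X_{t}\right]\right) \approx \overrightarrow{1}\). For an LSTM this gives: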

    \(\frac{\partial E_{k}}{\partial W} \approx \frac{\partial E_{k}}{\partial H_{k}} \frac{\partial H_{k}}{\partial c_{k}}\left(\prod_{t=2}^{k} \sigma\left(W_{f} \cdot\left[H_{t-1}, X_{t}\right]\right)\right) \frac{\partial c_{1}}{\partial W} \nrightarrow 0\)

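    Because this product of forget-gate values does not necessarily shrink to zero, LSTM can mitigate the vanishing-gradient problem.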
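
To make the argument concrete, the sketch below multiplies per-step factors over a sequence, in the same way as the product term in the gradient. The gate values 0.99 and 0.5, the sequence length of 100, and the helper name `product_of_factors` are illustrative assumptions; the post itself gives no numbers.

```python
def product_of_factors(factors):
    """Multiply the per-step factors along a sequence (hypothetical helper)."""
    result = 1.0
    for f in factors:
        result *= f
    return result

steps = 100  # assumed sequence length

# Forget gates close to 1: the product shrinks slowly.
near_one = product_of_factors([0.99] * steps)

# Factors well below 1: the product collapses towards zero.
well_below_one = product_of_factors([0.5] * steps)

print(f"factor 0.99 over {steps} steps: {near_one:.3e}")
print(f"factor 0.50 over {steps} steps: {well_below_one:.3e}")
```

Under these assumed values the first product stays at roughly 0.37, while the second is on the order of 1e-31. This is only the approximation used above, in which each factor is replaced by its forget-gate value.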

  • Original article: https://www.cnblogs.com/Elaine-DWL/p/11240213.html