\(C_{t}=C_{t-1} \otimes \sigma\left(W_{f} \cdot\left[H_{t-1}, X_{t}\right]\right) \oplus \tanh \left(W_{c} \cdot\left[H_{t-1}, X_{t}\right]\right) \otimes \sigma\left(W_{i} \cdot\left[H_{t-1}, X_{t}\right]\right)\)
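The cell-state update above can be sketched for a single scalar cell. This is a minimal illustration, not a full LSTM: the function name `cell_state_update`, the representation of each weight as a `(w_h, w_x)` pair, and the omission of biases are all assumptions made for the example.

```python
import math


def sigmoid(z):
    """Logistic function used by the forget and input gates."""
    return 1.0 / (1.0 + math.exp(-z))


def cell_state_update(c_prev, h_prev, x, w_f, w_i, w_c):
    """One scalar cell-state step.

    Each w_* is a hypothetical (w_h, w_x) pair standing in for
    W . [H_{t-1}, X_t]; biases are omitted for brevity.
    Returns the new cell state and the forget-gate value.
    """
    f = sigmoid(w_f[0] * h_prev + w_f[1] * x)    # forget gate
    i = sigmoid(w_i[0] * h_prev + w_i[1] * x)    # input gate
    g = math.tanh(w_c[0] * h_prev + w_c[1] * x)  # candidate value
    c_new = c_prev * f + g * i                   # C_t = C_{t-1} * f + g * i
    return c_new, f


# With zero inputs both gates equal 0.5 and the candidate value is 0,
# so half of the previous cell state is carried over.
c_new, f = cell_state_update(
    c_prev=1.0, h_prev=0.0, x=0.0,
    w_f=(0.5, 0.5), w_i=(0.5, 0.5), w_c=(0.5, 0.5),
)
print(c_new, f)
```

In the scalar case the element-wise operators \(\otimes\) and \(\oplus\) reduce to ordinary multiplication and addition, which is why the update is a single line.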
The backpropagation formula is:
\(\begin{aligned} \frac{\partial E_{k}}{\partial W}=& \frac{\partial E_{k}}{\partial H_{k}} \frac{\partial H_{k}}{\partial C_{k}} \frac{\partial C_{k}}{\partial C_{k-1}} \ldots \frac{\partial C_{2}}{\partial C_{1}} \frac{\partial C_{1}}{\partial W}=\\ & \frac{\partial E_{k}}{\partial H_{k}} \frac{\partial H_{k}}{\partial C_{k}}\left(\prod_{t=2}^{k} \frac{\partial C_{t}}{\partial C_{t-1}}\right) \frac{\partial C_{1}}{\partial W} \end{aligned}\)
The part in parentheses is the product term. Each of its factors is:
$\frac{\partial C_{t}}{\partial C_{t-1}}=\sigma\left(W_{f} \cdot\left[H_{t-1}, X_{t}\right]\right) + \frac{d}{d C_{t-1}}\left(\tanh \left(W_{c} \cdot\left[H_{t-1}, X_{t}\right]\right) \otimes \sigma\left(W_{i} \cdot\left[H_{t-1}, X_{t}\right]\right)\right)$
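In other words, each factor in the product is a sum of two terms, and the first term is the value of the forget gate. The forget gate determines how much of the previous cell state is retained, and its value can be close to 1, so we can treat it as \(\sigma\left(W_{f} \cdot\left[H_{t-1}, X_{t}\right]\right) \approx \overrightarrow{1}\). In an LSTM we therefore have: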
\(\frac{\partial E_{k}}{\partial W} \approx \frac{\partial E_{k}}{\partial H_{k}} \frac{\partial H_{k}}{\partial C_{k}}\left(\prod_{t=2}^{k} \sigma\left(W_{f} \cdot\left[H_{t-1}, X_{t}\right]\right)\right) \frac{\partial C_{1}}{\partial W} \nrightarrow 0\)
Therefore, an LSTM can mitigate the vanishing gradient problem.
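A small numeric sketch makes the effect of the product term concrete. It multiplies one constant per-step factor over many time steps. The values 0.99 (a forget gate close to 1), 0.5 (a factor well below 1), and the 100-step horizon are illustrative assumptions, not figures from the derivation, and the helper name `product_of_factors` is hypothetical.

```python
def product_of_factors(factor, steps):
    """Multiply the same per-step factor over `steps` time steps."""
    result = 1.0
    for _ in range(steps):
        result *= factor
    return result


steps = 100
near_one = product_of_factors(0.99, steps)  # forget gate close to 1
small = product_of_factors(0.5, steps)      # factor well below 1

# The first product stays at a usable magnitude (roughly 0.37),
# while the second collapses to a vanishingly small number.
print(near_one, small)
```

Even with a factor of 0.99 the product still shrinks over time, which is why the gradient problem is mitigated rather than eliminated.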