Advanced Recurrent Neural Networks
GRU
A problem with plain RNNs: gradients tend to vanish or explode during backpropagation through time (BPTT).
Gated recurrent neural networks: capture dependencies separated by large time-step distances in a sequence.
RNN:

$$
H_{t} = \phi(X_{t}W_{xh} + H_{t-1}W_{hh} + b_{h})
$$
GRU:

$$
\begin{aligned}
R_{t} &= \sigma(X_t W_{xr} + H_{t-1} W_{hr} + b_r)\\
Z_{t} &= \sigma(X_t W_{xz} + H_{t-1} W_{hz} + b_z)\\
\widetilde{H}_t &= \tanh(X_t W_{xh} + (R_t \odot H_{t-1}) W_{hh} + b_h)\\
H_t &= Z_t \odot H_{t-1} + (1 - Z_t) \odot \widetilde{H}_t
\end{aligned}
$$
- The reset gate helps capture short-term dependencies in a time series;
- The update gate helps capture long-term dependencies in a time series.
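The four GRU equations above can be sketched as a single forward time step in NumPy. This is a minimal illustration, not a trained model: the parameter layout, dimensions, and random initialization are assumptions made here for demonstration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, params):
    """One GRU time step. params maps each gate to its (W_x, W_h, b) triple."""
    W_xr, W_hr, b_r = params["r"]
    W_xz, W_hz, b_z = params["z"]
    W_xh, W_hh, b_h = params["h"]
    r = sigmoid(x_t @ W_xr + h_prev @ W_hr + b_r)              # reset gate R_t
    z = sigmoid(x_t @ W_xz + h_prev @ W_hz + b_z)              # update gate Z_t
    h_cand = np.tanh(x_t @ W_xh + (r * h_prev) @ W_hh + b_h)   # candidate state
    return z * h_prev + (1 - z) * h_cand                       # new hidden state H_t

# Toy usage with illustrative sizes and random weights.
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
params = {k: (rng.standard_normal((n_in, n_hid)) * 0.1,
              rng.standard_normal((n_hid, n_hid)) * 0.1,
              np.zeros(n_hid)) for k in "rzh"}
h = np.zeros((1, n_hid))
for t in range(5):
    h = gru_step(rng.standard_normal((1, n_in)), h, params)
```

Note how the update gate interpolates between the previous state and the candidate: with `z` near 1 the old state passes through almost unchanged, which is what preserves long-range information.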
LSTM
Long short-term memory (LSTM):
Forget gate: controls how much of the previous time step's memory cell is kept
Input gate: controls how much of the current time step's input enters the memory cell
Output gate: controls the flow from the memory cell to the hidden state
Memory cell: a special kind of hidden state that carries information across time steps

$$
\begin{aligned}
I_t &= \sigma(X_t W_{xi} + H_{t-1} W_{hi} + b_i)\\
F_t &= \sigma(X_t W_{xf} + H_{t-1} W_{hf} + b_f)\\
O_t &= \sigma(X_t W_{xo} + H_{t-1} W_{ho} + b_o)\\
\widetilde{C}_t &= \tanh(X_t W_{xc} + H_{t-1} W_{hc} + b_c)\\
C_t &= F_t \odot C_{t-1} + I_t \odot \widetilde{C}_t\\
H_t &= O_t \odot \tanh(C_t)
\end{aligned}
$$
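The six LSTM equations translate directly into one forward time step. Again a minimal sketch: the dictionary-based parameter layout and the tiny random initialization are assumptions for illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step. p maps gate name -> (W_x, W_h, b)."""
    i = sigmoid(x_t @ p["i"][0] + h_prev @ p["i"][1] + p["i"][2])   # input gate I_t
    f = sigmoid(x_t @ p["f"][0] + h_prev @ p["f"][1] + p["f"][2])   # forget gate F_t
    o = sigmoid(x_t @ p["o"][0] + h_prev @ p["o"][1] + p["o"][2])   # output gate O_t
    c_cand = np.tanh(x_t @ p["c"][0] + h_prev @ p["c"][1] + p["c"][2])
    c = f * c_prev + i * c_cand    # memory cell C_t: gated mix of old cell and candidate
    h = o * np.tanh(c)             # hidden state H_t: gated view of the cell
    return h, c

# Toy usage with illustrative sizes and random weights.
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
p = {k: (rng.standard_normal((n_in, n_hid)) * 0.1,
         rng.standard_normal((n_hid, n_hid)) * 0.1,
         np.zeros(n_hid)) for k in "ifoc"}
h = np.zeros((1, n_hid))
c = np.zeros((1, n_hid))
for t in range(5):
    h, c = lstm_step(rng.standard_normal((1, n_in)), h, c, p)
```

Unlike the GRU, the LSTM keeps two states per step: the memory cell `c` flows through an additive update (which helps gradients survive BPTT), while the hidden state `h` is what downstream layers read.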
Deep Recurrent Neural Networks

$$
\begin{aligned}
\boldsymbol{H}_t^{(1)} &= \phi(\boldsymbol{X}_t \boldsymbol{W}_{xh}^{(1)} + \boldsymbol{H}_{t-1}^{(1)} \boldsymbol{W}_{hh}^{(1)} + \boldsymbol{b}_h^{(1)})\\
\boldsymbol{H}_t^{(\ell)} &= \phi(\boldsymbol{H}_t^{(\ell-1)} \boldsymbol{W}_{xh}^{(\ell)} + \boldsymbol{H}_{t-1}^{(\ell)} \boldsymbol{W}_{hh}^{(\ell)} + \boldsymbol{b}_h^{(\ell)})\\
\boldsymbol{O}_t &= \boldsymbol{H}_t^{(L)} \boldsymbol{W}_{hq} + \boldsymbol{b}_q
\end{aligned}
$$
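The stacking rule above can be sketched as one time step through $L$ tanh-RNN layers: layer 1 reads the input $\boldsymbol{X}_t$, and each higher layer $\ell$ reads the fresh hidden state of layer $\ell-1$. Sizes and initialization are illustrative assumptions.

```python
import numpy as np

def deep_rnn_step(x_t, h_prev, layers):
    """One time step through stacked tanh-RNN layers.
    layers: list of (W_xh, W_hh, b_h); h_prev: list of per-layer states."""
    h_new, inp = [], x_t
    for (W_xh, W_hh, b_h), h_l in zip(layers, h_prev):
        h_l = np.tanh(inp @ W_xh + h_l @ W_hh + b_h)  # H_t^(l)
        h_new.append(h_l)
        inp = h_l                                      # feeds layer l+1
    return h_new

# Toy usage: 2 layers, illustrative sizes and random weights.
rng = np.random.default_rng(0)
n_in, n_hid, n_layers = 3, 4, 2
sizes = [n_in] + [n_hid] * n_layers
layers = [(rng.standard_normal((sizes[l], n_hid)) * 0.1,
           rng.standard_normal((n_hid, n_hid)) * 0.1,
           np.zeros(n_hid)) for l in range(n_layers)]
h = [np.zeros((1, n_hid)) for _ in range(n_layers)]
for t in range(5):
    h = deep_rnn_step(rng.standard_normal((1, n_in)), h, layers)

# Output layer O_t = H_t^(L) W_hq + b_q, reading only the top layer.
W_hq, b_q = rng.standard_normal((n_hid, 2)) * 0.1, np.zeros(2)
o = h[-1] @ W_hq + b_q
```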
Bidirectional Recurrent Neural Networks

$$
\begin{aligned}
\overrightarrow{\boldsymbol{H}}_t &= \phi(\boldsymbol{X}_t \boldsymbol{W}_{xh}^{(f)} + \overrightarrow{\boldsymbol{H}}_{t-1} \boldsymbol{W}_{hh}^{(f)} + \boldsymbol{b}_h^{(f)})\\
\overleftarrow{\boldsymbol{H}}_t &= \phi(\boldsymbol{X}_t \boldsymbol{W}_{xh}^{(b)} + \overleftarrow{\boldsymbol{H}}_{t+1} \boldsymbol{W}_{hh}^{(b)} + \boldsymbol{b}_h^{(b)})
\end{aligned}
$$
$$
\boldsymbol{H}_t = (\overrightarrow{\boldsymbol{H}}_t, \overleftarrow{\boldsymbol{H}}_t)
$$
$$
\boldsymbol{O}_t = \boldsymbol{H}_t \boldsymbol{W}_{hq} + \boldsymbol{b}_q
$$
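A bidirectional layer can be sketched as two independent passes, one over the sequence and one over its reverse, whose hidden states are realigned in time and concatenated at each step. The helper names and sizes below are assumptions for illustration.

```python
import numpy as np

def rnn_pass(xs, W_xh, W_hh, b_h):
    """Run a tanh-RNN over the list of inputs xs, returning all hidden states."""
    h = np.zeros((xs[0].shape[0], W_hh.shape[0]))
    hs = []
    for x in xs:
        h = np.tanh(x @ W_xh + h @ W_hh + b_h)
        hs.append(h)
    return hs

def bidirectional_rnn(xs, fwd, bwd):
    """Forward pass over xs, backward pass over reversed xs, then
    concatenate the two hidden states at each time step: H_t = (H_fwd, H_bwd)."""
    hs_f = rnn_pass(xs, *fwd)
    hs_b = rnn_pass(xs[::-1], *bwd)[::-1]   # realign backward states to forward time
    return [np.concatenate([hf, hb], axis=1) for hf, hb in zip(hs_f, hs_b)]

# Toy usage: sequence of length 5, illustrative sizes and random weights.
rng = np.random.default_rng(0)
n_in, n_hid, T = 3, 4, 5
make = lambda: (rng.standard_normal((n_in, n_hid)) * 0.1,
                rng.standard_normal((n_hid, n_hid)) * 0.1,
                np.zeros(n_hid))
xs = [rng.standard_normal((1, n_in)) for _ in range(T)]
hs = bidirectional_rnn(xs, make(), make())
```

Because the concatenated state has twice the hidden width, the output projection $\boldsymbol{W}_{hq}$ must have $2h$ input rows; the price of bidirectionality is that the full sequence must be available before any output can be produced.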