Text Generation Models
Sequence Models
Problem

For a sequence prediction problem:

(1) the input is a time-varying sequence $x_1, x_2, \ldots, x_t$;

(2) at time $t$, the model predicts the value at the next time step, i.e.

$$\hat{x}_{t+1} = \arg\max_x \; p(x \mid x_1, \ldots, x_t)$$
Difficulties

(1) the internal state is hard to model and observe;

(2) the state over a long time window is hard to model and observe.
Modeling approach

(1) Introduce an internal hidden-state variable $h_t$ that summarizes the history, and factor the model into a state update and an output map:

$$h_t = f(h_{t-1}, x_t)$$

$$\hat{x}_{t+1} = g(h_t)$$
simple RNN
The basic structure of an RNN: the same cell is applied recurrently, combining the input at each time step with the previous hidden state.
Forward propagation

$$s_t = \tanh(U x_t + W s_{t-1} + b)$$

$$o_t = \mathrm{softmax}(V s_t)$$

where:

(1) $x_t$ is the input vector at time $t$;

(2) $s_t$ is the hidden state at time $t$ (the network's "memory"), with $s_{-1}$ typically initialized to all zeros;

(3) $o_t$ is the output at time $t$, a probability distribution over the output vocabulary;

(4) $\tanh$ is applied element-wise and $\mathrm{softmax}$ normalizes over the output dimension.
The cost function is the cross-entropy loss summed over all time steps:

$$E(y, o) = \sum_t E_t(y_t, o_t) = -\sum_t y_t \log o_t$$
The model's parameters:

(1) $W$: maps a vector from hidden_dim to hidden_dim (the recurrent weights);

(2) $U$: maps a vector from input_dim to hidden_dim (the input weights);

(3) $V$: maps a vector from hidden_dim to output_dim (the output weights);

(4) $b$: the bias vector.
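A minimal NumPy sketch of this forward pass; the dimensions, initialization scale, and one-hot inputs are illustrative assumptions, not part of the original notes:

```python
import numpy as np

def rnn_forward(x_seq, U, W, V, b):
    """Run a simple RNN over a sequence of input vectors.

    x_seq: array of shape (T, input_dim)
    Returns hidden states s (T, hidden_dim) and outputs o (T, output_dim).
    """
    T = x_seq.shape[0]
    hidden_dim = W.shape[0]
    s_prev = np.zeros(hidden_dim)          # s_{-1} initialized to zeros
    s, o = [], []
    for t in range(T):
        s_t = np.tanh(U @ x_seq[t] + W @ s_prev + b)   # state update
        z = V @ s_t
        o_t = np.exp(z - z.max())                      # numerically stable softmax
        o_t /= o_t.sum()
        s.append(s_t)
        o.append(o_t)
        s_prev = s_t
    return np.array(s), np.array(o)

# Tiny example with assumed dimensions
rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim, T = 4, 3, 4, 5
U = rng.normal(size=(hidden_dim, input_dim)) * 0.1
W = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1
V = rng.normal(size=(output_dim, hidden_dim)) * 0.1
b = np.zeros(hidden_dim)
x_seq = np.eye(input_dim)[rng.integers(0, input_dim, size=T)]  # one-hot inputs
s, o = rnn_forward(x_seq, U, W, V, b)
```

Each row of `o` is a valid probability distribution, so sampling or taking the argmax from it yields the predicted next token.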
Model training: BPTT (backpropagation through time)

The basic idea of BPTT is to accumulate the errors from all time steps into a single gradient. For the recurrent weights $W$:

$$\frac{\partial E}{\partial W} = \sum_t \frac{\partial E_t}{\partial W}$$

$$\frac{\partial E_t}{\partial W} = \sum_{k=0}^{t} \frac{\partial E_t}{\partial o_t} \frac{\partial o_t}{\partial s_t} \frac{\partial s_t}{\partial s_k} \frac{\partial s_k}{\partial W}$$

where:

$$\frac{\partial s_t}{\partial s_k} = \prod_{j=k+1}^{t} \frac{\partial s_j}{\partial s_{j-1}}$$

From this iterative expression we can see that the gradient at each time step is determined by the entire sequence of time steps preceding it.
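The accumulation over time steps can be sketched in NumPy for the gradient with respect to $W$; the softmax-plus-cross-entropy shortcut $\partial E_t/\partial z_t = o_t - y_t$ is used, and the example data and scales are assumptions for illustration:

```python
import numpy as np

def bptt_grad_W(x_seq, y_idx, U, W, V, b):
    """Gradient of the summed cross-entropy loss w.r.t. W via full BPTT."""
    T = x_seq.shape[0]
    hidden_dim = W.shape[0]
    # Forward pass, storing all states; row s[-1] stays zero as the initial state.
    s = np.zeros((T + 1, hidden_dim))
    o = np.zeros((T, V.shape[0]))
    for t in range(T):
        s[t] = np.tanh(U @ x_seq[t] + W @ s[t - 1] + b)
        z = V @ s[t]
        e = np.exp(z - z.max())
        o[t] = e / e.sum()
    # Backward pass: sum every time step's error into one gradient.
    dW = np.zeros_like(W)
    for t in range(T):
        delta_o = o[t].copy()
        delta_o[y_idx[t]] -= 1.0                          # dE_t/dz_t = o_t - y_t
        delta = (V.T @ delta_o) * (1.0 - s[t] ** 2)       # back into hidden state
        for k in range(t, -1, -1):                        # sum over k = t .. 0
            dW += np.outer(delta, s[k - 1])
            delta = (W.T @ delta) * (1.0 - s[k - 1] ** 2) # one more Jacobian factor
    return dW

# Tiny example with assumed dimensions
rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim, T = 4, 3, 4, 5
U = rng.normal(size=(hidden_dim, input_dim)) * 0.1
W = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1
V = rng.normal(size=(output_dim, hidden_dim)) * 0.1
b = np.zeros(hidden_dim)
x_seq = np.eye(input_dim)[rng.integers(0, input_dim, size=T)]
y_idx = rng.integers(0, output_dim, size=T)
dW = bptt_grad_W(x_seq, y_idx, U, W, V, b)
```

The inner loop over `k` is exactly the sum in the gradient formula: the error signal `delta` is pulled back one step at a time through the Jacobian $\partial s_j/\partial s_{j-1}$.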
Vanishing gradients

Each factor $\partial s_j / \partial s_{j-1}$ in the product above involves the derivative of the activation function. For the sigmoid function, the derivative approaches 0 as its output saturates toward 0 or 1 (tanh saturates similarly toward $\pm 1$); multiplying many such small factors drives the overall gradient toward 0, which is the vanishing-gradient problem.
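The decay can be demonstrated numerically for a tanh RNN, where $\partial s_j / \partial s_{j-1} = \mathrm{diag}(1 - s_j^2)\, W$; the weight scale here is an assumption chosen to make the effect visible:

```python
import numpy as np

rng = np.random.default_rng(1)
hidden_dim = 8
W = rng.normal(size=(hidden_dim, hidden_dim)) * 0.05   # assumed small recurrent weights

# Accumulate the product of Jacobians d s_t / d s_k over 30 steps
s = np.tanh(rng.normal(size=hidden_dim))
J = np.eye(hidden_dim)
norms = []
for _ in range(30):
    s = np.tanh(W @ s)                     # free-running state update
    J = np.diag(1.0 - s ** 2) @ W @ J      # chain one more Jacobian factor
    norms.append(np.linalg.norm(J, 2))     # spectral norm of the product
```

The spectral norm of the accumulated Jacobian shrinks rapidly as the time gap grows, so error signals from distant time steps contribute almost nothing to the gradient.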
LSTM cell

Forward propagation

$$i_t = \sigma(U^i x_t + W^i s_{t-1})$$

$$f_t = \sigma(U^f x_t + W^f s_{t-1})$$

$$o_t = \sigma(U^o x_t + W^o s_{t-1})$$

$$g_t = \tanh(U^g x_t + W^g s_{t-1})$$

$$c_t = f_t \circ c_{t-1} + i_t \circ g_t$$

$$s_t = o_t \circ \tanh(c_t)$$

where $\sigma$ is the sigmoid function, $\circ$ is element-wise multiplication, $i_t$, $f_t$, $o_t$ are the input, forget, and output gates, $g_t$ is the candidate update, and $c_t$ is the internal cell state.
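These six equations translate directly into one NumPy step function; the parameter names and tiny dimensions are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, s_prev, c_prev, params):
    """One LSTM forward step; params holds (U, W) pairs for gates i, f, o, g."""
    i = sigmoid(params["Ui"] @ x + params["Wi"] @ s_prev)   # input gate
    f = sigmoid(params["Uf"] @ x + params["Wf"] @ s_prev)   # forget gate
    o = sigmoid(params["Uo"] @ x + params["Wo"] @ s_prev)   # output gate
    g = np.tanh(params["Ug"] @ x + params["Wg"] @ s_prev)   # candidate update
    c = f * c_prev + i * g                                  # new cell state
    s = o * np.tanh(c)                                      # new hidden state
    return s, c

# Assumed tiny dimensions for illustration
rng = np.random.default_rng(2)
input_dim, hidden_dim = 4, 3
params = {
    name: rng.normal(size=(hidden_dim, input_dim if name[0] == "U" else hidden_dim)) * 0.1
    for name in ["Ui", "Wi", "Uf", "Wf", "Uo", "Wo", "Ug", "Wg"]
}
s, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
for x in np.eye(input_dim):                 # feed a short one-hot sequence
    s, c = lstm_step(x, s, c, params)
```

Because the cell state is updated additively ($c_t = f_t \circ c_{t-1} + i_t \circ g_t$) rather than through a squashing nonlinearity, gradients can flow through many steps without vanishing as quickly as in the simple RNN.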
Encoder-Decoder Framework

Basic framework

(1) The Encoder encodes the input sequence $x = (x_1, \ldots, x_{T_x})$, i.e.

$$h_t = f(x_t, h_{t-1})$$

$$c = q(\{h_1, \ldots, h_{T_x}\})$$

where $h_t$ is the encoder hidden state at time $t$, and $f$ and $q$ are nonlinear functions; a common choice is an RNN (or LSTM) cell for $f$ and $c = h_{T_x}$, the final hidden state.

(2) The Decoder, given the Encoder's output vector $c$, defines the probability of the output sequence $y = (y_1, \ldots, y_{T_y})$ as a product of conditionals:

$$p(y) = \prod_{t=1}^{T_y} p(y_t \mid \{y_1, \ldots, y_{t-1}\}, c)$$

For an RNN decoder, each conditional probability is modeled as:

$$p(y_t \mid \{y_1, \ldots, y_{t-1}\}, c) = g(y_{t-1}, s_t, c)$$

where $s_t$ is the decoder's hidden state at time $t$:

$$s_t = f(s_{t-1}, y_{t-1}, c)$$
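The framework above can be sketched with simple RNN cells for the encoder and decoder and a softmax for $g$; all weight names, dimensions, and the fixed-length greedy decoding loop are illustrative assumptions:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def encode(x_seq, Wx, Wh):
    """Encoder: h_t = tanh(Wx x_t + Wh h_{t-1}); take c = h_T, the final state."""
    h = np.zeros(Wh.shape[0])
    for x in x_seq:
        h = np.tanh(Wx @ x + Wh @ h)
    return h

def decode_step(y_prev, s_prev, c, Wy, Ws, Wc, V):
    """Decoder: s_t = f(s_{t-1}, y_{t-1}, c); p(y_t | ...) = softmax(V s_t)."""
    s = np.tanh(Wy @ y_prev + Ws @ s_prev + Wc @ c)
    return softmax(V @ s), s

# Tiny assumed dimensions and a greedy decoding loop
rng = np.random.default_rng(3)
in_dim, hid, vocab = 4, 3, 5
Wx, Wh = rng.normal(size=(hid, in_dim)) * 0.1, rng.normal(size=(hid, hid)) * 0.1
Wy, Ws = rng.normal(size=(hid, vocab)) * 0.1, rng.normal(size=(hid, hid)) * 0.1
Wc, V = rng.normal(size=(hid, hid)) * 0.1, rng.normal(size=(vocab, hid)) * 0.1

c = encode(np.eye(in_dim), Wx, Wh)          # encode a one-hot input sequence
y_prev, s = np.zeros(vocab), np.zeros(hid)
out = []
for _ in range(6):                          # greedy decoding, fixed length
    p, s = decode_step(y_prev, s, c, Wy, Ws, Wc, V)
    idx = int(p.argmax())
    out.append(idx)
    y_prev = np.eye(vocab)[idx]             # feed the prediction back in
```

Note that the single vector `c` is the decoder's only view of the input sequence, which is the bottleneck the attention mechanism below is designed to remove.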
attention mechanism

Difference: the traditional Decoder is adjusted so that each output position gets its own context vector $c_i$, i.e.

$$p(y_i \mid \{y_1, \ldots, y_{i-1}\}, x) = g(y_{i-1}, s_i, c_i)$$

$$s_i = f(s_{i-1}, y_{i-1}, c_i)$$

where each context vector $c_i$ is a weighted sum of the encoder hidden states:

$$c_i = \sum_{j=1}^{T_x} \alpha_{ij} h_j$$

with

$$\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})}$$

$$e_{ij} = a(s_{i-1}, h_j)$$

The scoring function $a$, which measures how well the input around position $j$ matches the output at position $i$, is called the alignment model.
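One common parameterization of the alignment model $a$ is additive (a small feed-forward network); the sketch below uses that form, with all parameter names and dimensions as illustrative assumptions:

```python
import numpy as np

def attention_context(s_prev, H, Wa, Ua, va):
    """Compute the context vector c_i from the previous decoder state.

    s_prev: previous decoder state, shape (hid,)
    H: encoder hidden states, shape (T, hid)
    Returns the context vector c_i and the attention weights alpha_i.
    """
    # e_ij = a(s_{i-1}, h_j): additive scoring of each encoder state
    e = np.array([va @ np.tanh(Wa @ s_prev + Ua @ h) for h in H])
    alpha = np.exp(e - e.max())
    alpha /= alpha.sum()                   # softmax over source positions j
    c = alpha @ H                          # c_i = sum_j alpha_ij * h_j
    return c, alpha

# Assumed encoder states and parameters
rng = np.random.default_rng(4)
T, hid, att = 5, 3, 4
H = rng.normal(size=(T, hid))
s_prev = rng.normal(size=hid)
Wa = rng.normal(size=(att, hid))
Ua = rng.normal(size=(att, hid))
va = rng.normal(size=att)
c, alpha = attention_context(s_prev, H, Wa, Ua, va)
```

The weights `alpha` form a probability distribution over input positions, so they can also be inspected directly to see which inputs the decoder is attending to at each output step.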