【Explanation】
It is appropriate when every input should be matched to an output.
【Explanation】
In a language model, we try to predict the next step based on knowledge of all prior steps.
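A minimal sketch of what "predict the next step from all prior steps" means for training data, assuming a tokenized sequence. The helper name `next_step_pairs` and the example sentence are hypothetical, used only for illustration.

```python
def next_step_pairs(tokens):
    """Pair each prefix of the sequence with the token that follows it.

    At step t the model sees tokens[0..t-1] and is asked to predict tokens[t].
    """
    return [(tokens[:t], tokens[t]) for t in range(1, len(tokens))]


# Hypothetical example sentence, already split into tokens.
pairs = next_step_pairs(["cats", "average", "15", "hours"])
for context, target in pairs:
    print(context, "->", target)
```

Each printed line shows the context available at one step and the token the model would be trained to predict from it.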
【Explanation】
Γu is a vector of dimension equal to the number of hidden units in the LSTM.
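The dimension claim can be checked with a small sketch. It assumes the common formulation Γu = sigmoid(Wu [a<t−1>, x<t>] + bu); the sizes `n_a` and `n_x`, the random weights, and the function name `update_gate` are illustrative assumptions, not values from the source.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def update_gate(a_prev, x_t, W_u, b_u):
    """Compute Gamma_u = sigmoid(W_u [a_prev, x_t] + b_u).

    W_u has one row per hidden unit, so the result has one entry per hidden unit.
    """
    concat = a_prev + x_t  # list concatenation of previous activation and input
    return [
        sigmoid(sum(w * v for w, v in zip(row, concat)) + b)
        for row, b in zip(W_u, b_u)
    ]


# Assumed sizes: 5 hidden units, 3 input features.
n_a, n_x = 5, 3
random.seed(0)
a_prev = [random.gauss(0, 1) for _ in range(n_a)]
x_t = [random.gauss(0, 1) for _ in range(n_x)]
W_u = [[random.gauss(0, 1) for _ in range(n_a + n_x)] for _ in range(n_a)]
b_u = [0.0] * n_a

gamma_u = update_gate(a_prev, x_t, W_u, b_u)
print(len(gamma_u))  # one gate value per hidden unit
```

Because the gate is applied element-wise to the cell state, it has to have the same length as the cell state, which is the number of hidden units.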
【Explanation】
For the signal to backpropagate without vanishing, we need c<t> to be highly dependent on c<t−1>.
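A rough numeric illustration, assuming the LSTM-style update c<t> = Γu * c̃<t> + Γf * c<t−1> with scalar gates treated as constants (a simplification: real gates are vectors that also depend on earlier states). Under that assumption the gradient of c<T> with respect to c<0> is just the product of the Γf values along the way. The function name `carry_gradient` and the gate values are hypothetical.

```python
def carry_gradient(gamma_f, steps):
    """Gradient of c<T> w.r.t. c<0> when each step multiplies c<t-1> by gamma_f.

    Simplifying assumption: gamma_f is a constant scalar, so each step
    contributes a factor of gamma_f to the chain rule.
    """
    grad = 1.0
    for _ in range(steps):
        grad *= gamma_f
    return grad


# c<t> strongly dependent on c<t-1> (gate near 1) versus weakly dependent.
strong = carry_gradient(0.99, 100)
weak = carry_gradient(0.5, 100)
print(strong, weak)
```

With the gate near 1 a usable fraction of the gradient survives 100 steps, while with a gate of 0.5 it shrinks to a negligible value.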