Instead of specifying a discrete sequence of hidden layers, we parameterize the derivative of the hidden state using a neural network.
Before: a discrete sequence of hidden layers.
After: the derivative of the hidden state.
Traditional methods such as residual networks, RNN decoders, and normalizing flows build complicated transformations by composing a sequence of discrete transformations applied to a hidden state: h_{t+1} = h_t + f(h_t, θ_t).
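This discrete composition can be sketched as follows. A minimal illustration (not the paper's implementation): `f` is a stand-in learned transformation, and `thetas` is an assumed list of per-layer weight matrices.

```python
import numpy as np

def f(h, theta):
    # stand-in learned transformation with per-layer parameters theta
    return np.tanh(h @ theta)

def resnet_forward(h0, thetas):
    # compose discrete residual steps: h_{t+1} = h_t + f(h_t, theta_t)
    h = h0
    for theta in thetas:
        h = h + f(h, theta)
    return h

rng = np.random.default_rng(0)
h0 = rng.standard_normal(4)
thetas = [rng.standard_normal((4, 4)) * 0.1 for _ in range(3)]
out = resnet_forward(h0, thetas)  # same shape as h0
```

Each layer has its own parameters θ_t, and depth is fixed by the number of composed steps.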
Instead, we parameterize the continuous dynamics of hidden units with an ordinary differential equation (ODE): dh(t)/dt = f(h(t), t, θ).
Treat h(t) as a continuous function of depth t: a neural network parameterizes its dynamics, and solving the ODE maps the input layer h(0) to the output layer h(T).
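A minimal sketch of this mapping, using a fixed-step Euler solver for simplicity (the paper uses adaptive black-box ODE solvers; the dynamics function `f` and step count here are illustrative assumptions):

```python
import numpy as np

def f(h, t, theta):
    # parameterized dynamics dh(t)/dt = f(h(t), t, theta)
    return np.tanh(h @ theta)

def odeint_euler(h0, theta, T=1.0, steps=100):
    # integrate from h(0) to h(T) with fixed Euler steps
    h, dt = h0, T / steps
    for i in range(steps):
        h = h + dt * f(h, i * dt, theta)
    return h

rng = np.random.default_rng(0)
h0 = rng.standard_normal(4)          # input layer h(0)
theta = rng.standard_normal((4, 4)) * 0.1
hT = odeint_euler(h0, theta)         # output layer h(T)
```

Note that, unlike the residual network above, a single set of parameters θ is shared across the whole trajectory, and the effective "depth" is determined by the solver rather than a fixed layer count.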