  • Deep Generative Video Compression (NeurIPS 2019)

    Based on VAE

    Steps:

    1. Transform a sequence of frames \(x_{1:T}=(x_1,\dots,x_T)\) into a sequence of latent states \(z_{1:T}\) and, optionally, a global state \(f\). This transformation is lossy, but the video is still not optimally compressed, because correlations remain among the latent variables.
    2. The latents must therefore be entropy-coded into a binary representation.
    3. The bit stream can then be sent to a receiver, where it is decoded back into video frames.

    (Q: Why map to latent variables first? Why not entropy-code the pixels directly?)

    So we need two models:

    1. An optimal lossy transformation into the latent space.
    2. A predictive model required for entropy coding.

    The temporal model matters most for video, because videos exhibit strong temporal correlations in addition to the spatial correlations already present in images.

    The paper therefore proposes to learn a temporally-conditioned prior distribution, parameterized by a deep generative model, to efficiently code the latent variables associated with each frame.

    Notation:
    \(x_{1:T}=(x_1,\dots,x_T)\) = video sequence; \(z_{1:T}\) = associated latent variables; \(f\) = global variables (optional)

    Arithmetic coding:
    Code the entire sequence of discretized latent states \(z_{1:T}\) into a single number: conditional probabilities \(p(z_t|z_{<t})\) are used to iteratively refine the real interval \([0,1)\) into progressively smaller sub-intervals. (Q: how exactly is the interval refined? See the sketch below.) After a final (very small) interval is obtained, a binarized floating-point number from that interval is stored to encode the entire sequence of latents.
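    A minimal sketch of the interval refinement on a toy model (the fixed per-symbol probabilities and the choice of the interval midpoint are illustrative assumptions; production coders use integer arithmetic with renormalization to avoid floating-point underflow):

    ```python
    # Toy arithmetic coder: encode a symbol sequence into one number in [0, 1).
    # `probs(t, history)` plays the role of the model p(z_t | z_<t) and returns
    # a dict {symbol: probability} for step t.

    def arithmetic_encode(symbols, probs):
        low, high = 0.0, 1.0
        for t, s in enumerate(symbols):
            p = probs(t, symbols[:t])      # conditional model p(z_t | z_<t)
            width = high - low
            cum = 0.0
            for sym in sorted(p):          # partition [low, high) by p
                if sym == s:               # keep the sub-interval of symbol s
                    high = low + width * (cum + p[sym])
                    low = low + width * cum
                    break
                cum += p[sym]
        return (low + high) / 2            # any number inside the final interval

    # Example with a fixed binary model p(0)=0.8, p(1)=0.2 at every step:
    code = arithmetic_encode([0, 0, 1, 0], lambda t, h: {0: 0.8, 1: 0.2})
    print(code)  # one float that identifies the whole sequence
    ```

    Likely symbols shrink the interval only slightly, so they cost few bits; that is why a good predictive model \(p(z_t|z_{<t})\) directly reduces the bit rate.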

    Decoder (latents → data):
    Use a stochastic recurrent variational autoencoder that transforms a sequence of local latent variables \(z_{1:T}\) and a global state \(f\) into the frame sequence \(x_{1:T}\):

    \[ p_\theta(x_{1:T},z_{1:T},f)=p_\theta(f)\,p_\theta(z_{1:T})\prod_{t=1}^{T}p_\theta(x_t|z_t,f) \]

    \[ p_\theta(x_t|z_t,f)=\mathrm{Laplace}\big(\mu_\theta(z_t,f),\,\lambda^{-1}\mathbf{1}\big) \quad \text{(frame likelihood)} \\ \widetilde{x}_t=\mu_\theta(z_t,f) \quad \text{(decoder mean)} \]
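    A minimal PyTorch sketch of such a per-frame decoder mean; the layer sizes and upsampling path are illustrative assumptions, not the paper's architecture:

    ```python
    import torch
    import torch.nn as nn

    class FrameDecoder(nn.Module):
        """Maps (z_t, f) to the decoder mean mu_theta(z_t, f). With a Laplace
        likelihood, -log p(x_t | z_t, f) is an L1 distortion up to constants."""
        def __init__(self, z_dim=32, f_dim=64):
            super().__init__()
            self.fc = nn.Linear(z_dim + f_dim, 128 * 4 * 4)
            self.deconv = nn.Sequential(    # 4x4 -> 32x32; sizes are assumptions
                nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
            )

        def forward(self, z_t, f):          # z_t: (B, z_dim), f: (B, f_dim)
            h = self.fc(torch.cat([z_t, f], dim=-1)).view(-1, 128, 4, 4)
            return self.deconv(h)           # x_tilde_t = mu_theta(z_t, f)
    ```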

    Encoder:
    Use amortized variational inference to predict a distribution over latent codes given the input video:

    \[ q_\phi(z_{1:T},f|x_{1:T})=q_\phi(f|x_{1:T})\prod_{t=1}^{T}q_\phi(z_t|x_t) \]

    The approximate posteriors are fixed-width uniform distributions centered at the predicted means:

    \[ \widetilde{f} \sim q_{\phi}(f|x_{1:T})=\mathcal{U}\big(\hat{f}-\tfrac{1}{2},\,\hat{f}+\tfrac{1}{2}\big) \\ \widetilde{z}_t \sim q_{\phi}(z_t|x_t)=\mathcal{U}\big(\hat{z}_t-\tfrac{1}{2},\,\hat{z}_t+\tfrac{1}{2}\big) \]
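    Sampling from these box posteriors is just adding uniform noise to the encoder mean during training; at compression time the mean is rounded to an integer so the latents become discrete symbols for the arithmetic coder. A small sketch (the train/test switch follows standard neural-compression practice and is an assumption here):

    ```python
    import torch

    def sample_latent(mean, training=True):
        """q(z|x) = U(mean - 1/2, mean + 1/2): additive uniform noise while
        training (a differentiable stand-in for rounding), hard rounding
        when actually compressing."""
        if training:
            return mean + torch.rand_like(mean) - 0.5  # z_tilde ~ U(mean +- 1/2)
        return torch.round(mean)                       # discrete symbols to code
    ```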

    The means come from additional encoder neural networks:

    \[ \hat{f}=\mu_{\phi}(x_{1:T}) \\ \hat{z}_t=\mu_{\phi}(x_t) \]

    The mean for the global state is parametrized by convolutions over \(x_{1:T}\), followed by a bi-directional LSTM whose output is processed by an MLP.

    The encoder mean for the local state is simpler, consisting of convolutions over each frame followed by an MLP.
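    A PyTorch sketch matching that description; channel widths and the temporal pooling of the bi-LSTM states are illustrative assumptions:

    ```python
    import torch
    import torch.nn as nn

    class Encoder(nn.Module):
        """Predicts hat_f = mu_phi(x_{1:T}) and hat_z_t = mu_phi(x_t)."""
        def __init__(self, z_dim=32, f_dim=64, hidden=128):
            super().__init__()
            self.conv = nn.Sequential(      # shared per-frame feature extractor
                nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),   # -> (B*T, 64)
            )
            self.bilstm = nn.LSTM(64, hidden, batch_first=True, bidirectional=True)
            self.f_mlp = nn.Linear(2 * hidden, f_dim)    # global mean hat_f
            self.z_mlp = nn.Linear(64, z_dim)            # local mean hat_z_t

        def forward(self, x):               # x: (B, T, 3, H, W)
            B, T = x.shape[:2]
            h = self.conv(x.flatten(0, 1)).view(B, T, -1)
            seq, _ = self.bilstm(h)         # bi-directional pass over time
            f_hat = self.f_mlp(seq.mean(dim=1))  # pool the states, then MLP
            z_hat = self.z_mlp(h)           # per-frame, no recurrence
            return f_hat, z_hat
    ```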

    The paper assumes that the global prior \(p_\theta(f)\) is fixed, while \(p_\theta(z_{1:T})\) is built from a temporal sequence model:

    \[ p_\theta(f)=\prod_{i}^{\dim(f)}p_\theta(f^i)*\mathcal{U}\big(-\tfrac{1}{2},\tfrac{1}{2}\big) \\ p_\theta(z_{1:T})=\prod_{t}^{T}\prod_{i}^{\dim(z)}p_\theta(z_t^i|z_{<t})*\mathcal{U}\big(-\tfrac{1}{2},\tfrac{1}{2}\big) \]

    Here \(*\) is convolution: smoothing each prior density with a unit-width box matches the box-shaped posterior noise.
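    Evaluating such a box-convolved density at an integer \(\hat z\) gives the probability mass \(c(\hat z+\frac{1}{2})-c(\hat z-\frac{1}{2})\), where \(c\) is the CDF of the underlying continuous prior; this mass is exactly what the arithmetic coder needs. A sketch, with a logistic prior assumed purely for illustration:

    ```python
    import torch

    def discrete_prob(z, cdf):
        """Probability mass of integer latent z under a prior convolved with
        U(-1/2, 1/2): the CDF difference across z's unit-width bin."""
        return cdf(z + 0.5) - cdf(z - 0.5)

    # Standard logistic prior: its CDF is the sigmoid.
    p = discrete_prob(torch.tensor([-1.0, 0.0, 1.0]), torch.sigmoid)
    rate_bits = -torch.log2(p).sum()    # code length these latents would cost
    ```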

    There are two ways to model the latent sequence \(z_{1:T}\):

    1. A recurrent LSTM prior architecture for \(p_\theta(z_t^i|z_{<t})\) that conditions on all previous frames in a segment (see the sketch after this list).
    2. A single-frame context, \(p_\theta(z_t^i|z_{<t})=p_\theta(z_t^i|z_{t-1})\), which is essentially a Kalman filter.
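    A sketch of option 1: an LSTM that consumes the previous latents and emits parameters of \(p_\theta(z_t|z_{<t})\) (the mean/scale parameterization and sizes are assumptions). Truncating its input to only \(z_{t-1}\) recovers option 2:

    ```python
    import torch
    import torch.nn as nn

    class TemporalPrior(nn.Module):
        """LSTM prior: reads z_<t and outputs (mean, scale) of p(z_t | z_<t)."""
        def __init__(self, z_dim=32, hidden=128):
            super().__init__()
            self.lstm = nn.LSTM(z_dim, hidden, batch_first=True)
            self.head = nn.Linear(hidden, 2 * z_dim)

        def forward(self, z_prev):          # z_prev: (B, T-1, z_dim) = z_{<t}
            h, _ = self.lstm(z_prev)
            mean, log_scale = self.head(h).chunk(2, dim=-1)
            return mean, log_scale.exp()    # prior parameters for each step
    ```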

    The encoder (variational model) and the decoder (generative model) are learned jointly by maximizing a \(\beta\)-VAE objective:

    \[ \mathcal{L}(\phi,\theta)=\mathbb{E}_{\widetilde{f},\widetilde{z}_{1:T} \sim q_{\phi}}\big[\log p_\theta(x_{1:T}|\widetilde{f},\widetilde{z}_{1:T})\big]+\beta\,\mathbb{E}_{\widetilde{f},\widetilde{z}_{1:T} \sim q_{\phi}}\big[\log p_\theta(\widetilde{f},\widetilde{z}_{1:T})\big] \]

    The first term measures distortion; the second is the negative cross-entropy between the approximate posterior and the prior, so maximizing it minimizes the expected code length (rate).
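    Putting the two terms together, a sketch of the training loss under the Laplace likelihood (the \(\beta\) and \(\lambda\) defaults and the `prior_logp` callable are illustrative assumptions; the global state \(f\) is dropped for brevity):

    ```python
    import torch

    def beta_vae_loss(x, x_tilde, z_tilde, prior_logp, beta=0.1, lam=100.0):
        """Negative of L(phi, theta) = E[log p(x|z)] + beta * E[log p(z)].
        The Laplace likelihood makes the first term a scaled L1 distortion
        (up to an additive constant); the second is minus the bit cost."""
        distortion = -lam * (x - x_tilde).abs().sum()  # log-Laplace term
        rate = prior_logp(z_tilde).sum()               # log prior of the latents
        return -(distortion + beta * rate)             # minimize the negative
    ```

    Tuning \(\beta\) trades rate against distortion, tracing out the model's rate-distortion curve.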

    Model: (architecture figure)
