  • 【ML】ICLR2016_Delving Deeper into Convolutional Networks

    ICLR2016_DELVING DEEPER INTO CONVOLUTIONAL NETWORKS

    Note: Ballas et al. recently proposed a novel framework for learning video representations. The following are my review notes on the paper.

    Link: http://arxiv.org/pdf/1511.06432v4.pdf

    [Brief introduction to some neural networks]

    CNN: excels at static image classification

    RNN: can model temporal sequences in various learning tasks
    (however, it suffers from the exploding or vanishing gradient problem)
    ---> LSTM/GRU were proposed to mitigate this problem
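
To make the gradient problem concrete, here is a minimal sketch on a toy scalar recurrence h_t = w * h_{t-1}. The recurrence, the weights 0.9 and 1.1, and the 50 steps are illustrative assumptions, not taken from the paper.

```python
def gradient_through_time(w, steps):
    """Gradient d h_T / d h_0 for the scalar linear recurrence h_t = w * h_{t-1}."""
    grad = 1.0
    for _ in range(steps):
        grad *= w  # chain rule: one factor of w per time step
    return grad


# Hypothetical weights on either side of 1.0.
vanishing = gradient_through_time(0.9, 50)  # shrinks toward zero
exploding = gradient_through_time(1.1, 50)  # grows rapidly
print(vanishing, exploding)
```

A weight slightly below 1 makes the signal from early steps fade away, and a weight slightly above 1 makes it blow up. Gated units such as LSTM/GRU are designed to ease this.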

    RCN: leverages properties of both CNN and RNN by using the CNN's top-level feature map as the input of an RNN. It has recently been introduced to learn video representations.


    [Video representation]

    Motivation:
    Adopt RCN as the basic model.
    - The top-level feature map carries high-level semantic features, but the spatial nuances are lost after pooling.
    - However, frame-to-frame temporal variation is known to be smooth, so the motion between frames is fine and local, and capturing it is key for action recognition from videos.
    (we need a new model that addresses this problem)

    [Proposed models]

    GRU-RCN:
    - replace the recurrent units in RCN with GRUs.

    (z: update gate, decides to what degree the previous hidden state contributes to the next hidden state)
    (r: reset gate, decides whether, and how much of, the previous hidden state is propagated into the candidate state)
    (~h: candidate hidden state, which is then passed through the update gate)
    (h: final hidden state)
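
For reference, the four quantities above can be written in the usual GRU form. This is the standard fully-connected formulation with biases omitted, given as a sketch. The symbol names W, U, W_z, U_z, W_r, U_r are assumptions. Some references swap the roles of z_t and (1 - z_t) in the last line.

```latex
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1}) \\
r_t &= \sigma(W_r x_t + U_r h_{t-1}) \\
\tilde{h}_t &= \tanh\bigl(W x_t + U (r_t \odot h_{t-1})\bigr) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
```

Here x_t is the input at time t, \sigma is the logistic sigmoid, and \odot is element-wise multiplication.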

    Problems:
    - the number of parameters in the fully-connected layers is huge because of the size of the conv map.
    - fully-connected layers break the spatial structure of the conv map.
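
The first problem can be made concrete with a quick count. The sketch below assumes a hypothetical 14 x 14 conv map with 512 channels mapped to 512 hidden units. For comparison, it also counts the weights of a 3 x 3 convolution between the same channel counts. All sizes and function names are illustrative assumptions.

```python
def fc_weights(n1, n2, ox, oh):
    """Weights of one fully-connected input-to-hidden matrix:
    the flattened n1 x n2 x ox map is mapped to oh hidden units."""
    return n1 * n2 * ox * oh


def conv_weights(k1, k2, ox, oh):
    """Weights of one k1 x k2 convolution from ox input to oh output channels."""
    return k1 * k2 * ox * oh


# Hypothetical sizes, chosen only for illustration.
N1, N2, OX, OH = 14, 14, 512, 512
fc = fc_weights(N1, N2, OX, OH)
conv = conv_weights(3, 3, OX, OH)
print(fc, conv, fc / conv)
```

A GRU has several such matrices (for z, r and the candidate state), so the gap applies to each of them.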

    Trick:
    - replace the fully-connected units in the GRU with convolution operations, which keeps the spatial structure and reduces the number of parameters at the same time.
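
A minimal sketch of one such convolutional GRU step follows, in pure Python. To stay short it assumes single-channel maps, zero padding, no biases and random 3 x 3 kernels. The names `conv2d_same`, `conv_gru_step` and the parameter keys are hypothetical, and this is not the paper's implementation.

```python
import math
import random


def conv2d_same(x, k):
    """Zero-padded 'same' 2D cross-correlation of a single-channel map."""
    H, W = len(x), len(x[0])
    kh, kw = len(k), len(k[0])
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            for a in range(kh):
                for b in range(kw):
                    ii, jj = i + a - kh // 2, j + b - kw // 2
                    if 0 <= ii < H and 0 <= jj < W:
                        out[i][j] += k[a][b] * x[ii][jj]
    return out


def ewise(f, *maps):
    """Apply f element-wise across same-shaped 2D maps."""
    return [[f(*vals) for vals in zip(*rows)] for rows in zip(*maps)]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def conv_gru_step(x, h_prev, p):
    """One GRU step where every matrix product is replaced by a convolution."""
    z = ewise(lambda a, b: sigmoid(a + b),
              conv2d_same(x, p["Wz"]), conv2d_same(h_prev, p["Uz"]))
    r = ewise(lambda a, b: sigmoid(a + b),
              conv2d_same(x, p["Wr"]), conv2d_same(h_prev, p["Ur"]))
    rh = ewise(lambda a, b: a * b, r, h_prev)
    h_cand = ewise(lambda a, b: math.tanh(a + b),
                   conv2d_same(x, p["W"]), conv2d_same(rh, p["U"]))
    # The hidden state stays a 2D map, so spatial structure is preserved.
    return ewise(lambda zt, hp, hc: (1.0 - zt) * hp + zt * hc, z, h_prev, h_cand)


random.seed(0)
params = {name: [[random.uniform(-0.1, 0.1) for _ in range(3)] for _ in range(3)]
          for name in ("Wz", "Uz", "Wr", "Ur", "W", "U")}
frames = [[[random.random() for _ in range(4)] for _ in range(4)] for _ in range(3)]
h = [[0.0] * 4 for _ in range(4)]
for x in frames:  # run the recurrence over three toy "frames"
    h = conv_gru_step(x, h, params)
print(h)
```

The same six small kernels are reused at every spatial location and every time step, which is where the parameter saving comes from.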

    Intuition:
    - we can view the propagation of hidden states as a process of convolution.
    If so, each new hidden state perceives the spatial structure of all the previous states. The further back a state lies in the sequence, the larger the receptive field over it, so we keep only a general impression of the earliest frames.
    - compared with our own cognitive system, this does make sense!
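
The growth of the receptive field can be sketched with simple arithmetic. The model below is deliberately simplified: it treats the recurrence as a plain stride-1 convolution with an odd kernel, h_t = conv(h_{t-1}), and ignores gating. In a full GRU the reset gate is itself a convolution of the previous state, so the field can grow faster than shown here. The function name is hypothetical.

```python
def receptive_field_side(kernel, steps_back):
    """Side length of the region of h_{t - steps_back} that can influence
    one location of h_t, for repeated stride-1 convolutions with an odd kernel."""
    if kernel < 1 or kernel % 2 == 0:
        raise ValueError("kernel must be a positive odd integer")
    return 1 + steps_back * (kernel - 1)


sides = [receptive_field_side(3, n) for n in range(5)]
print(sides)  # [1, 3, 5, 7, 9]
```

Each extra step back in time widens the region by kernel - 1, so distant frames contribute through an ever broader, and therefore coarser, spatial window.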


    Stacked GRU-RCN:
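    - GRU-RCN applies L GRU-RCNs, one on each of L convolutional maps, independently of each other.
    - the stacked variant stacks these L GRU-RCNs, so that each layer also receives the hidden state of the layer below.
    - the L final time-step hidden states are fed into a classifier.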
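
The last step can be sketched as follows. The notes do not spell out how the L final hidden states are combined, so this example makes assumptions: each final state is global-average-pooled, passed through its own linear + softmax classifier, and the L class distributions are averaged. All names, shapes and weights are hypothetical.

```python
import math


def global_avg_pool(h):
    """h: [channels][H][W] -> one average value per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in h]


def linear(weights, bias, feats):
    """weights: [classes][features], bias: [classes]."""
    return [sum(w * f for w, f in zip(row, feats)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(final_states, classifiers):
    """Average the per-level class probabilities (an assumed readout)."""
    per_level = [softmax(linear(w, b, global_avg_pool(h)))
                 for h, (w, b) in zip(final_states, classifiers)]
    n_classes = len(per_level[0])
    return [sum(p[c] for p in per_level) / len(per_level) for c in range(n_classes)]


# Two toy levels (L = 2) with different channel counts and map sizes.
final_states = [
    [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]],  # 2 channels, 2 x 2
    [[[1.0]], [[2.0]], [[3.0]]],                           # 3 channels, 1 x 1
]
classifiers = [
    ([[0.1, 0.2], [0.3, -0.1]], [0.0, 0.0]),
    ([[0.1, 0.0, -0.1], [0.0, 0.2, 0.0]], [0.0, 0.1]),
]
probs = classify(final_states, classifiers)
print(probs)
```

Pooling each level down to a channel vector lets maps of different spatial sizes contribute to the same decision.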

  • Original post: https://www.cnblogs.com/kanelim/p/5279319.html