zoukankan      html  css  js  c++  java
  • 卷积神经网络是如何学习到平移不变的特征

    After some thought, I do not believe that pooling operations are responsible for the translation invariant property in CNNs. I believe that invariance (at least to translation) is due to the convolution filters (not specifically the pooling) and due to the fully-connected layer.

    For instance, let's use the Fig. 1 as reference:

    The blue volume represents the input image, while the green and yellow volumes represent layer 1 and layer 2 output activation volumes (see CS231n Convolutional Neural Networks for Visual Recognition if you are not familiar with these volumes). At the end, we have a fully-connected layer that is connected to all activation points of the yellow volume.

    These volumes are build using a convolution plus a pooling operation. The pooling operation reduces the height and width of these volumes, while the increasing number of filters in each layer increases the volume depth.

    For the sake of the argument, let's suppose that we have very "ludic" filters, as show in Fig. 2:

    • the first layer filters (which will generate the green volume) detect eyes, noses and other basic shapes (in real CNNs, first layer filters will match lines and very basic textures);
    • The second layer filters (which will generate the yellow volume) detect faces, legs and other objects that are aggregations of the first layer filters. Again, this is only an example: real life convolution filters may detect objects that have no meaning to humans.

    Now suppose that there is a face at one of the corners of the image (represented by two red and a magenta point). The two eyes are detected by the first filter, and therefore will represent two activations at the first slice of the green volume. The same happens for the nose, except that it is detected for the second filter and it appears at the second slice. Next, the face filter will find that there are two eyes and a nose next to each other, and it generates an activation at the yellow volume (within the same region of the face at the input image). Finally, the fully-connected layer detects that there is a face (and maybe a leg and an arm detected by other filters) and it outputs that it has detected an human body.

    Now suppose that the face has moved to another corner of the image, as shown in Fig. 3:

    The same number of activations occurs in this example, however they occur in a different region of the green and yellow volumes. Therefore, any activation point at the first slice of the yellow volume means that a face was detected, INDEPENDENTLY of the face location. Then the fully-connected layer is responsible to "translate" a face and two arms to an human body. In both examples, an activation was received at one of the fully-connected neurons. However, in each example, the activation path inside the FC layer was different, meaning that a correct learning at the FC layer is essential to ensure the invariance property.

    It must be noticed that the polling operation only "compresses" the activation volumes, if there was no polling in this example, an activation at the first slice of the yellow volume would still mean a face.

    In conclusion, what makes a CNN invariant to object translation is the architecture of the neural network: the convolution filters and the fully-connected layer. Additionally, I believe that if a CNN is trained showing faces only at one corner, during the learning process, the fully-connected layer may become insensitive to faces in other corners.

     

    source:

    https://www.quora.com/How-is-a-convolutional-neural-network-able-to-learn-invariant-features/answer/Jean-Da-Rolt

     

    An Intuitive Explanation of Convolutional Neural Networks

  • 相关阅读:
    谈谈泛型和锁,一个值得注意的问题!
    关于++运算符重载的一个问题,有点“饶”!
    关于抽象类的构造函数!
    在嵌套类中是否可以触发外部类中定义的事件!
    谈谈C#的私有成员的一个有趣的现象!
    关于循环引用!
    谈谈常数字段!
    C#中对byte类型的处理。
    C#l编译器是否会为值类型生成默认的构造函数!
    谈谈DivideByZeroException异常!并非像表面那么简单!
  • 原文地址:https://www.cnblogs.com/jukan/p/8779188.html
Copyright © 2011-2022 走看看