  Convolutional Neural Networks

    In the previous post, we figured out how to do forward and backward propagation to compute the gradient for fully-connected neural networks, and used those algorithms to derive the Hessian-vector product algorithm for a fully connected neural network.

    Next, let's figure out how to do the exact same thing for convolutional neural networks. While the mathematical theory should be exactly the same, the actual derivation will be slightly more complex due to the architecture of convolutional neural networks.

     

    Convolutional Neural Networks

    First, let's go over our convolutional neural network architecture. There are several variations on this architecture; the choices we make are fairly arbitrary. However, the algorithms will be very similar for all variations, and their derivations will look very similar.

    A convolutional neural network consists of several layers. These layers can be of three types:

    • Convolutional: Convolutional layers consist of a rectangular grid of neurons. It requires that the previous layer also be a rectangular grid of neurons. Each neuron takes inputs from a rectangular section of the previous layer; the weights for this rectangular section are the same for each neuron in the convolutional layer. Thus, the convolutional layer is just an image convolution of the previous layer, where the weights specify the convolution filter.

      In addition, there may be several grids in each convolutional layer; each grid takes inputs from all the grids in the previous layer, using potentially different filters.

    • Max-Pooling: After each convolutional layer, there may be a pooling layer. The pooling layer takes small rectangular blocks from the convolutional layer and subsamples each block to produce a single output from that block. There are several ways to do this pooling, such as taking the average or the maximum, or a learned linear combination of the neurons in the block. Our pooling layers will always be max-pooling layers; that is, they take the maximum of the block they are pooling.

    • Fully-Connected: Finally, after several convolutional and max-pooling layers, the high-level reasoning in the neural network is done via fully connected layers. A fully connected layer takes all neurons in the previous layer (be it fully connected, pooling, or convolutional) and connects them to every single neuron it has. Fully connected layers are not spatially located anymore (you can visualize them as one-dimensional), so there can be no convolutional layers after a fully connected layer.

    The resulting neural network will look like the classic LeNet architecture (see the figure in the original post).

    Note that we are not really constrained to two-dimensional convolutional neural networks. We can in the exact same way build one- or three-dimensional convolutional neural networks; our filters will just become appropriately dimensioned, and our pooling layers will change dimension as well. We may, for instance, want to use one-dimensional convolutional nets on audio or three-dimensional nets on MRI data.
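
    As a small illustration of the one-dimensional case, here is a minimal Matlab sketch of the linear step of a 1-D convolutional layer; the names (y, w, x) and sizes are made up purely for the example:

    % Minimal sketch of a 1-D convolutional layer's linear step.
    % The sizes and random data below are purely illustrative.
    N = 16; m = 5;
    y = rand(1, N);              % e.g. one short frame of an audio signal
    w = randn(1, m);             % a 1-D filter of length m
    x = conv(y, w, 'valid');     % N - m + 1 pre-nonlinearity outputs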

    Now that we've described the structure of our neural network, let's work through forward and backward propagation to do prediction and gradient computations in these neural networks.

     

    Forward Propagation

    Our neural networks now have three types of layers, as defined above. The forward and backward propagations will differ depending on what layer we're propagating through. We've already talked about fully connected networks in the previous post, so we'll just look at the convolutional layers and the max-pooling layers.

    Convolutional Layers

    Suppose that we have some $N \times N$ square neuron layer which is followed by our convolutional layer. If we use an $m \times m$ filter $\omega$, our convolutional layer output will be of size $(N - m + 1) \times (N - m + 1)$. In order to compute the pre-nonlinearity input to some unit $x_{ij}^\ell$ in our layer, we need to sum up the contributions (weighted by the filter components) from the previous layer cells:

    $$x_{ij}^\ell = \sum_{a=0}^{m-1} \sum_{b=0}^{m-1} \omega_{ab}\, y_{(i+a)(j+b)}^{\ell-1}.$$

    This is just a convolution, which we can express in Matlab via

    conv2(x, w, 'valid')
    

    Then, the convolutional layer applies its nonlinearity:

    $$y_{ij}^\ell = \sigma\left(x_{ij}^\ell\right).$$
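
    Putting the two steps together, here is a minimal Matlab sketch of the forward pass through a single convolutional layer, following the conv2 call above and assuming one input grid, one filter, and a sigmoid as the nonlinearity; the names (Y_prev, W, X, Y) and sizes are purely illustrative:

    % Forward pass through one convolutional layer (single grid, single filter).
    N = 8; m = 3;
    Y_prev = rand(N, N);                 % outputs of the previous N x N layer
    W = randn(m, m);                     % the m x m filter omega
    X = conv2(Y_prev, W, 'valid');       % pre-nonlinearity inputs, (N-m+1) x (N-m+1)
    Y = 1 ./ (1 + exp(-X));              % apply the sigmoid nonlinearity sigma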

    Max-Pooling Layers

    The max-pooling layers are quite simple, and do no learning themselves. They simply take some $k \times k$ region and output a single value, which is the maximum in that region. For instance, if their input layer is an $N \times N$ layer, they will then output an $\frac{N}{k} \times \frac{N}{k}$ layer, as each $k \times k$ block is reduced to just a single value via the max function.
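
    For concreteness, here is one way to sketch the max-pooling step in Matlab, continuing with the Y computed above and assuming its side length is divisible by $k$; the names (k, blocks, P) are illustrative:

    % Max-pooling: reduce each k x k block of Y to its maximum.
    k = 2;                               % pooling block size
    [rows, cols] = size(Y);              % assume rows and cols are multiples of k
    blocks = mat2cell(Y, repmat(k, rows/k, 1), repmat(k, cols/k, 1));
    P = cellfun(@(b) max(b(:)), blocks); % (rows/k) x (cols/k) pooled outputs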

     

    Backward Propagation

    Next, let's derive the backward propagation algorithms for these two layer types.

    Convolutional Layers

    Let's assume that we have some error function, $E$, and we know the error values at our convolutional layer. What, then, are the error values at the layer before it, and what is the gradient for each weight in the convolutional layer?

    Note that the error we know, and the error we need to compute for the previous layer, is the partial derivative of $E$ with respect to each neuron output, $\frac{\partial E}{\partial y_{ij}^\ell}$. Let's first figure out what the gradient component is for each weight by applying the chain rule. Note that in the chain rule, we must sum the contributions of all expressions in which the variable occurs.

    $$\frac{\partial E}{\partial \omega_{ab}} = \sum_{i=0}^{N-m} \sum_{j=0}^{N-m} \frac{\partial E}{\partial x_{ij}^\ell} \frac{\partial x_{ij}^\ell}{\partial \omega_{ab}} = \sum_{i=0}^{N-m} \sum_{j=0}^{N-m} \frac{\partial E}{\partial x_{ij}^\ell}\, y_{(i+a)(j+b)}^{\ell-1}$$

    In this case, we must sum over all $x_{ij}^\ell$ expressions in which $\omega_{ab}$ occurs. (This corresponds to weight-sharing in the neural network!) Note that we know that $\frac{\partial x_{ij}^\ell}{\partial \omega_{ab}} = y_{(i+a)(j+b)}^{\ell-1}$, just by looking at the forward propagation equations.
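
    Here is a sketch of this weight-gradient computation in Matlab, written as explicit loops so it matches the summation literally; it reuses the illustrative names from the forward sketch and assumes the deltas $\frac{\partial E}{\partial x_{ij}^\ell}$ for this layer are already stored in a matrix dX (they are derived next):

    % Gradient of E with respect to the shared filter weights.
    % dX holds dE/dx for this layer and has size (N-m+1) x (N-m+1).
    dW = zeros(m, m);
    for a = 0:m-1
      for b = 0:m-1
        % Sum over every position (i,j) where w_ab was used (weight sharing).
        dW(a+1, b+1) = sum(sum(dX .* Y_prev(1+a : N-m+1+a, 1+b : N-m+1+b)));
      end
    end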

    In order to compute the gradient, we need to know the values $\frac{\partial E}{\partial x_{ij}^\ell}$ (which are often called "deltas"). The deltas are fairly straightforward to compute, once more using the chain rule:

    $$\frac{\partial E}{\partial x_{ij}^\ell} = \frac{\partial E}{\partial y_{ij}^\ell} \frac{\partial y_{ij}^\ell}{\partial x_{ij}^\ell} = \frac{\partial E}{\partial y_{ij}^\ell} \frac{\partial}{\partial x_{ij}^\ell}\left(\sigma\left(x_{ij}^\ell\right)\right) = \frac{\partial E}{\partial y_{ij}^\ell}\, \sigma'\left(x_{ij}^\ell\right)$$

    As we can see, since we already know the error at the current layer, $\frac{\partial E}{\partial y_{ij}^\ell}$, we can very easily compute the deltas $\frac{\partial E}{\partial x_{ij}^\ell}$ at the current layer by just using the derivative of the activation function, $\sigma'(x)$. Since we know the errors at the current layer, we now have everything we need to compute the gradient with respect to the weights used by this convolutional layer.
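
    With the sigmoid used in the forward sketch, this delta computation is a one-liner in Matlab; dY here is an assumed matrix holding the errors $\frac{\partial E}{\partial y_{ij}^\ell}$ arriving at this layer:

    % Deltas for this layer: dE/dx = dE/dy .* sigma'(x).
    % For the sigmoid, sigma'(x) = sigma(x) .* (1 - sigma(x)) = Y .* (1 - Y).
    dX = dY .* Y .* (1 - Y);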

    In addition to computing the weight gradients for this convolutional layer, we need to propagate errors back to the previous layer. We can once more use the chain rule:

    $$\frac{\partial E}{\partial y_{ij}^{\ell-1}} = \sum_{a=0}^{m-1} \sum_{b=0}^{m-1} \frac{\partial E}{\partial x_{(i-a)(j-b)}^\ell} \frac{\partial x_{(i-a)(j-b)}^\ell}{\partial y_{ij}^{\ell-1}} = \sum_{a=0}^{m-1} \sum_{b=0}^{m-1} \frac{\partial E}{\partial x_{(i-a)(j-b)}^\ell}\, \omega_{ab}$$

    Looking back at the forward propagation equations, we can tell that $\frac{\partial x_{(i-a)(j-b)}^\ell}{\partial y_{ij}^{\ell-1}} = \omega_{ab}$. This gives us the above value for the error at the previous layer. As we can see, that looks slightly like a convolution! We have our filter $\omega$ being applied somehow to the layer; however, instead of having $x_{(i+a)(j+b)}$ we have $x_{(i-a)(j-b)}$. In addition, note that the expression above only makes sense for points that are at least $m$ away from the top and left edges. In order to fix this, we must pad the top and left edges with zeros. If we do that, then this is simply a convolution using $\omega$ which has been flipped along both axes!
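
    Here is a literal Matlab sketch of this backward sum, again written as explicit loops matching the formula; out-of-range terms are simply skipped, which plays the role of the zero-padding described above, and the names continue from the earlier sketches:

    % Propagate errors to the previous layer: dE/dy^{l-1}.
    dY_prev = zeros(N, N);
    for i = 0:N-1
      for j = 0:N-1
        s = 0;
        for a = 0:m-1
          for b = 0:m-1
            r = i - a; c = j - b;        % index into dX; out-of-range terms count as zero
            if r >= 0 && r <= N-m && c >= 0 && c <= N-m
              s = s + dX(r+1, c+1) * W(a+1, b+1);
            end
          end
        end
        dY_prev(i+1, j+1) = s;
      end
    end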

    Max-Pooling Layers

    As noted earlier, the max-pooling layers do not actually do any learning themselves. Instead, they reduce the size of the problem by introducing sparseness. In forward propagation, $k \times k$ blocks are reduced to a single value. Then, this single value acquires an error computed from backwards propagation from the previous layer. This error is then just forwarded to the place where it came from. Since it only came from one place in the $k \times k$ block, the backpropagated errors from max-pooling layers are rather sparse.
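
    A minimal Matlab sketch of this error routing for a single $k \times k$ block, where dP is an assumed matrix of errors at the pooled outputs and the other names continue from the pooling sketch above:

    % Route the pooled error back to the position of the block's maximum.
    block = Y(1:k, 1:k);                 % one k x k block of the pooling layer's input
    [~, idx] = max(block(:));            % linear index of the maximum within the block
    dBlock = zeros(k, k);
    dBlock(idx) = dP(1, 1);              % all of the block's error goes to that position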

     

    Conclusion

    Convolutional neural networks are an architecturally different way of processing dimensioned and ordered data. Instead of assuming that the location of the data in the input is irrelevant (as fully connected layers do), convolutional and max-pooling layers enforce weight sharing translationally. This models the way the human visual cortex works, and has been shown to work incredibly well for object recognition and a number of other tasks. We can learn convolutional networks through traditional stochastic gradient descent; in addition, we could apply the $R_v\{\cdot\}$ operator from the previous post to convolutional network gradient computation to get a Hessian-vector product algorithm, which would enable us to use Hessian-free optimization instead.

    from: http://andrew.gibiansky.com/blog/machine-learning/convolutional-neural-networks/
