[Neural Networks and Deep Learning] How to Configure Each Layer in Caffe

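I recently installed Caffe on my computer. A neural network is built from different kinds of layers, and each layer type takes its own parameters, so I put together this short summary based on the documentation on the official Caffe website.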

1. Vision Layers

1.1 Convolution

Type: CONVOLUTION

Example:

layers {
  name: "conv1"
  type: CONVOLUTION
  bottom: "data"
  top: "conv1"
  blobs_lr: 1          # learning rate multiplier for the filters
  blobs_lr: 2          # learning rate multiplier for the biases
  weight_decay: 1      # weight decay multiplier for the filters
  weight_decay: 0      # weight decay multiplier for the biases
  convolution_param {
    num_output: 96     # learn 96 filters
    kernel_size: 11    # each filter is 11x11
    stride: 4          # step 4 pixels between each filter application
    weight_filler {
      type: "gaussian" # initialize the filters from a Gaussian
      std: 0.01        # distribution with stdev 0.01 (default mean: 0)
    }
    bias_filler {
      type: "constant" # initialize the biases to zero (0)
      value: 0
    }
  }
}

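blobs_lr: the learning rate multipliers. In the example above, the weights use the same learning rate that the solver supplies at run time, and the biases use twice that rate.

weight_decay: the weight decay multipliers. In the example above, the multiplier is 1 for the filters and 0 for the biases, so weight decay is not applied to the biases.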
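To make the multipliers concrete, here is a minimal Python sketch of how they combine with the solver's global values. The solver settings (base_lr of 0.01 and a global weight decay of 0.0005) are hypothetical numbers chosen for illustration, and the helper name effective_rates is my own; only the multipliers come from the conv1 example.

```python
def effective_rates(base_lr, base_weight_decay, lr_mults, decay_mults):
    """Return one (learning rate, weight decay) pair per parameter blob.

    Each blob's local value is the solver's global value multiplied by
    that blob's multiplier.
    """
    return [
        (base_lr * lr_mult, base_weight_decay * decay_mult)
        for lr_mult, decay_mult in zip(lr_mults, decay_mults)
    ]


# Hypothetical solver settings; the multipliers are those of the conv1 example.
rates = effective_rates(
    base_lr=0.01,
    base_weight_decay=0.0005,
    lr_mults=[1, 2],     # blobs_lr for the filters, then the biases
    decay_mults=[1, 0],  # weight_decay for the filters, then the biases
)

for name, (lr, decay) in zip(["filters", "biases"], rates):
    print(f"{name}: learning rate {lr}, weight decay {decay}")
```

With these numbers the biases learn twice as fast as the filters and receive no weight decay at all.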

Important parameters of the convolution layer

Required parameters:

num_output (c_o): the number of filters

kernel_size (or kernel_h and kernel_w): the size of each filter

Optional parameters:

weight_filler [default type: 'constant' value: 0]: how the weights are initialized

bias_filler: how the biases are initialized

bias_term [default true]: whether to enable the bias term

pad (or pad_h and pad_w) [default 0]: the number of pixels to add to each side of the input

stride (or stride_h and stride_w) [default 1]: the step size of the filter

group (g) [default 1]: If g > 1, we restrict the connectivity of each filter to a subset of the input. Specifically, the input and output channels are separated into g groups, and the ith output group channels will be only connected to the ith input group channels.

Size change through the convolution layer:

Input: n * c_i * h_i * w_i

Output: n * c_o * h_o * w_o, where h_o = (h_i + 2 * pad_h - kernel_h) / stride_h + 1, and w_o is computed the same way.
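As a quick check of the output-size formula, here is a small Python sketch. It assumes the division is an integer division that rounds down, and the input of ten 3-channel 227x227 images is a hypothetical example; the layer parameters are those of conv1 above.

```python
def conv_output_size(in_size, kernel, stride=1, pad=0):
    """Output size along one spatial axis, using integer (floor) division."""
    return (in_size + 2 * pad - kernel) // stride + 1


def conv_output_shape(n, c_i, h_i, w_i, num_output, kernel_size, stride=1, pad=0):
    """Shape n * c_o * h_o * w_o produced by a square-kernel convolution.

    c_i does not appear in the result: every filter spans all input channels.
    """
    h_o = conv_output_size(h_i, kernel_size, stride, pad)
    w_o = conv_output_size(w_i, kernel_size, stride, pad)
    return (n, num_output, h_o, w_o)


# Hypothetical input blob: 10 images, 3 channels, 227 x 227 pixels.
shape = conv_output_shape(10, 3, 227, 227, num_output=96, kernel_size=11, stride=4)
print(shape)  # (10, 96, 55, 55)
```

Here (227 - 11) / 4 + 1 = 55, so each of the 96 filters produces a 55x55 map.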

1.2 Pooling

Type: POOLING

Example:

layers {
  name: "pool1"
  type: POOLING
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3  # pool over a 3x3 region
    stride: 2       # step two pixels (in the bottom blob) between pooling regions
  }
}

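Important parameters of the pooling layer

Required parameters:

kernel_size (or kernel_h and kernel_w): the size of the pooling window

Optional parameters:
pool [default MAX]: the pooling method; currently MAX, AVE, and STOCHASTIC are available

pad (or pad_h and pad_w) [default 0]: the number of pixels to add to each side of the input

stride (or stride_h and stride_w) [default 1]: the step size of the pooling window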

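Size change through the pooling layer:

Input: n * c_i * h_i * w_i

Output: n * c_o * h_o * w_o, where h_o = (h_i + 2 * pad_h - kernel_h) / stride_h + 1, and w_o is computed the same way. Pooling does not change the number of channels, so c_o = c_i.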
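The following is a minimal pure-Python sketch of MAX pooling over one channel, with no padding. The 5x5 input is a hypothetical example chosen so that the division in the size formula is exact; sizes that do not divide evenly may be rounded differently by Caffe itself, so treat this as an illustration of the sliding window rather than a reimplementation of the layer.

```python
def max_pool_2d(image, kernel, stride):
    """Max-pool a 2-D list of numbers with a square window and no padding."""
    h_i, w_i = len(image), len(image[0])
    h_o = (h_i - kernel) // stride + 1
    w_o = (w_i - kernel) // stride + 1
    return [
        [
            # Largest value inside the kernel x kernel window whose
            # top-left corner is at (r * stride, c * stride).
            max(
                image[r * stride + dr][c * stride + dc]
                for dr in range(kernel)
                for dc in range(kernel)
            )
            for c in range(w_o)
        ]
        for r in range(h_o)
    ]


# Hypothetical 5x5 single-channel input.
image = [
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10],
    [11, 12, 13, 14, 15],
    [16, 17, 18, 19, 20],
    [21, 22, 23, 24, 25],
]

# Same window settings as the pool1 example: 3x3 region, stride 2.
pooled = max_pool_2d(image, kernel=3, stride=2)
print(pooled)  # [[13, 15], [23, 25]]
```

The output is 2x2 because (5 - 3) / 2 + 1 = 2.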

1.3 Local Response Normalization (LRN)

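Type: LRN

Local Response Normalization normalizes over a local input region: each activation a is divided by a normalization term (the denominator) to produce a new activation b. It comes in two forms. In cross-channel LRN the local region spans adjacent channels; in within-channel LRN it is a spatial region inside a single channel.

Formula: each input value is divided by (1 + (alpha / n) * sum_i x_i^2) ^ beta, where n is the size of the local region and the sum runs over the region centered on that value.

Optional parameters:

local_size [default 5]: for cross-channel LRN, the number of neighboring channels to sum over; for within-channel LRN, the side length of the spatial region to sum over

alpha [default 1]: the scaling parameter

beta [default 5]: the exponent

norm_region [default ACROSS_CHANNELS]: which LRN method to use, ACROSS_CHANNELS or WITHIN_CHANNEL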
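To illustrate the cross-channel form, here is a rough Python sketch that normalizes the activations of all channels at a single pixel. It assumes the divisor (1 + (alpha / n) * sum of squares) ^ beta, with neighbors outside the channel range treated as zero; the function name and the sample activations are hypothetical, and the defaults simply mirror the values listed above.

```python
def lrn_across_channels(x, local_size=5, alpha=1.0, beta=5.0):
    """Normalize per-channel activations at one spatial position.

    Each value is divided by (1 + (alpha / local_size) * sum_sq) ** beta,
    where sum_sq is the sum of squares over the window of up to
    local_size channels centred on that value.
    """
    half = local_size // 2
    out = []
    for i, value in enumerate(x):
        lo = max(0, i - half)            # clip the window at the first channel
        hi = min(len(x), i + half + 1)   # and at the last channel
        sum_sq = sum(v * v for v in x[lo:hi])
        scale = (1.0 + (alpha / local_size) * sum_sq) ** beta
        out.append(value / scale)
    return out


# Hypothetical activations of 4 channels at one pixel.
activations = [1.0, 2.0, 0.0, -1.0]
normalized = lrn_across_channels(activations, local_size=3, alpha=1.0, beta=1.0)
print(normalized)
```

Channels whose neighbors are strongly active get damped more, which is the point of the layer.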

2. Loss Layers

Deep learning drives learning by minimizing a loss that compares the output with the target.

2.1 Softmax
Type: SOFTMAX_LOSS

2.2 Sum-of-Squares / Euclidean

Type: EUCLIDEAN_LOSS

2.3 Hinge / Margin
Type: HINGE_LOSS
Example:

# L1 Norm
layers {
  name: "loss"
  type: HINGE_LOSS
  bottom: "pred"
  bottom: "label"
}

# L2 Norm
layers {
  name: "loss"
  type: HINGE_LOSS
  bottom: "pred"
  bottom: "label"
  top: "loss"
  hinge_loss_param {
    norm: L2
  }
}

Optional parameters:

norm [default L1]: selects the L1 or L2 norm

Input:

n * c * h * w Predictions

n * 1 * 1 * 1 Labels

Output:

1 * 1 * 1 * 1 Computed Loss
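The difference between the two norms is easier to see in code. This is a minimal sketch that assumes a one-vs-all hinge loss with a margin of 1, averaged over the batch; I have not checked it against Caffe's output, and the predictions and labels are made-up values.

```python
def hinge_loss(predictions, labels, norm="L1"):
    """One-vs-all hinge loss averaged over the batch.

    predictions: one list of per-class scores per sample.
    labels: one integer class index per sample.
    """
    total = 0.0
    for scores, label in zip(predictions, labels):
        for cls, score in enumerate(scores):
            # The true class should score above +1, every other class below -1.
            sign = -1.0 if cls == label else 1.0
            margin = max(0.0, 1.0 + sign * score)
            total += margin if norm == "L1" else margin * margin
    return total / len(predictions)


# Hypothetical batch: 2 samples, 3 classes.
preds = [[2.0, -1.0, 0.5], [-0.5, 1.5, -2.0]]
labels = [0, 1]

loss_l1 = hinge_loss(preds, labels, norm="L1")
loss_l2 = hinge_loss(preds, labels, norm="L2")
print(loss_l1, loss_l2)  # 1.0 1.25
```

The L2 variant squares each margin violation, so large violations are penalized more heavily than with L1.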

2.4 Sigmoid Cross-Entropy
Type: SIGMOID_CROSS_ENTROPY_LOSS

2.5 Infogain
Type: INFOGAIN_LOSS

2.6 Accuracy and Top-k
Type: ACCURACY
This layer computes the accuracy of the output with respect to the target. It is not actually a loss, and it has no backward step.
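For reference, top-k accuracy can be sketched in a few lines of Python: a sample counts as correct when its label is among the k highest-scoring classes. The function name, the scores, and the labels below are hypothetical.

```python
def top_k_accuracy(predictions, labels, k=1):
    """Fraction of samples whose label is among the k highest scores."""
    hits = 0
    for scores, label in zip(predictions, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(predictions)


# Hypothetical batch: 3 samples, 3 classes.
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
targets = [1, 1, 0]

top1 = top_k_accuracy(scores, targets, k=1)
top2 = top_k_accuracy(scores, targets, k=2)
print(top1, top2)
```

Only the first sample is right at k = 1; the second becomes a hit at k = 2 because its label has the second-highest score.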

3. Activation / Neuron Layers
In general, activation layers are element-wise operations: the output has the same size as the input, and the layer is usually just a nonlinear function.

3.1 ReLU / Rectified-Linear and Leaky-ReLU
Type: RELU
Example:

layers {
  name: "relu1"
  type: RELU
  bottom: "conv1"
  top: "conv1"
}

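Optional parameters:

negative_slope [default 0]: the slope applied to negative inputs. Negative values are multiplied by this slope instead of being set to 0.

ReLU is currently the most widely used activation function, mainly because networks that use it converge faster while reaching comparable results.

The standard ReLU function is max(x, 0). More generally, the layer outputs x when x > 0 and negative_slope * x when x <= 0. The RELU layer supports in-place computation, meaning the bottom and top blobs can be the same blob, which saves memory.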
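Here is a small Python sketch of the function, including an in-place variant that overwrites its input the way the relu1 example reuses "conv1" as both bottom and top. The helper names and sample values are my own.

```python
def relu(x, negative_slope=0.0):
    """Return x for positive inputs and negative_slope * x otherwise."""
    return x if x > 0 else negative_slope * x


def relu_in_place(values, negative_slope=0.0):
    """Apply ReLU to a list in place, so no second buffer is allocated."""
    for i, x in enumerate(values):
        values[i] = relu(x, negative_slope)


# Hypothetical activations; a slope of 0.1 gives a leaky ReLU.
blob = [-2.0, -0.5, 0.0, 1.5]
relu_in_place(blob, negative_slope=0.1)
print(blob)
```

With the default slope of 0 every non-positive input maps to zero, which is the standard max(x, 0).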

3.2 Sigmoid
Type: SIGMOID
Example:

layers {
  name: "encode1neuron"
  bottom: "encode1"
  top: "encode1neuron"
  type: SIGMOID
}
The SIGMOID layer computes sigmoid(x) = 1 / (1 + exp(-x)) for each input x.

3.3 TanH / Hyperbolic Tangent
Type: TANH
Example:

layers {
  name: "encode1neuron"
  bottom: "encode1"
  top: "encode1neuron"
  type: TANH
}

The TANH layer computes tanh(x) for each input x.
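Both functions are easy to evaluate by hand in Python. The sketch below writes the sigmoid in a form that avoids overflow for large negative inputs (my own choice, not something taken from the Caffe documentation) and uses math.tanh from the standard library.

```python
import math


def sigmoid(x):
    """1 / (1 + exp(-x)), arranged so that exp never receives a large positive value."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


# Sigmoid squashes inputs into (0, 1); tanh squashes them into (-1, 1).
for x in [-4.0, -1.0, 0.0, 1.0, 4.0]:
    print(f"x={x:+.1f}  sigmoid={sigmoid(x):.4f}  tanh={math.tanh(x):+.4f}")
```

The two are closely related: tanh(x) equals 2 * sigmoid(2x) - 1, so tanh is a rescaled sigmoid centered on zero.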

3.4 Absolute Value
Type: ABSVAL
Example:

layers {
  name: "layer"
  bottom: "in"
  top: "out"
  type: ABSVAL
}
The ABSVAL layer computes abs(x) for each input x.

3.5 Power
Type: POWER
Example:

layers {
  name: "layer"
  bottom: "in"
  top: "out"
  type: POWER
  power_param {
    power: 1
    scale: 1
    shift: 0
  }
}

Optional parameters:
power [default 1]
scale [default 1]
shift [default 0]
The POWER layer computes (shift + scale * x) ^ power for each input x.

3.6 BNLL
Type: BNLL
Example:

layers {
  name: "layer"
  bottom: "in"
  top: "out"
  type: BNLL
}

The BNLL (binomial normal log likelihood) layer computes log(1 + exp(x)) for each input x.
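The absolute value, power, and BNLL functions can be sketched as plain Python functions. The rearrangement used in bnll for positive inputs is an assumption I made to keep exp from overflowing; it is mathematically the same as log(1 + exp(x)).

```python
import math


def absval(x):
    """abs(x)."""
    return abs(x)


def power(x, power=1.0, scale=1.0, shift=0.0):
    """(shift + scale * x) ^ power, with the same defaults as the layer."""
    return (shift + scale * x) ** power


def bnll(x):
    """log(1 + exp(x)), written as x + log(1 + exp(-x)) when x > 0."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


print(absval(-3.5))                                 # 3.5
print(power(3.0, power=2.0, scale=2.0, shift=1.0))  # 49.0
print(bnll(0.0))
```

With the default parameters the power function is the identity, and bnll behaves like a smooth version of ReLU: close to 0 for very negative inputs and close to x for very positive ones.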

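4. Data Layers

Data enters Caffe through data layers, which sit at the bottom of the network. Data can come from efficient databases (LevelDB or LMDB) or directly from memory. When efficiency is not critical, it can also be read from disk in HDF5 or common image formats.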

4.1 Database

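Type: DATA

Required parameters:

source: the name of the directory containing the database

batch_size: the number of inputs to process at one time

Optional parameters:

rand_skip: skip up to this number of inputs at the beginning; this is useful for asynchronous stochastic gradient descent (SGD)

backend [default LEVELDB]: choose whether to use LEVELDB or LMDB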

4.2 In-Memory
Type: MEMORY_DATA
Required parameters:
batch_size, channels, height, width: specify the size of the input chunks to read from memory
The memory data layer reads data directly from memory, without copying it. In order to use it, one must call MemoryDataLayer::Reset (from C++) or Net.set_input_arrays (from Python) in order to specify a source of contiguous data (as 4D row major array), which is read one batch-sized chunk at a time.
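To picture what "contiguous 4D row-major data read one batch-sized chunk at a time" means, here is a pure-Python sketch. It does not use Caffe at all; the function names are hypothetical, and requiring the buffer to hold a whole number of batches is an assumption of this sketch.

```python
def element_offset(n, c, h, w, channels, height, width):
    """Index of element (n, c, h, w) in a flat row-major N*C*H*W buffer."""
    return ((n * channels + c) * height + h) * width + w


def iter_batches(flat_data, batch_size, channels, height, width):
    """Yield consecutive batch-sized chunks of a flat row-major buffer.

    Note: slicing a Python list copies the chunk. This sketch only
    illustrates the indexing, not the zero-copy behaviour of the layer.
    """
    chunk = batch_size * channels * height * width
    if len(flat_data) % chunk != 0:
        raise ValueError("buffer must hold a whole number of batches")
    for start in range(0, len(flat_data), chunk):
        yield flat_data[start:start + chunk]


# Hypothetical buffer: 4 items of shape 1 x 2 x 2, with values 0..15.
data = list(range(4 * 1 * 2 * 2))
batches = list(iter_batches(data, batch_size=2, channels=1, height=2, width=2))
print(batches)
```

Because the layout is row-major, the width index varies fastest and the item index slowest, so each batch is simply the next contiguous slice of the buffer.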

4.3 HDF5 Input
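Type: HDF5_DATA
Required parameters:
source: the name of the file to read from
batch_size: the number of inputs to process at one time

4.4 HDF5 Output
Type: HDF5_OUTPUT
Required parameters:
file_name: the name of the output file
The HDF5 output layer works differently from the other layers in this section: it writes its input blobs to disk.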

4.5 Images
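Type: IMAGE_DATA
Required parameters:
source: the name of a text file in which each line gives an image filename and a label
batch_size: the number of images in one batch
Optional parameters:
rand_skip: skip up to this number of inputs at the beginning; this is useful for asynchronous stochastic gradient descent (SGD)
shuffle [default false]

new_height, new_width: resize all images to this size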
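The source file format is simple enough to parse by hand. The sketch below assumes each line holds a filename followed by whitespace and an integer label; the filenames are hypothetical, and this is not Caffe's own parser.

```python
def parse_image_list(lines):
    """Turn 'filename label' lines into (filename, label) pairs."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Split on the last run of whitespace so that the label is
        # always the final field on the line.
        path, label = line.rsplit(None, 1)
        entries.append((path, int(label)))
    return entries


# Hypothetical contents of the text file named by source.
listing = [
    "cat/001.jpg 0",
    "dog/042.jpg 1",
    "",
]
entries = parse_image_list(listing)
print(entries)
```

Each pair then corresponds to one image in a batch, with batch_size pairs consumed per forward pass.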

4.6 Windows
Type: WINDOW_DATA

4.7 Dummy
Type: DUMMY_DATA
The dummy data layer is used for development and debugging. See DummyDataParameter for its parameters.

5. Common Layers

5.1 Inner Product (Fully Connected)
Type: INNER_PRODUCT
Example:
layers {
  name: "fc8"
  type: INNER_PRODUCT
  blobs_lr: 1        # learning rate multiplier for the filters
  blobs_lr: 2        # learning rate multiplier for the biases
  weight_decay: 1    # weight decay multiplier for the filters
  weight_decay: 0    # weight decay multiplier for the biases
  inner_product_param {
    num_output: 1000
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
  bottom: "fc7"
  top: "fc8"
}

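Required parameters:

num_output (c_o): the number of filters, that is, the number of output units

Optional parameters:

weight_filler [default type: 'constant' value: 0]: how the weights are initialized

bias_filler: how the biases are initialized

bias_term [default true]: whether to enable the bias term

Size change through the inner product layer:

Input: n * c_i * h_i * w_i

Output: n * c_o * 1 * 1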
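A fully connected layer treats each sample as one flat vector of c_i * h_i * w_i values and takes a dot product with one weight row per output. Here is a minimal Python sketch for a single sample; the weights, biases, and input are made-up numbers.

```python
def inner_product(x, weights, bias=None):
    """Compute one output per weight row for a single flattened sample.

    x: flat list of c_i * h_i * w_i input values.
    weights: num_output rows, each with len(x) entries.
    bias: optional list of num_output values (the bias_term).
    """
    out = []
    for j, row in enumerate(weights):
        if len(row) != len(x):
            raise ValueError("each weight row must match the input length")
        total = sum(w * v for w, v in zip(row, x))
        if bias is not None:
            total += bias[j]
        out.append(total)
    return out


# Hypothetical sample with c_i = 1, h_i = 2, w_i = 2, and num_output = 2.
x = [1.0, 2.0, 3.0, 4.0]
weights = [
    [1.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.5, 0.5],
]
bias = [0.0, 1.0]

y = inner_product(x, weights, bias)
print(y)  # [1.0, 6.0]
```

The spatial dimensions disappear in the output, which is why the output shape is n * c_o * 1 * 1.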

5.2 Splitting
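Type: SPLIT
The Splitting layer splits one input blob into multiple output blobs. It is used when a blob needs to be fed into several layers.

5.3 Flattening
Type: FLATTEN
Flattening turns an input of size n * c * h * w into a simple vector of size n * (c*h*w) * 1 * 1.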
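In Python terms, flattening just walks each sample in channel, row, column order and collects the values into one list. A small sketch with a hypothetical 1 x 2 x 2 x 2 blob stored as nested lists:

```python
def flatten_blob(blob):
    """Flatten nested lists indexed [n][c][h][w] into n vectors of length c*h*w."""
    return [
        [value for channel in sample for row in channel for value in row]
        for sample in blob
    ]


# Hypothetical blob: 1 sample, 2 channels, 2 x 2 pixels.
blob4d = [
    [
        [[1, 2], [3, 4]],   # channel 0
        [[5, 6], [7, 8]],   # channel 1
    ]
]

flat = flatten_blob(blob4d)
print(flat)  # [[1, 2, 3, 4, 5, 6, 7, 8]]
```

The batch dimension n is left alone; only c, h, and w are merged.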

5.4 Concatenation
Type: CONCAT

Example:
layers {
  name: "concat"
  bottom: "in1"
  bottom: "in2"
  top: "out"
  type: CONCAT
  concat_param {
    concat_dim: 1
  }
}

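Optional parameters:

concat_dim [default 1]: 0 concatenates along num, 1 concatenates along channels

Size change through the concatenation layer:

Input: each blob i from 1 to K has size n_i * c_i * h * w

Output:

If concat_dim = 0: (n_1 + n_2 + ... + n_K) * c_1 * h * w, and all inputs must have the same c_i.

If concat_dim = 1: n_1 * (c_1 + c_2 + ... + c_K) * h * w, and all inputs must have the same n_i.

The Concatenation layer joins multiple blobs into a single blob.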
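The shape rules above can be written as a short Python function that also checks that the other dimensions agree. The function name and the example shapes are hypothetical.

```python
def concat_shape(shapes, concat_dim=1):
    """Output shape when concatenating (n, c, h, w) blobs along num (0) or channels (1)."""
    if concat_dim not in (0, 1):
        raise ValueError("concat_dim must be 0 (num) or 1 (channels)")
    first = shapes[0]
    for shape in shapes[1:]:
        for axis in range(4):
            # Every dimension except the concatenated one must match.
            if axis != concat_dim and shape[axis] != first[axis]:
                raise ValueError(f"dimension {axis} differs between inputs")
    out = list(first)
    out[concat_dim] = sum(shape[concat_dim] for shape in shapes)
    return tuple(out)


# Hypothetical inputs.
by_channels = concat_shape([(10, 3, 32, 32), (10, 5, 32, 32)], concat_dim=1)
by_num = concat_shape([(4, 3, 8, 8), (6, 3, 8, 8)], concat_dim=0)
print(by_channels)  # (10, 8, 32, 32)
print(by_num)       # (10, 3, 8, 8)
```

Mismatched inputs, such as blobs with different heights, raise an error instead of producing a shape.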

5.5 Slicing
The SLICE layer is a utility layer that slices an input layer to multiple output layers along a given dimension (currently num or channel only) with given slice indices.
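Slicing is the reverse of concatenation. As a rough Python sketch, assuming the slice indices mark where each new output begins along the channel dimension (the name slice_points and the channel labels are illustrative only):

```python
def slice_channels(channels, slice_points):
    """Split a list of channels into consecutive groups at the given indices."""
    bounds = [0] + list(slice_points) + [len(channels)]
    return [channels[a:b] for a, b in zip(bounds, bounds[1:])]


# Hypothetical blob with 5 channels, cut before channel 1 and before channel 3.
parts = slice_channels(["c0", "c1", "c2", "c3", "c4"], slice_points=[1, 3])
print(parts)  # [['c0'], ['c1', 'c2'], ['c3', 'c4']]
```

Two slice indices yield three outputs, and concatenating them back along the same dimension restores the original blob.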

5.6 Elementwise Operations
Type: ELTWISE

5.7 Argmax
Type: ARGMAX

5.8 Softmax
Type: SOFTMAX

5.9 Mean-Variance Normalization

Type: MVN

6. References
Caffe

Original article: https://www.cnblogs.com/huty/p/8517962.html
