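1. Introduction
Convolutional neural networks (CNNs) are inspired by the observation that cells on the retina are sensitive only to a small sub-region of the visual field, called a receptive field. CNNs adopt the same mechanism: each neuron is connected to only a part of the input.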
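2. Sparse Connectivity
CNNs exploit spatially local correlation by enforcing a local connectivity pattern between neurons of adjacent layers. The input of a hidden unit in layer $m$ is a weighted sum of a subset of the units in layer $m-1$, and that subset forms a spatially contiguous receptive field, as illustrated in the figure below:
Think of layer $m-1$ as the retinal input. The units in layer $m$ have receptive fields of width 3, so each of them is connected to only 3 adjacent neurons in the retina layer. The units in layer $m+1$ are connected to the layer below in the same way. A unit does not respond to variations outside its receptive field, so this architecture ensures that the learned "filters" respond most strongly to spatially local input patterns.
However, as the figure above shows, stacking many such layers makes the filters increasingly global. Each unit in layer $m$ sees only part of the input, while a unit in layer $m+1$ combines the responses of 3 units in layer $m$ and therefore covers a wider region of the input. A hidden unit in layer $m+1$ can thus be viewed as a non-linear encoding of a feature of width 5.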
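The growth of the receptive field can be checked with a little arithmetic. The following is a minimal sketch, assuming stride-1 layers without pooling; the helper name `receptive_field_width` is made up for this illustration.

```python
def receptive_field_width(filter_widths):
    """Width, in input units, seen by one unit after stacking stride-1 layers.

    Every additional layer with filter width w widens the field by w - 1.
    """
    width = 1
    for w in filter_widths:
        width += w - 1
    return width


print(receptive_field_width([3]))      # layer m: 3
print(receptive_field_width([3, 3]))   # layer m+1: 5
```

With two layers of width-3 filters the result is 5, which matches the width quoted above for units in layer $m+1$.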
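3. Shared Weights
In a CNN, each filter $h_{i}$ is replicated across the entire input layer. The replicated units share the same parameters (weight vector and bias) and together form a feature map.
In the figure above, the 3 hidden units belong to the same feature map. Weights drawn in the same color are shared, i.e. constrained to be equal.
Replicating units in this way allows a feature to be detected regardless of its position in the visual field, and weight sharing greatly reduces the number of parameters that have to be learned.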
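To make weight sharing concrete, here is a small 1D sketch in plain Python. The function `feature_map_1d` and the weight values are hypothetical and chosen only for illustration. The filter is symmetric, so sliding it without flipping gives the same result as a true convolution.

```python
import math


def feature_map_1d(x, w, b):
    """Slide one shared filter (w, b) over x and apply tanh at every position."""
    n = len(w)
    return [
        math.tanh(sum(w[k] * x[i + k] for k in range(n)) + b)
        for i in range(len(x) - n + 1)
    ]


x = [0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0]  # the same pattern occurs at two positions
w = [0.5, 1.0, 0.5]                      # hypothetical shared weights
b = 0.0                                  # hypothetical shared bias

h = feature_map_1d(x, w, b)
print(len(h))         # 5 hidden units in the feature map
print(h[0] == h[3])   # True: the pattern is detected identically at both positions
```

The five hidden units use only 4 parameters in total (3 weights and 1 bias). Five locally connected units without sharing would need 5 * 3 + 5 = 20.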
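4. Details and Notation
A feature map is obtained by repeatedly applying a function across sub-regions of the entire image, in other words by convolving the input image with a linear filter, adding a bias term and then applying a non-linear function. If we denote the $k$-th feature map by $h^{k}$, whose filter is determined by the weights $W^{k}$ and the bias $b_{k}$, then the feature map $h^{k}$ is computed as follows (using tanh as the non-linearity):
$h_{ij}^{k}=\tanh((W^{k}*x)_{ij}+b_{k})$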
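The formula can be traced by hand on a tiny input. The sketch below is plain Python and assumes the mathematical definition of convolution (the filter is flipped before sliding) with no padding. The names `conv2d_valid` and `feature_map` and all numeric values are illustrative, not part of any library.

```python
import math


def conv2d_valid(x, w):
    """True 2D convolution without padding: the filter is flipped before sliding."""
    fh, fw = len(w), len(w[0])
    out_h, out_w = len(x) - fh + 1, len(x[0]) - fw + 1
    return [
        [
            sum(w[fh - 1 - u][fw - 1 - v] * x[i + u][j + v]
                for u in range(fh) for v in range(fw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def feature_map(x, w, b):
    """h_ij = tanh((W * x)_ij + b) for a single input map and a single filter."""
    return [[math.tanh(v + b) for v in row] for row in conv2d_valid(x, w)]


x = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0]]
w = [[1.0, 0.0],
     [0.0, -1.0]]

print(conv2d_valid(x, w))        # [[4.0, 4.0], [4.0, 4.0]]
print(feature_map(x, w, b=0.0))  # every entry equals tanh(4.0)
```

A 3x3 input and a 2x2 filter give a 2x2 feature map, since each side shrinks by the filter size minus one.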
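To obtain a richer representation of the data, each hidden layer usually consists of several feature maps, $\{h^{(k)}, k=0,\ldots,K\}$. The weights $W$ are represented by a 4D tensor whose 4 dimensions index the destination feature map, the source feature map, the vertical source position and the horizontal source position. The biases $b$ are represented by a vector with one element per destination feature map. This can be depicted as follows:
In the figure above, $W_{ij}^{kl}$ denotes the weight connecting each pixel of the $k$-th feature map at layer $m$ with the pixel at coordinates $(i,j)$ of the $l$-th feature map of layer $m-1$.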
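With several source feature maps, the single-map formula extends by summing over the source maps. The block below is a sketch of that extension. The symbol $x^{l}$ for the $l$-th feature map of layer $m-1$ is notation assumed here; it does not appear in the text above.

```latex
h^{k}_{ij} = \tanh\Big( \sum_{l} \big(W^{kl} * x^{l}\big)_{ij} + b_{k} \Big)
```

Here $W^{kl}$ is the 2D slice of the 4D weight tensor that links source map $l$ to destination map $k$, and $b_{k}$ is the bias of destination map $k$.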
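5. The Convolution Operator
In Theano, the convolution operation (ConvOp) is available through theano.tensor.nnet.conv.conv2d, which is the function used in the code below. It takes two inputs:
- a 4D tensor corresponding to a mini-batch of input images, whose dimensions are: mini-batch size, number of input feature maps, image height, image width
- a 4D tensor corresponding to the weight matrix $W$, whose dimensions are: number of feature maps at layer $m$, number of feature maps at layer $m-1$, filter height, filter width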
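The two 4D shapes determine the shape of the result. The helper below is a sketch of that bookkeeping, assuming a convolution without padding, so each spatial side shrinks by the filter size minus one. `conv_output_shape` is a made-up name, not a Theano function.

```python
def conv_output_shape(image_shape, filter_shape):
    """Output shape of an unpadded convolution over a mini-batch.

    image_shape:  (batch size, input feature maps, image height, image width)
    filter_shape: (output feature maps, input feature maps, filter height, filter width)
    """
    batch, n_in, height, width = image_shape
    n_out, n_in_filter, fh, fw = filter_shape
    if n_in != n_in_filter:
        raise ValueError("number of input feature maps must match")
    return (batch, n_out, height - fh + 1, width - fw + 1)


# One 639x516 RGB image and two 9x9 filters (the sizes used in the demo script).
print(conv_output_shape((1, 3, 639, 516), (2, 3, 9, 9)))    # (1, 2, 631, 508)
# First LeNet layer on MNIST: 500 images of 28x28 and twenty 5x5 filters.
print(conv_output_shape((500, 1, 28, 28), (20, 1, 5, 5)))   # (500, 20, 24, 24)
```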
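The code below also uses the function dimshuffle(*pattern), which deserves a short introduction:
For example, dimshuffle('x', 2, 'x', 0, 1) expands a 3D tensor into a 5D tensor. Dimensions 0 and 2 of the new tensor are newly inserted broadcastable dimensions of size 1, while dimensions 1, 3 and 4 are taken from dimensions 2, 0 and 1 of the original 3D tensor, respectively.
If the original tensor has shape (20, 30, 40), then after dimshuffle('x', 2, 'x', 0, 1) its shape becomes (1, 40, 1, 20, 30).
dimshuffle(0, 1) -> identity
dimshuffle(1, 0) -> swaps dimension 0 and dimension 1
For more details, see the Theano documentation for dimshuffle.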
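The effect of a pattern on the shape can be modelled in a few lines of plain Python. This is only a sketch of the shape bookkeeping, not Theano's implementation. `dimshuffle_shape` is a hypothetical helper, and it does not cover dropping broadcastable dimensions.

```python
def dimshuffle_shape(shape, pattern):
    """Shape produced by a dimshuffle-style pattern.

    An integer entry selects that axis of the input; 'x' inserts a new axis of size 1.
    """
    new_shape = []
    for p in pattern:
        if p == 'x':
            new_shape.append(1)
        else:
            new_shape.append(shape[p])
    return tuple(new_shape)


print(dimshuffle_shape((20, 30, 40), ('x', 2, 'x', 0, 1)))  # (1, 40, 1, 20, 30)
print(dimshuffle_shape((20, 30), (1, 0)))                   # (30, 20)
# Pattern applied to the bias vector in the convolution code: (2,) -> (1, 2, 1, 1)
print(dimshuffle_shape((2,), ('x', 0, 'x', 'x')))           # (1, 2, 1, 1)
```

The last pattern is what lets a bias vector with one entry per feature map broadcast over the mini-batch, height and width axes of a 4D convolution output.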
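The image used below is 3wolfmoon.
The following code convolves an input consisting of 3 RGB feature maps and plots the image before and after the convolution: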
```python
# -*- coding: utf-8 -*-
"""
Created on Tue Apr 28 10:22:14 2015

@author: ZengJiulin
"""

import theano
from theano import tensor as T
from theano.tensor.nnet import conv
import pylab
from PIL import Image
import numpy

rng = numpy.random.RandomState(23455)

# instantiate 4D tensor for input
input = T.tensor4(name='input', dtype='float64')

# initialize shared variable for weights.
# 2 output feature maps
# 3 input feature maps
# filter size 9x9
w_shp = (2, 3, 9, 9)
w_bound = numpy.sqrt(3 * 9 * 9)
W = theano.shared(numpy.asarray(
    rng.uniform(
        low=-1.0 / w_bound,
        high=1.0 / w_bound,
        size=w_shp),
    dtype=input.dtype), name='W')

# initialize shared variable for bias (1D tensor) with random values
# IMPORTANT: biases are usually initialized to zero. However in this
# particular application, we simply apply the convolutional layer to
# an image without learning the parameters. We therefore initialize
# them to random values to "simulate" learning.
# there are 2 output feature maps, so the bias vector also has 2 elements
b_shp = (2,)
b = theano.shared(numpy.asarray(
    rng.uniform(low=-.5, high=.5, size=b_shp),
    dtype=input.dtype), name='b')

# build symbolic expression that computes the convolution of input with filters in w
conv_out = conv.conv2d(input, W)

# build symbolic expression to add bias and apply activation function,
# i.e. produce neural net layer output
# A few words on ``dimshuffle`` :
#   ``dimshuffle`` is a powerful tool in reshaping a tensor;
#   what it allows you to do is to shuffle dimension around
#   but also to insert new ones along which the tensor will be
#   broadcastable;
#   dimshuffle('x', 2, 'x', 0, 1)
#   This will work on 3d tensors with no broadcastable
#   dimensions. The first dimension will be broadcastable,
#   then we will have the third dimension of the input tensor as
#   the second of the resulting tensor, etc. If the tensor has
#   shape (20, 30, 40), the resulting tensor will have dimensions
#   (1, 40, 1, 20, 30). (AxBxC tensor is mapped to 1xCx1xAxB tensor)
#   More examples:
#    dimshuffle('x') -> make a 0d (scalar) into a 1d vector
#    dimshuffle(0, 1) -> identity
#    dimshuffle(1, 0) -> inverts the first and second dimensions
#    dimshuffle('x', 0) -> make a row out of a 1d vector (N to 1xN)
#    dimshuffle(0, 'x') -> make a column out of a 1d vector (N to Nx1)
#    dimshuffle(2, 0, 1) -> AxBxC to CxAxB
#    dimshuffle(0, 'x', 1) -> AxB to Ax1xB
#    dimshuffle(1, 'x', 0) -> AxB to Bx1xA

# add the bias to the convolution output, then apply a non-linearity
# (the sigmoid function here)
output = T.nnet.sigmoid(conv_out + b.dimshuffle('x', 0, 'x', 'x'))

# create theano function to compute filtered images
f = theano.function([input], output)


# open random image of dimensions 639x516
# (raw string, so that "\3" in the Windows path is not read as an escape)
img_file = open(r'E:\Python\3wolfmoon.jpg', 'rb')
img = Image.open(img_file)
# dimensions are (height, width, channel)
img = numpy.asarray(img, dtype='float64') / 256.

# put image in 4D tensor of shape (1, 3, height, width)
img_ = img.transpose(2, 0, 1).reshape(1, 3, 639, 516)
filtered_img = f(img_)

# plot original image and first and second components of output
pylab.subplot(1, 3, 1); pylab.axis('off'); pylab.imshow(img)
pylab.gray();
# recall that the convOp output (filtered image) is actually a "minibatch",
# of size 1 here, so we take index 0 in the first dimension:
pylab.subplot(1, 3, 2); pylab.axis('off'); pylab.imshow(filtered_img[0, 0, :, :])
pylab.subplot(1, 3, 3); pylab.axis('off'); pylab.imshow(filtered_img[0, 1, :, :])
pylab.show()
```
Notice that a randomly initialized filter acts very much like an edge detector.
6. Max-Pooling
Max-pooling is a form of down-sampling: it partitions the input image into a set of non-overlapping rectangles and, for each sub-region, outputs the maximum value.
Max-pooling is useful for two reasons:
- By eliminating non-maximal values, it reduces the computation for the layers above.
- (I have not fully understood this point yet; what follows is the wording of the original tutorial.) It provides a form of translation invariance. Imagine cascading a max-pooling layer with a convolutional layer. There are 8 directions in which one can translate the input image by a single pixel. If max-pooling is done over a 2x2 region, 3 out of these 8 possible configurations will produce exactly the same output at the convolutional layer. For max-pooling over a 3x3 window, this jumps to 5/8. Since it provides additional robustness to position, max-pooling is a “smart” way of reducing the dimensionality of intermediate representations.
In Theano, max-pooling is done by theano.tensor.signal.downsample.max_pool_2d, for example:
```python
# -*- coding: utf-8 -*-
"""
Created on Tue Apr 28 15:17:23 2015

@author: ZengJiulin
"""
import theano
from theano import tensor as T
import numpy
from theano.tensor.signal import downsample

input = T.dtensor4('input')
maxpool_shape = (2, 2)
pool_out = downsample.max_pool_2d(input, maxpool_shape, ignore_border=True)
f = theano.function([input], pool_out)

invals = numpy.random.RandomState(1).rand(3, 2, 5, 5)
print 'With ignore_border set to True:'
print 'invals[0, 0, :, :] = ', invals[0, 0, :, :]
print 'output[0, 0, :, :] = ', f(invals)[0, 0, :, :]

pool_out = downsample.max_pool_2d(input, maxpool_shape, ignore_border=False)
f = theano.function([input], pool_out)
print 'With ignore_border set to False:'
print 'invals[1, 0, :, :] = ', invals[1, 0, :, :]
print 'output[1, 0, :, :] = ', f(invals)[1, 0, :, :]
```
Note the difference between ignoring and not ignoring the border:
```
>>> runfile('E:/Python/downsample.py', wdir=r'E:/Python')
Using gpu device 0: GeForce GT 720M
With ignore_border set to True:
invals[0, 0, :, :] =  [[ 4.17022005e-01 7.20324493e-01 1.14374817e-04 3.02332573e-01 1.46755891e-01]
 [ 9.23385948e-02 1.86260211e-01 3.45560727e-01 3.96767474e-01 5.38816734e-01]
 [ 4.19194514e-01 6.85219500e-01 2.04452250e-01 8.78117436e-01 2.73875932e-02]
 [ 6.70467510e-01 4.17304802e-01 5.58689828e-01 1.40386939e-01 1.98101489e-01]
 [ 8.00744569e-01 9.68261576e-01 3.13424178e-01 6.92322616e-01 8.76389152e-01]]
output[0, 0, :, :] =  [[ 0.72032449 0.39676747]
 [ 0.6852195 0.87811744]]
With ignore_border set to False:
invals[1, 0, :, :] =  [[ 0.01936696 0.67883553 0.21162812 0.26554666 0.49157316]
 [ 0.05336255 0.57411761 0.14672857 0.58930554 0.69975836]
 [ 0.10233443 0.41405599 0.69440016 0.41417927 0.04995346]
 [ 0.53589641 0.66379465 0.51488911 0.94459476 0.58655504]
 [ 0.90340192 0.1374747 0.13927635 0.80739129 0.39767684]]
output[1, 0, :, :] =  [[ 0.67883553 0.58930554 0.69975836]
 [ 0.66379465 0.94459476 0.58655504]
 [ 0.90340192 0.80739129 0.39767684]]
>>>
```
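The behaviour seen in this output can be reproduced with a plain-Python sketch of non-overlapping max-pooling. This `max_pool_2d` is an illustrative re-implementation for a single 2D map, written under the assumption that not ignoring the border means pooling over the leftover partial windows. It is not the Theano function.

```python
def max_pool_2d(x, pool, ignore_border=True):
    """Non-overlapping max-pooling over a 2D list of numbers."""
    ph, pw = pool
    height, width = len(x), len(x[0])
    if ignore_border:
        # drop the rows and columns that do not fill a complete window
        n_rows, n_cols = height // ph, width // pw
    else:
        # keep partial windows at the border (ceiling division)
        n_rows, n_cols = -(-height // ph), -(-width // pw)
    return [
        [
            max(x[r][c]
                for r in range(i * ph, min((i + 1) * ph, height))
                for c in range(j * pw, min((j + 1) * pw, width)))
            for j in range(n_cols)
        ]
        for i in range(n_rows)
    ]


x = [[1, 2, 3, 4, 5],
     [6, 7, 8, 9, 10],
     [11, 12, 13, 14, 15],
     [16, 17, 18, 19, 20],
     [21, 22, 23, 24, 25]]

print(max_pool_2d(x, (2, 2), ignore_border=True))
# [[7, 9], [17, 19]]
print(max_pool_2d(x, (2, 2), ignore_border=False))
# [[7, 9, 10], [17, 19, 20], [22, 24, 25]]
```

As in the Theano output, a 5x5 input pooled with a 2x2 window gives a 2x2 result when the border is ignored and a 3x3 result when it is kept.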
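7. The Full Model: LeNet
Sparse connectivity, convolutional layers and max-pooling are at the heart of the LeNet family of models, although the exact details of the model can vary greatly. The figure below shows a depiction of a LeNet model:
The lower layers alternate between convolution and down-sampling (max-pooling) layers, while the upper layers are fully connected and correspond to a traditional MLP.
Seen as a whole, the pipeline turns a 4D tensor into the 2D matrix of rasterized feature maps that the MLP can process.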
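The shapes along this pipeline can be traced with simple arithmetic. The sketch below uses the default values from the full code listing (28x28 MNIST images, 5x5 filters, 2x2 pooling, batch_size=500, nkerns=[20, 50]). The helper `conv_pool_size` is a made-up name and assumes an unpadded convolution followed by non-overlapping pooling.

```python
def conv_pool_size(size, filter_size, pool_size):
    """Side length after an unpadded convolution followed by non-overlapping pooling."""
    return (size - filter_size + 1) // pool_size


batch_size = 500
nkerns = [20, 50]

side0 = conv_pool_size(28, 5, 2)       # (28 - 5 + 1) / 2 = 12
side1 = conv_pool_size(side0, 5, 2)    # (12 - 5 + 1) / 2 = 4

layer0_shape = (batch_size, nkerns[0], side0, side0)
layer1_shape = (batch_size, nkerns[1], side1, side1)
# flattening keeps the batch axis and merges the remaining three axes
mlp_input_shape = (batch_size, nkerns[1] * side1 * side1)

print(layer0_shape)      # (500, 20, 12, 12)
print(layer1_shape)      # (500, 50, 4, 4)
print(mlp_input_shape)   # (500, 800)
```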
8. Complete Code
```python
# -*- coding: utf-8 -*-
"""
Created on Sat Apr 25 14:20:02 2015

@author: ZengJiulin
"""

"""This tutorial introduces the LeNet5 neural network architecture
using Theano.  LeNet5 is a convolutional neural network, good for
classifying images. This tutorial shows how to build the architecture,
and comes with all the hyper-parameters you need to reproduce the
paper's MNIST results.


This implementation simplifies the model in the following ways:

 - LeNetConvPool doesn't implement location-specific gain and bias parameters
 - LeNetConvPool doesn't implement pooling by average, it implements pooling
   by max.
 - Digit classification is implemented with a logistic regression rather than
   an RBF network
 - LeNet5 was not fully-connected convolutions at second layer

References:
 - Y. LeCun, L. Bottou, Y. Bengio and P. Haffner:
   Gradient-Based Learning Applied to Document
   Recognition, Proceedings of the IEEE, 86(11):2278-2324, November 1998.
   http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf

"""
import os
import sys
import time

import numpy

import theano
import theano.tensor as T
from theano.tensor.signal import downsample
from theano.tensor.nnet import conv

from logistic_sgd import LogisticRegression, load_data
from mlp import HiddenLayer


class LeNetConvPoolLayer(object):
    """Pool Layer of a convolutional network """

    def __init__(self, rng, input, filter_shape, image_shape, poolsize=(2, 2)):
        """
        Allocate a LeNetConvPoolLayer with shared variable internal parameters.

        :type rng: numpy.random.RandomState
        :param rng: a random number generator used to initialize weights

        :type input: theano.tensor.dtensor4
        :param input: symbolic image tensor, of shape image_shape

        :type filter_shape: tuple or list of length 4
        :param filter_shape: (number of filters, num input feature maps,
                              filter height, filter width)

        :type image_shape: tuple or list of length 4
        :param image_shape: (batch size, num input feature maps,
                             image height, image width)

        :type poolsize: tuple or list of length 2
        :param poolsize: the downsampling (pooling) factor (#rows, #cols)
        """

        assert image_shape[1] == filter_shape[1]
        self.input = input

        # there are "num input feature maps * filter height * filter width"
        # inputs to each hidden unit

        fan_in = numpy.prod(filter_shape[1:])
        # each unit in the lower layer receives a gradient from:
        # "num output feature maps * filter height * filter width" /
        #   pooling size
        fan_out = (filter_shape[0] * numpy.prod(filter_shape[2:]) /
                   numpy.prod(poolsize))
        # initialize weights with random weights
        W_bound = numpy.sqrt(6. / (fan_in + fan_out))
        # the convolution kernels are essentially this weight tensor
        self.W = theano.shared(
            numpy.asarray(
                rng.uniform(low=-W_bound, high=W_bound, size=filter_shape),
                dtype=theano.config.floatX
            ),
            borrow=True
        )

        # the bias is a 1D tensor -- one bias per output feature map
        b_values = numpy.zeros((filter_shape[0],), dtype=theano.config.floatX)
        self.b = theano.shared(value=b_values, borrow=True)

        # convolve input feature maps with filters
        conv_out = conv.conv2d(
            input=input,
            filters=self.W,
            filter_shape=filter_shape,
            image_shape=image_shape
        )

        # downsample each feature map individually, using maxpooling
        pooled_out = downsample.max_pool_2d(
            input=conv_out,
            ds=poolsize,
            ignore_border=True
        )

        # add the bias term. Since the bias is a vector (1D array), we first
        # reshape it to a tensor of shape (1, n_filters, 1, 1). Each bias will
        # thus be broadcasted across mini-batches and feature map
        # width & height
        self.output = T.tanh(pooled_out + self.b.dimshuffle('x', 0, 'x', 'x'))

        # store parameters of this layer
        self.params = [self.W, self.b]


def evaluate_lenet5(learning_rate=0.1, n_epochs=200,
                    dataset='mnist.pkl.gz',
                    nkerns=[20, 50], batch_size=500):
    """ Demonstrates lenet on MNIST dataset

    :type learning_rate: float
    :param learning_rate: learning rate used (factor for the stochastic
                          gradient)

    :type n_epochs: int
    :param n_epochs: maximal number of epochs to run the optimizer

    :type dataset: string
    :param dataset: path to the dataset used for training /testing (MNIST here)

    :type nkerns: list of ints
    :param nkerns: number of kernels on each layer (two layers: 20 kernels
                   in the first layer, 50 kernels in the second)
    """

    rng = numpy.random.RandomState(23455)

    datasets = load_data(dataset)

    train_set_x, train_set_y = datasets[0]
    valid_set_x, valid_set_y = datasets[1]
    test_set_x, test_set_y = datasets[2]

    # compute number of minibatches for training, validation and testing
    n_train_batches = train_set_x.get_value(borrow=True).shape[0]
    n_valid_batches = valid_set_x.get_value(borrow=True).shape[0]
    n_test_batches = test_set_x.get_value(borrow=True).shape[0]
    n_train_batches /= batch_size
    n_valid_batches /= batch_size
    n_test_batches /= batch_size

    # allocate symbolic variables for the data
    index = T.lscalar()  # index to a [mini]batch

    # start-snippet-1
    x = T.matrix('x')   # the data is presented as rasterized images
    y = T.ivector('y')  # the labels are presented as 1D vector of
                        # [int] labels

    ######################
    # BUILD ACTUAL MODEL #
    ######################
    print '... building the model'

    # Reshape matrix of rasterized images of shape (batch_size, 28 * 28)
    # to a 4D tensor, compatible with our LeNetConvPoolLayer
    # (28, 28) is the size of MNIST images.
    # each input is a single image (one feature map)
    layer0_input = x.reshape((batch_size, 1, 28, 28))

    # Construct the first convolutional pooling layer:
    # filtering reduces the image size to (28-5+1 , 28-5+1) = (24, 24)
    # maxpooling reduces this further to (24/2, 24/2) = (12, 12)
    # 4D output tensor is thus of shape (batch_size, nkerns[0], 12, 12)
    layer0 = LeNetConvPoolLayer(
        rng,
        input=layer0_input,
        image_shape=(batch_size, 1, 28, 28),
        filter_shape=(nkerns[0], 1, 5, 5),
        poolsize=(2, 2)
    )

    # Construct the second convolutional pooling layer
    # filtering reduces the image size to (12-5+1, 12-5+1) = (8, 8)
    # maxpooling reduces this further to (8/2, 8/2) = (4, 4)
    # 4D output tensor is thus of shape (batch_size, nkerns[1], 4, 4)
    # layer 0 has nkerns[0] kernels, so it outputs nkerns[0] feature maps;
    # the input of layer 1 is the output of layer 0
    layer1 = LeNetConvPoolLayer(
        rng,
        input=layer0.output,
        image_shape=(batch_size, nkerns[0], 12, 12),
        filter_shape=(nkerns[1], nkerns[0], 5, 5),
        poolsize=(2, 2)
    )

    # the HiddenLayer being fully-connected, it operates on 2D matrices of
    # shape (batch_size, num_pixels) (i.e matrix of rasterized images).
    # This will generate a matrix of shape (batch_size, nkerns[1] * 4 * 4),
    # or (500, 50 * 4 * 4) = (500, 800) with the default values.
    layer2_input = layer1.output.flatten(2)

    # construct a fully-connected sigmoidal layer
    layer2 = HiddenLayer(
        rng,
        input=layer2_input,
        n_in=nkerns[1] * 4 * 4,
        n_out=500,
        activation=T.tanh
    )

    # classify the values of the fully-connected sigmoidal layer
    layer3 = LogisticRegression(input=layer2.output, n_in=500, n_out=10)

    # the cost we minimize during training is the NLL of the model
    cost = layer3.negative_log_likelihood(y)

    # create a function to compute the mistakes that are made by the model
    test_model = theano.function(
        [index],
        layer3.errors(y),
        givens={
            x: test_set_x[index * batch_size: (index + 1) * batch_size],
            y: test_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )

    validate_model = theano.function(
        [index],
        layer3.errors(y),
        givens={
            x: valid_set_x[index * batch_size: (index + 1) * batch_size],
            y: valid_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )

    # create a list of all model parameters to be fit by gradient descent
    params = layer3.params + layer2.params + layer1.params + layer0.params

    # create a list of gradients for all model parameters
    grads = T.grad(cost, params)

    # train_model is a function that updates the model parameters by
    # SGD Since this model has many parameters, it would be tedious to
    # manually create an update rule for each model parameter. We thus
    # create the updates list by automatically looping over all
    # (params[i], grads[i]) pairs.
    updates = [
        (param_i, param_i - learning_rate * grad_i)
        for param_i, grad_i in zip(params, grads)
    ]

    train_model = theano.function(
        [index],
        cost,
        updates=updates,
        givens={
            x: train_set_x[index * batch_size: (index + 1) * batch_size],
            y: train_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )
    # end-snippet-1

    ###############
    # TRAIN MODEL #
    ###############
    print '... training'
    # early-stopping parameters
    patience = 10000  # look as this many examples regardless
    patience_increase = 2  # wait this much longer when a new best is
                           # found
    improvement_threshold = 0.995  # a relative improvement of this much is
                                   # considered significant
    validation_frequency = min(n_train_batches, patience / 2)
                                  # go through this many
                                  # minibatche before checking the network
                                  # on the validation set; in this case we
                                  # check every epoch

    best_validation_loss = numpy.inf
    best_iter = 0
    test_score = 0.
    start_time = time.clock()

    epoch = 0
    done_looping = False

    while (epoch < n_epochs) and (not done_looping):
        epoch = epoch + 1
        for minibatch_index in xrange(n_train_batches):

            iter = (epoch - 1) * n_train_batches + minibatch_index

            if iter % 100 == 0:
                print 'training @ iter = ', iter
            cost_ij = train_model(minibatch_index)

            if (iter + 1) % validation_frequency == 0:

                # compute zero-one loss on validation set
                validation_losses = [validate_model(i) for i
                                     in xrange(n_valid_batches)]
                this_validation_loss = numpy.mean(validation_losses)
                print('epoch %i, minibatch %i/%i, validation error %f %%' %
                      (epoch, minibatch_index + 1, n_train_batches,
                       this_validation_loss * 100.))

                # if we got the best validation score until now
                if this_validation_loss < best_validation_loss:

                    # improve patience if loss improvement is good enough
                    if (this_validation_loss < best_validation_loss *
                            improvement_threshold):
                        patience = max(patience, iter * patience_increase)

                    # save best validation score and iteration number
                    best_validation_loss = this_validation_loss
                    best_iter = iter

                    # test it on the test set
                    test_losses = [
                        test_model(i)
                        for i in xrange(n_test_batches)
                    ]
                    test_score = numpy.mean(test_losses)
                    print(('     epoch %i, minibatch %i/%i, test error of '
                           'best model %f %%') %
                          (epoch, minibatch_index + 1, n_train_batches,
                           test_score * 100.))

            if patience <= iter:
                done_looping = True
                break

    end_time = time.clock()
    print('Optimization complete.')
    print('Best validation score of %f %% obtained at iteration %i, '
          'with test performance %f %%' %
          (best_validation_loss * 100., best_iter + 1, test_score * 100.))
    print >> sys.stderr, ('The code for file ' +
                          os.path.split(__file__)[1] +
                          ' ran for %.2fm' % ((end_time - start_time) / 60.))

if __name__ == '__main__':
    evaluate_lenet5()


def experiment(state, channel):
    evaluate_lenet5(state.learning_rate, dataset=state.dataset)
```
The full run took a little over 170 minutes on a GeForce GT 720M GPU.
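The patience-based early-stopping rule inside the training loop is easier to follow in isolation. The sketch below is a simplified stand-in: it assumes one validation loss per iteration (the real loop validates only every validation_frequency iterations), and the function name and the example losses are hypothetical.

```python
def early_stopping_iterations(validation_losses, patience=10,
                              patience_increase=2,
                              improvement_threshold=0.995):
    """Return (best_iter, stop_iter) for a sequence of per-iteration losses."""
    best_loss = float('inf')
    best_iter = 0
    for it, loss in enumerate(validation_losses):
        if loss < best_loss:
            # a sufficiently large improvement extends the patience
            if loss < best_loss * improvement_threshold:
                patience = max(patience, it * patience_increase)
            best_loss = loss
            best_iter = it
        # stop once the iteration counter reaches the patience
        if patience <= it:
            return best_iter, it
    return best_iter, len(validation_losses) - 1


losses = [0.5, 0.4, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3]
print(early_stopping_iterations(losses, patience=3))   # (2, 4)
```

In this run the best loss is reached at iteration 2, which raises the patience to 4, and training stops at iteration 4 because no further improvement arrived in time.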
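9. Tips and Tricks
- Number of filters: computing the activations of a single convolutional filter is much more expensive than in a traditional MLP. Since the feature map size decreases with depth, layers near the input tend to have fewer filters, while higher layers can have many more (the code above uses 20 filters in the first layer and 50 in the second). Preserving the information about the input would require the number of activations not to decrease from one layer to the next.
- Filter shape: the appropriate filter shape usually depends on the dataset. The best size on the MNIST dataset is 5x5, while natural images usually work better with 12x12 or 15x15.
- Pooling shape: typical values are 2x2. For very large inputs, 4x4 pooling can be used in the lower layers, but keep in mind that this reduces the dimension of the signal to 1/16 of the original and may throw away too much information.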
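The effect of the pooling shape on the number of activations is quick to check. The sketch below takes the first LeNet layer as an example (20 feature maps of size 24x24 before pooling, as in the code comments). `pooled_activations` is a hypothetical helper and assumes non-overlapping pooling with the border ignored.

```python
def pooled_activations(n_maps, height, width, pool):
    """Number of activations left after non-overlapping pooling (border ignored)."""
    ph, pw = pool
    return n_maps * (height // ph) * (width // pw)


before = 20 * 24 * 24
print(before)                                   # 11520
print(pooled_activations(20, 24, 24, (2, 2)))   # 2880, i.e. 1/4 of the signal
print(pooled_activations(20, 24, 24, (4, 4)))   # 720, i.e. 1/16 of the signal
```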