  • Deep Convolutional Neural Networks with Theano

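    1. Introduction

    Convolutional Neural Networks (CNNs) are inspired by the fact that cells in the retina are sensitive only to a small sub-region of the visual field, called the receptive field. CNNs adopt the same mechanism: each neuron is connected to only a part of the input.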

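    2. Sparse Connectivity

    CNNs exploit spatially local correlation by enforcing a local connectivity pattern. The input to a hidden unit in layer $m$ is a weighted sum of a subset of the units in layer $m-1$; these units are spatially contiguous and form the receptive field. See the figure below:

    Think of layer $m-1$ as the retinal input. Each unit in layer $m$ has a receptive field of width 3, so it is connected to only 3 adjacent neurons in the retinal layer. Units in layer $m+1$ connect to the layer below in the same way. A neuron does not respond to changes outside its receptive field, so this architecture ensures that the learned "filters" respond most strongly to spatially local input patterns.

    However, as the figure above shows, stacking many such filter layers makes the perception progressively more global. Each unit in layer $m$ sees only part of the input, while a unit in layer $m+1$ combines the responses of the layer-$m$ units below it and thereby covers the whole input layer in the figure. A hidden unit in layer $m+1$ can therefore be viewed as a nonlinear encoding of a feature of width 5.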
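The growth of the receptive field can be checked with a few lines of Python. This is a minimal sketch that assumes stride 1 and the same filter width at every layer; the function name `receptive_field_width` is made up for illustration.

```python
def receptive_field_width(num_layers, filter_width=3):
    """Input width seen by one unit after stacking `num_layers` layers.

    Assumes stride 1 and the same filter width at every layer.
    """
    width = 1  # a single input unit sees only itself
    for _ in range(num_layers):
        # each extra layer widens the view by (filter_width - 1) input units
        width += filter_width - 1
    return width


print(receptive_field_width(1))  # layer m: 3
print(receptive_field_width(2))  # layer m+1: 5
```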

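    3. Shared Weights

    In CNNs, each filter $h_{i}$ is replicated across the entire input layer. The replicated units share the same parameters (weight vector and bias) and together form a feature map.

    In the figure above, the 3 hidden units belong to the same feature map. Weights drawn in the same color are shared, i.e. constrained to be equal.

    Replicating units in this way lets a feature be detected regardless of its position in the visible layer of the image, and weight sharing greatly reduces the number of parameters that have to be learned.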
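To make the saving concrete, the sketch below counts the parameters of one 1D layer under three assumptions: fully connected, locally connected without sharing, and locally connected with shared weights. The function name `parameter_counts` and the toy sizes (5 inputs, filter width 3, matching the 3 hidden units described above) are illustrative choices.

```python
def parameter_counts(input_width, filter_width):
    """Return (dense, local, shared) parameter counts for one 1D layer."""
    hidden_units = input_width - filter_width + 1  # one unit per filter position
    # fully connected: every hidden unit has a weight per input plus a bias
    dense = hidden_units * input_width + hidden_units
    # locally connected, no sharing: each unit owns filter_width weights and a bias
    local = hidden_units * filter_width + hidden_units
    # shared weights: one filter and one bias reused at every position
    shared = filter_width + 1
    return dense, local, shared


print(parameter_counts(5, 3))  # (18, 12, 4)
```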

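    4. Details and Notation

    A feature map is obtained by repeatedly applying a function across sub-regions of the whole image, in other words by convolving the image with a linear filter, adding a bias term, and then applying a nonlinear function. If $h^{k}$ denotes the $k$-th feature map, whose filter is determined by the weights $W^{k}$ and bias $b_{k}$, then $h^{k}$ is computed as follows (using tanh as the nonlinearity):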

    $h_{ij}^{k}=\tanh((W^{k}*x)_{ij}+b_{k})$

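    To form a richer representation of the data, each hidden layer is usually composed of multiple feature maps, $\{h^{(k)}, k=0,\ldots,K\}$. The weights $W$ are represented by a 4D tensor whose 4 dimensions index the destination feature map, the source feature map, the source vertical position, and the source horizontal position. The biases $b$ are represented by a vector with one element per destination feature map. This can be illustrated as follows:

    In the figure above, $W_{ij}^{kl}$ denotes the weight connecting each pixel of the $k$-th feature map at layer $m-1$ with the pixel at coordinates $(i,j)$ of the $l$-th feature map at layer $m$.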
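The formula and the 4D weight layout above can be sketched in NumPy. This is an illustrative, unoptimized loop, not Theano's implementation; the function name `conv_layer` and the toy shapes are assumptions. The filter is flipped so that the sum is a true convolution rather than a cross-correlation, and only positions where the filter fits entirely inside the input are computed.

```python
import numpy as np


def conv_layer(x, W, b):
    """Compute h[k, i, j] = tanh((W[k] * x)[i, j] + b[k]).

    x: input feature maps, shape (n_in, height, width)
    W: filters, shape (n_out, n_in, filter_height, filter_width)
    b: biases, shape (n_out,)
    """
    n_out, n_in, fh, fw = W.shape
    _, height, width = x.shape
    out_h, out_w = height - fh + 1, width - fw + 1
    # flipping both spatial axes turns the sliding dot product into a convolution
    flipped = W[:, :, ::-1, ::-1]
    h = np.zeros((n_out, out_h, out_w))
    for k in range(n_out):
        for i in range(out_h):
            for j in range(out_w):
                patch = x[:, i:i + fh, j:j + fw]  # all source maps at once
                h[k, i, j] = np.sum(patch * flipped[k]) + b[k]
    return np.tanh(h)


rng = np.random.RandomState(0)
x = rng.rand(3, 6, 6)                           # 3 source feature maps, 6x6 pixels
W = rng.uniform(-0.1, 0.1, size=(2, 3, 3, 3))   # 2 destination maps, 3x3 filters
b = np.zeros(2)
print(conv_layer(x, W, b).shape)  # (2, 4, 4)
```

Each destination map sums over all source maps, which is why the second axis of `W` must match the number of input feature maps.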

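    5. Convolution Operation

    In Theano, the convolution operation (ConvOp) is implemented by conv2d (imported in the code below from theano.tensor.nnet.conv), which takes two symbolic inputs:

    • a 4D tensor corresponding to a mini-batch of input images, with shape (mini-batch size, number of input feature maps, image height, image width)
    • a 4D tensor corresponding to the weight matrix $W$, with shape (number of feature maps at layer $m$, number of feature maps at layer $m-1$, filter height, filter width)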

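    The code below also uses a function that deserves a short introduction, dimshuffle(*pattern):

      For example, dimshuffle('x', 2, 'x', 0, 1) expands a 3D tensor into a 5D tensor. Dimensions 0 and 2 of the new tensor are newly inserted broadcastable dimensions of size 1, while dimensions 1, 3 and 4 are mapped from dimensions 2, 0 and 1 of the original tensor.

      If the original tensor has shape (20, 30, 40), then after dimshuffle('x', 2, 'x', 0, 1) its shape is (1, 40, 1, 20, 30).

      dimshuffle(0, 1) -> identity, same as the original

      dimshuffle(1, 0) -> swaps dimension 1 and dimension 0

      For more details, see the documentation of dimshuffle.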
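For readers more familiar with NumPy, the same reshaping can be sketched with `transpose` and `numpy.newaxis`. This only mimics the resulting shapes; Theano's dimshuffle works on symbolic variables and additionally marks the inserted axes as broadcastable. The array names are illustrative.

```python
import numpy as np

t = np.zeros((20, 30, 40))
# dimshuffle('x', 2, 'x', 0, 1): reorder the axes to (2, 0, 1),
# then insert size-1 axes at positions 0 and 2
shuffled = t.transpose(2, 0, 1)[np.newaxis, :, np.newaxis, :, :]
print(shuffled.shape)  # (1, 40, 1, 20, 30)

# dimshuffle('x', 0, 'x', 'x') on a bias vector with one entry per feature map
b = np.array([0.1, 0.2])
b4 = b[np.newaxis, :, np.newaxis, np.newaxis]
conv_out = np.zeros((1, 2, 4, 4))  # (batch, feature maps, height, width)
out = conv_out + b4                # the bias broadcasts over batch and pixels
print(out.shape)  # (1, 2, 4, 4)
```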

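    The image used below is 3wolfmoon.

    The code below applies the convolution to an input of 3 feature maps (the RGB channels) and plots the image before and after convolution: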

    # -*- coding: utf-8 -*-
    """
    Created on Tue Apr 28 10:22:14 2015

    @author: ZengJiulin
    """

    import theano
    from theano import tensor as T
    from theano.tensor.nnet import conv
    import pylab
    from PIL import Image
    import numpy

    rng = numpy.random.RandomState(23455)

    # instantiate 4D tensor for input
    input = T.tensor4(name='input', dtype='float64')

    # initialize shared variable for weights.
    # 2 output feature maps
    # 3 input feature maps
    # filter size 9x9
    w_shp = (2, 3, 9, 9)
    w_bound = numpy.sqrt(3 * 9 * 9)
    W = theano.shared(numpy.asarray(
                rng.uniform(
                    low=-1.0 / w_bound,
                    high=1.0 / w_bound,
                    size=w_shp),
                dtype=input.dtype), name='W')

    # initialize shared variable for bias (1D tensor) with random values
    # IMPORTANT: biases are usually initialized to zero. However in this
    # particular application, we simply apply the convolutional layer to
    # an image without learning the parameters. We therefore initialize
    # them to random values to "simulate" learning.
    # there are 2 output feature maps, so the bias vector also has 2 elements
    b_shp = (2,)
    b = theano.shared(numpy.asarray(
                rng.uniform(low=-.5, high=.5, size=b_shp),
                dtype=input.dtype), name='b')

    # build symbolic expression that computes the convolution of input with filters in w
    conv_out = conv.conv2d(input, W)

    # build symbolic expression to add bias and apply activation function, i.e. produce neural net layer output
    # A few words on ``dimshuffle`` :
    #   ``dimshuffle`` is a powerful tool in reshaping a tensor;
    #   what it allows you to do is to shuffle dimension around
    #   but also to insert new ones along which the tensor will be
    #   broadcastable;
    #   dimshuffle('x', 2, 'x', 0, 1)
    #   This will work on 3d tensors with no broadcastable
    #   dimensions. The first dimension will be broadcastable,
    #   then we will have the third dimension of the input tensor as
    #   the second of the resulting tensor, etc. If the tensor has
    #   shape (20, 30, 40), the resulting tensor will have dimensions
    #   (1, 40, 1, 20, 30). (AxBxC tensor is mapped to 1xCx1xAxB tensor)
    #   More examples:
    #    dimshuffle('x') -> make a 0d (scalar) into a 1d vector
    #    dimshuffle(0, 1) -> identity
    #    dimshuffle(1, 0) -> inverts the first and second dimensions
    #    dimshuffle('x', 0) -> make a row out of a 1d vector (N to 1xN)
    #    dimshuffle(0, 'x') -> make a column out of a 1d vector (N to Nx1)
    #    dimshuffle(2, 0, 1) -> AxBxC to CxAxB
    #    dimshuffle(0, 'x', 1) -> AxB to Ax1xB
    #    dimshuffle(1, 'x', 0) -> AxB to Bx1xA

    # add the bias to the convolution result, then apply a nonlinearity
    # (the sigmoid function here)
    output = T.nnet.sigmoid(conv_out + b.dimshuffle('x', 0, 'x', 'x'))

    # create theano function to compute filtered images
    f = theano.function([input], output)

    # open random image of dimensions 639x516
    # (raw string, so the backslashes in the Windows path are not read as escapes)
    img_file = open(r'E:\Python\3wolfmoon.jpg', 'rb')
    img = Image.open(img_file)
    # dimensions are (height, width, channel)
    img = numpy.asarray(img, dtype='float64') / 256.

    # put image in 4D tensor of shape (1, 3, height, width)
    img_ = img.transpose(2, 0, 1).reshape(1, 3, 639, 516)
    filtered_img = f(img_)

    # plot original image and first and second components of output
    pylab.subplot(1, 3, 1); pylab.axis('off'); pylab.imshow(img)
    pylab.gray()
    # recall that the convOp output (filtered image) is actually a "minibatch",
    # of size 1 here, so we take index 0 in the first dimension:
    pylab.subplot(1, 3, 2); pylab.axis('off'); pylab.imshow(filtered_img[0, 0, :, :])
    pylab.subplot(1, 3, 3); pylab.axis('off'); pylab.imshow(filtered_img[0, 1, :, :])
    pylab.show()

    Note that the randomly initialized filters behave very much like edge detectors.
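The size of the filtered images follows from the input and filter shapes. The helper below is a small sketch that assumes conv2d's default "valid" border mode, where the filter must fit entirely inside the image; the function name `conv2d_output_shape` is hypothetical.

```python
def conv2d_output_shape(input_shape, filter_shape):
    """Output shape of a 'valid' 2D convolution over 4D tensors.

    input_shape:  (batch size, input feature maps, height, width)
    filter_shape: (output feature maps, input feature maps, filter height, filter width)
    """
    batch, in_maps, height, width = input_shape
    out_maps, filter_in_maps, fh, fw = filter_shape
    if in_maps != filter_in_maps:
        raise ValueError('input and filter disagree on the number of input maps')
    return (batch, out_maps, height - fh + 1, width - fw + 1)


# the 639x516 RGB image and the 9x9 filters used in the script above
print(conv2d_output_shape((1, 3, 639, 516), (2, 3, 9, 9)))  # (1, 2, 631, 508)
```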

    6. Max Pooling

    Max pooling is a form of down-sampling: it partitions the image into non-overlapping rectangular regions and outputs the maximum value of each sub-region.

    Max pooling is useful for two reasons:

    • It discards non-maximal values, which reduces the computation for the layer above.
    • (The author notes not having fully understood this point; the wording of the original tutorial follows.) It provides a form of translation invariance. Imagine cascading a max-pooling layer with a convolutional layer. There are 8 directions in which one can translate the input image by a single pixel. If max-pooling is done over a 2x2 region, 3 out of these 8 possible configurations will produce exactly the same output at the convolutional layer. For max-pooling over a 3x3 window, this jumps to 5/8. Since it provides additional robustness to position, max-pooling is a “smart” way of reducing the dimensionality of intermediate representations.

    In Theano, max pooling is implemented by theano.tensor.signal.downsample.max_pool_2d, for example:

    # -*- coding: utf-8 -*-
    """
    Created on Tue Apr 28 15:17:23 2015

    @author: ZengJiulin
    """
    from __future__ import print_function

    import theano
    from theano import tensor as T
    import numpy
    from theano.tensor.signal import downsample

    input = T.dtensor4('input')
    maxpool_shape = (2, 2)

    # pooling windows that would extend past the border are dropped
    pool_out = downsample.max_pool_2d(input, maxpool_shape, ignore_border=True)
    f = theano.function([input], pool_out)

    invals = numpy.random.RandomState(1).rand(3, 2, 5, 5)
    print('With ignore_border set to True:')
    print('invals[0, 0, :, :] =\n', invals[0, 0, :, :])
    print('output[0, 0, :, :] =\n', f(invals)[0, 0, :, :])

    # partial windows at the border are kept
    pool_out = downsample.max_pool_2d(input, maxpool_shape, ignore_border=False)
    f = theano.function([input], pool_out)
    print('With ignore_border set to False:')
    print('invals[1, 0, :, :] =\n ', invals[1, 0, :, :])
    print('output[1, 0, :, :] =\n ', f(invals)[1, 0, :, :])

    Note the difference between ignoring and not ignoring the border:

    >>> runfile('E:/Python/downsample.py', wdir=r'E:/Python')
    Using gpu device 0: GeForce GT 720M
    With ignore_border set to True:
    invals[0, 0, :, :] =
    [[  4.17022005e-01   7.20324493e-01   1.14374817e-04   3.02332573e-01
        1.46755891e-01]
     [  9.23385948e-02   1.86260211e-01   3.45560727e-01   3.96767474e-01
        5.38816734e-01]
     [  4.19194514e-01   6.85219500e-01   2.04452250e-01   8.78117436e-01
        2.73875932e-02]
     [  6.70467510e-01   4.17304802e-01   5.58689828e-01   1.40386939e-01
        1.98101489e-01]
     [  8.00744569e-01   9.68261576e-01   3.13424178e-01   6.92322616e-01
        8.76389152e-01]]
    output[0, 0, :, :] =
    [[ 0.72032449  0.39676747]
     [ 0.6852195   0.87811744]]
    With ignore_border set to False:
    invals[1, 0, :, :] =
      [[ 0.01936696  0.67883553  0.21162812  0.26554666  0.49157316]
     [ 0.05336255  0.57411761  0.14672857  0.58930554  0.69975836]
     [ 0.10233443  0.41405599  0.69440016  0.41417927  0.04995346]
     [ 0.53589641  0.66379465  0.51488911  0.94459476  0.58655504]
     [ 0.90340192  0.1374747   0.13927635  0.80739129  0.39767684]]
    output[1, 0, :, :] =
      [[ 0.67883553  0.58930554  0.69975836]
     [ 0.66379465  0.94459476  0.58655504]
     [ 0.90340192  0.80739129  0.39767684]]
    >>> 
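The two border modes can be reproduced in plain Python. This is an illustrative re-implementation for a single 2D image, not Theano's code; the function name `max_pool_2d_list` is hypothetical, and the behavior for `ignore_border=False` (partial windows at the edge are kept) is inferred from the output above.

```python
def max_pool_2d_list(image, pool=(2, 2), ignore_border=True):
    """Max-pool a 2D list of numbers over non-overlapping windows."""
    rows, cols = len(image), len(image[0])
    ph, pw = pool
    if ignore_border:
        # drop windows that do not fit entirely inside the image
        out_rows, out_cols = rows // ph, cols // pw
    else:
        # keep partial windows at the border (ceiling division)
        out_rows, out_cols = -(-rows // ph), -(-cols // pw)
    pooled = []
    for r in range(out_rows):
        pooled_row = []
        for c in range(out_cols):
            window = [image[i][j]
                      for i in range(r * ph, min((r + 1) * ph, rows))
                      for j in range(c * pw, min((c + 1) * pw, cols))]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


image = [[5 * r + c for c in range(5)] for r in range(5)]  # values 0..24
print(max_pool_2d_list(image, ignore_border=True))   # [[6, 8], [16, 18]]
print(max_pool_2d_list(image, ignore_border=False))  # [[6, 8, 9], [16, 18, 19], [21, 23, 24]]
```

With a 5x5 input and 2x2 pooling, ignoring the border gives a 2x2 output, while keeping it gives 3x3, the same shapes as in the Theano run above.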

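    7. The Full Model: LeNet

    Sparse connectivity, convolutional layers and max pooling are at the heart of the LeNet family of models, although the exact details can vary greatly. The figure below depicts a LeNet model:

    The lower layers alternate convolution and down-sampling (max pooling) layers, while the upper layers are fully connected and correspond to a traditional MLP.

    Viewed as a whole, the pipeline turns a 4D tensor into a 2D matrix of flattened feature maps that the MLP can process.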
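The shape bookkeeping can be traced with plain Python. The sketch below assumes the settings used in the full code of the next section (28x28 MNIST images, 5x5 filters, 2x2 pooling, 20 and 50 kernels, batch size 500); the function names are illustrative.

```python
def conv_pool_output(size, filter_size, pool_size):
    """Side length after a 'valid' convolution followed by max pooling."""
    conv_size = size - filter_size + 1
    return conv_size // pool_size


def lenet_shapes(batch_size=500, nkerns=(20, 50), image_size=28,
                 filter_size=5, pool_size=2):
    """List the tensor shape after each stage, ending with the MLP input."""
    shapes = [(batch_size, 1, image_size, image_size)]
    size = image_size
    for n_maps in nkerns:
        size = conv_pool_output(size, filter_size, pool_size)
        shapes.append((batch_size, n_maps, size, size))
    # flatten everything except the batch axis for the fully connected layer
    shapes.append((batch_size, nkerns[-1] * size * size))
    return shapes


for shape in lenet_shapes():
    print(shape)
# (500, 1, 28, 28)
# (500, 20, 12, 12)
# (500, 50, 4, 4)
# (500, 800)
```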

    8. Full Code

    # -*- coding: utf-8 -*-
    """
    Created on Sat Apr 25 14:20:02 2015

    @author: ZengJiulin

    This tutorial introduces the LeNet5 neural network architecture
    using Theano.  LeNet5 is a convolutional neural network, good for
    classifying images. This tutorial shows how to build the architecture,
    and comes with all the hyper-parameters you need to reproduce the
    paper's MNIST results.


    This implementation simplifies the model in the following ways:

     - LeNetConvPool doesn't implement location-specific gain and bias parameters
     - LeNetConvPool doesn't implement pooling by average, it implements pooling
       by max.
     - Digit classification is implemented with a logistic regression rather than
       an RBF network
     - LeNet5 was not fully-connected convolutions at second layer

    References:
     - Y. LeCun, L. Bottou, Y. Bengio and P. Haffner:
       Gradient-Based Learning Applied to Document
       Recognition, Proceedings of the IEEE, 86(11):2278-2324, November 1998.
       http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf

    """
    from __future__ import print_function

    import os
    import sys
    import timeit

    import numpy

    import theano
    import theano.tensor as T
    from theano.tensor.signal import downsample
    from theano.tensor.nnet import conv

    from logistic_sgd import LogisticRegression, load_data
    from mlp import HiddenLayer


    class LeNetConvPoolLayer(object):
        """Pool Layer of a convolutional network """

        def __init__(self, rng, input, filter_shape, image_shape, poolsize=(2, 2)):
            """
            Allocate a LeNetConvPoolLayer with shared variable internal parameters.

            :type rng: numpy.random.RandomState
            :param rng: a random number generator used to initialize weights

            :type input: theano.tensor.dtensor4
            :param input: symbolic image tensor, of shape image_shape

            :type filter_shape: tuple or list of length 4
            :param filter_shape: (number of filters, num input feature maps,
                                  filter height, filter width)

            :type image_shape: tuple or list of length 4
            :param image_shape: (batch size, num input feature maps,
                                 image height, image width)

            :type poolsize: tuple or list of length 2
            :param poolsize: the downsampling (pooling) factor (#rows, #cols)
            """

            assert image_shape[1] == filter_shape[1]
            self.input = input

            # there are "num input feature maps * filter height * filter width"
            # inputs to each hidden unit
            fan_in = numpy.prod(filter_shape[1:])
            # each unit in the lower layer receives a gradient from:
            # "num output feature maps * filter height * filter width" /
            #   pooling size
            fan_out = (filter_shape[0] * numpy.prod(filter_shape[2:]) //
                       numpy.prod(poolsize))
            # initialize weights with random weights
            W_bound = numpy.sqrt(6. / (fan_in + fan_out))
            # the convolution kernels are exactly this weight tensor
            self.W = theano.shared(
                numpy.asarray(
                    rng.uniform(low=-W_bound, high=W_bound, size=filter_shape),
                    dtype=theano.config.floatX
                ),
                borrow=True
            )

            # the bias is a 1D tensor -- one bias per output feature map
            b_values = numpy.zeros((filter_shape[0],), dtype=theano.config.floatX)
            self.b = theano.shared(value=b_values, borrow=True)

            # convolve input feature maps with filters
            conv_out = conv.conv2d(
                input=input,
                filters=self.W,
                filter_shape=filter_shape,
                image_shape=image_shape
            )

            # downsample each feature map individually, using maxpooling
            pooled_out = downsample.max_pool_2d(
                input=conv_out,
                ds=poolsize,
                ignore_border=True
            )

            # add the bias term. Since the bias is a vector (1D array), we first
            # reshape it to a tensor of shape (1, n_filters, 1, 1). Each bias will
            # thus be broadcasted across mini-batches and feature map
            # width & height
            self.output = T.tanh(pooled_out + self.b.dimshuffle('x', 0, 'x', 'x'))

            # store parameters of this layer
            self.params = [self.W, self.b]


    def evaluate_lenet5(learning_rate=0.1, n_epochs=200,
                        dataset='mnist.pkl.gz',
                        nkerns=[20, 50], batch_size=500):
        """ Demonstrates lenet on MNIST dataset

        :type learning_rate: float
        :param learning_rate: learning rate used (factor for the stochastic
                              gradient)

        :type n_epochs: int
        :param n_epochs: maximal number of epochs to run the optimizer

        :type dataset: string
        :param dataset: path to the dataset used for training /testing (MNIST here)

        :type nkerns: list of ints
        :param nkerns: number of kernels on each layer (two layers: 20 kernels
            in the first, 50 in the second)
        """

        rng = numpy.random.RandomState(23455)

        datasets = load_data(dataset)

        train_set_x, train_set_y = datasets[0]
        valid_set_x, valid_set_y = datasets[1]
        test_set_x, test_set_y = datasets[2]

        # compute number of minibatches for training, validation and testing
        # (integer division, so the counts can be used with range())
        n_train_batches = train_set_x.get_value(borrow=True).shape[0]
        n_valid_batches = valid_set_x.get_value(borrow=True).shape[0]
        n_test_batches = test_set_x.get_value(borrow=True).shape[0]
        n_train_batches //= batch_size
        n_valid_batches //= batch_size
        n_test_batches //= batch_size

        # allocate symbolic variables for the data
        index = T.lscalar()  # index to a [mini]batch

        # start-snippet-1
        x = T.matrix('x')   # the data is presented as rasterized images
        y = T.ivector('y')  # the labels are presented as 1D vector of
                            # [int] labels

        ######################
        # BUILD ACTUAL MODEL #
        ######################
        print('... building the model')

        # Reshape matrix of rasterized images of shape (batch_size, 28 * 28)
        # to a 4D tensor, compatible with our LeNetConvPoolLayer
        # (28, 28) is the size of MNIST images.
        # the input has a single feature map (one grayscale image)
        layer0_input = x.reshape((batch_size, 1, 28, 28))

        # Construct the first convolutional pooling layer:
        # filtering reduces the image size to (28-5+1 , 28-5+1) = (24, 24)
        # maxpooling reduces this further to (24/2, 24/2) = (12, 12)
        # 4D output tensor is thus of shape (batch_size, nkerns[0], 12, 12)
        layer0 = LeNetConvPoolLayer(
            rng,
            input=layer0_input,
            image_shape=(batch_size, 1, 28, 28),
            filter_shape=(nkerns[0], 1, 5, 5),
            poolsize=(2, 2)
        )

        # Construct the second convolutional pooling layer
        # filtering reduces the image size to (12-5+1, 12-5+1) = (8, 8)
        # maxpooling reduces this further to (8/2, 8/2) = (4, 4)
        # 4D output tensor is thus of shape (batch_size, nkerns[1], 4, 4)
        # layer 0 has nkerns[0] kernels, so it outputs nkerns[0] feature maps
        # the input of layer 1 is the output of layer 0
        layer1 = LeNetConvPoolLayer(
            rng,
            input=layer0.output,
            image_shape=(batch_size, nkerns[0], 12, 12),
            filter_shape=(nkerns[1], nkerns[0], 5, 5),
            poolsize=(2, 2)
        )

        # the HiddenLayer being fully-connected, it operates on 2D matrices of
        # shape (batch_size, num_pixels) (i.e matrix of rasterized images).
        # This will generate a matrix of shape (batch_size, nkerns[1] * 4 * 4),
        # or (500, 50 * 4 * 4) = (500, 800) with the default values.
        layer2_input = layer1.output.flatten(2)

        # construct a fully-connected layer (tanh activation)
        layer2 = HiddenLayer(
            rng,
            input=layer2_input,
            n_in=nkerns[1] * 4 * 4,
            n_out=500,
            activation=T.tanh
        )

        # classify the values of the fully-connected layer
        layer3 = LogisticRegression(input=layer2.output, n_in=500, n_out=10)

        # the cost we minimize during training is the NLL of the model
        cost = layer3.negative_log_likelihood(y)

        # create a function to compute the mistakes that are made by the model
        test_model = theano.function(
            [index],
            layer3.errors(y),
            givens={
                x: test_set_x[index * batch_size: (index + 1) * batch_size],
                y: test_set_y[index * batch_size: (index + 1) * batch_size]
            }
        )

        validate_model = theano.function(
            [index],
            layer3.errors(y),
            givens={
                x: valid_set_x[index * batch_size: (index + 1) * batch_size],
                y: valid_set_y[index * batch_size: (index + 1) * batch_size]
            }
        )

        # create a list of all model parameters to be fit by gradient descent
        params = layer3.params + layer2.params + layer1.params + layer0.params

        # create a list of gradients for all model parameters
        grads = T.grad(cost, params)

        # train_model is a function that updates the model parameters by
        # SGD. Since this model has many parameters, it would be tedious to
        # manually create an update rule for each model parameter. We thus
        # create the updates list by automatically looping over all
        # (params[i], grads[i]) pairs.
        updates = [
            (param_i, param_i - learning_rate * grad_i)
            for param_i, grad_i in zip(params, grads)
        ]

        train_model = theano.function(
            [index],
            cost,
            updates=updates,
            givens={
                x: train_set_x[index * batch_size: (index + 1) * batch_size],
                y: train_set_y[index * batch_size: (index + 1) * batch_size]
            }
        )
        # end-snippet-1

        ###############
        # TRAIN MODEL #
        ###############
        print('... training')
        # early-stopping parameters
        patience = 10000  # look at this many examples regardless
        patience_increase = 2  # wait this much longer when a new best is
                               # found
        improvement_threshold = 0.995  # a relative improvement of this much is
                                       # considered significant
        validation_frequency = min(n_train_batches, patience // 2)
                                      # go through this many
                                      # minibatches before checking the network
                                      # on the validation set; in this case we
                                      # check every epoch

        best_validation_loss = numpy.inf
        best_iter = 0
        test_score = 0.
        start_time = timeit.default_timer()

        epoch = 0
        done_looping = False

        while (epoch < n_epochs) and (not done_looping):
            epoch = epoch + 1
            for minibatch_index in range(n_train_batches):

                iter = (epoch - 1) * n_train_batches + minibatch_index

                if iter % 100 == 0:
                    print('training @ iter = ', iter)
                cost_ij = train_model(minibatch_index)

                if (iter + 1) % validation_frequency == 0:

                    # compute zero-one loss on validation set
                    validation_losses = [validate_model(i) for i
                                         in range(n_valid_batches)]
                    this_validation_loss = numpy.mean(validation_losses)
                    print('epoch %i, minibatch %i/%i, validation error %f %%' %
                          (epoch, minibatch_index + 1, n_train_batches,
                           this_validation_loss * 100.))

                    # if we got the best validation score until now
                    if this_validation_loss < best_validation_loss:

                        # improve patience if loss improvement is good enough
                        if (this_validation_loss < best_validation_loss *
                                improvement_threshold):
                            patience = max(patience, iter * patience_increase)

                        # save best validation score and iteration number
                        best_validation_loss = this_validation_loss
                        best_iter = iter

                        # test it on the test set
                        test_losses = [
                            test_model(i)
                            for i in range(n_test_batches)
                        ]
                        test_score = numpy.mean(test_losses)
                        print(('     epoch %i, minibatch %i/%i, test error of '
                               'best model %f %%') %
                              (epoch, minibatch_index + 1, n_train_batches,
                               test_score * 100.))

                if patience <= iter:
                    done_looping = True
                    break

        end_time = timeit.default_timer()
        print('Optimization complete.')
        print('Best validation score of %f %% obtained at iteration %i, '
              'with test performance %f %%' %
              (best_validation_loss * 100., best_iter + 1, test_score * 100.))
        print(('The code for file ' +
               os.path.split(__file__)[1] +
               ' ran for %.2fm' % ((end_time - start_time) / 60.)),
              file=sys.stderr)

    if __name__ == '__main__':
        evaluate_lenet5()


    def experiment(state, channel):
        evaluate_lenet5(state.learning_rate, dataset=state.dataset)

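    The run took more than 170 minutes on a GeForce GT 720M GPU.

    9. Training Tips

    • Number of filters: computing the activations of a convolutional filter is much more expensive than in a traditional MLP! Because the feature map size decreases with depth, layers near the input tend to have fewer filters, while layers closer to the output can have many more. To preserve the information in the input, the total number of activations (number of feature maps times number of pixel positions) should not decrease from one layer to the next.
    • Filter size: the right filter size usually depends on the dataset. On MNIST the best results are obtained with 5x5 filters, while natural images generally work better with larger filters such as 12x12 or 15x15.
    • Pooling size: the typical value is 2x2. For very large inputs, 4x4 pooling can be used in the lower layers, but keep in mind that this reduces the dimensionality of the signal to 1/16 of the original and may throw away too much information.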
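The quantities mentioned in these tips are easy to compute for a given configuration. The sketch below counts activations per layer (feature maps times pixel positions) for the configuration used in the full code, and the fraction of values kept by a square pooling window; the function names are illustrative, and "valid" convolutions with non-overlapping pooling are assumed.

```python
def activation_counts(image_size, nkerns, filter_size=5, pool_size=2):
    """Activations per layer: number of feature maps times pixel positions."""
    counts = [image_size * image_size]  # single-channel input image
    size = image_size
    for n_maps in nkerns:
        size = (size - filter_size + 1) // pool_size  # valid conv, then pooling
        counts.append(n_maps * size * size)
    return counts


def pooling_keep_fraction(pool_size):
    """Fraction of values that survive non-overlapping square pooling."""
    return 1.0 / (pool_size * pool_size)


print(activation_counts(28, (20, 50)))  # [784, 2880, 800]
print(pooling_keep_fraction(2))         # 0.25
print(pooling_keep_fraction(4))         # 0.0625
```

For the MNIST configuration the count grows from the input to the first layer and then shrinks again at the second, so the non-decreasing guideline is only partly met there.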

    Source of the material: http://deeplearning.net/tutorial/lenet.html#lenet

  • Original post: https://www.cnblogs.com/90zeng/p/theano_lenet.html