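    The CNN is a remarkable deep learning architecture, and something of an outlier in the field. From the late 1990s to the early 2000s, a period often described as an AI winter, most researchers had walked away from neural networks, yet CNNs kept proving their usefulness, thanks in large part to Yann LeCun. The architecture lives up to its name, Convolutional Neural Network: it starts by applying convolutions to the image, combined with operations such as pooling. After several rounds of convolution have produced a stack of feature maps, the result is flattened and fed into a multi-layer fully connected network, and a softmax on the output decides which class the image belongs to. The history is interesting too. LeNet-5, named after LeCun, was already well known by the late 1990s. Once deep learning took off, many variants followed. The better known ones include AlexNet from the University of Toronto, Google's GoogLeNet, Oxford's VGG networks such as VGG16 (also referred to as OxfordNet), and Network in Network (NIN). More recently, research on object detection produced the R-CNN family. Even with deep learning moving this fast, CNNs remain a research topic for many well-known groups, and they also appear inside AlphaGo, which says a lot about their power. There are plenty of resources on the basic structure of a CNN, so they are not listed here.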


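    To run the CIFAR-10 code, download it, cd into the code directory, and run python cifar10_train.py. The default is 1,000,000 iterations. Each step is quoted here at roughly 3 to 4 seconds, and about 5 hours of training is said to cover nearly 100,000 steps; note that 100,000 steps in 5 hours averages about 0.18 seconds per step, so the real step time depends heavily on your hardware. According to the notes in cifar10_train.py, accuracy reaches about 86% at 100,000 steps, so around 5 hours of training is enough and there is no need to run all 1,000,000 steps. To check the result, run python cifar10_eval.py. Because checkpoints are saved under the tmp directory, the eval script can locate the most recently saved model and run it, which is convenient. Once trained, the system recognizes 10 kinds of objects in photos, airplanes among them. The rest of this post walks through how it is implemented.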
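
To judge how long to train, it can help to convert step counts into epochs and into an average step time. The following is a minimal sketch, not part of the tutorial code. The dataset size (50,000 training images) and batch size (128) are assumptions that the text above does not state, and the helper names are hypothetical.

```python
# Assumed values (not stated in the text above): the CIFAR-10 training set
# has 50,000 images and the batch size is 128.
NUM_TRAIN_EXAMPLES = 50000
BATCH_SIZE = 128


def steps_to_epochs(steps, batch_size=BATCH_SIZE,
                    num_examples=NUM_TRAIN_EXAMPLES):
    """Return how many passes over the training set `steps` batches cover."""
    return steps * batch_size / float(num_examples)


def seconds_per_step(total_hours, steps):
    """Average wall-clock seconds per step for a run of `steps` steps."""
    return total_hours * 3600.0 / steps


if __name__ == "__main__":
    print(steps_to_epochs(100000))      # epochs covered by 100,000 steps
    print(seconds_per_step(5, 100000))  # average step time for a 5-hour run
```

Under these assumptions, stopping at 100,000 steps still means the network has seen every training image a few hundred times.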


    def train():
      """Train CIFAR-10 for a number of steps."""
      with tf.Graph().as_default():
        global_step = tf.Variable(0, trainable=False)
        # Get images and labels for CIFAR-10.
        # The training input comes from the distorted_inputs() function.
        images, labels = cifar10.distorted_inputs()
        # Build a Graph that computes the logits predictions from the
        # inference model.
        logits = cifar10.inference(images)
        # Calculate loss.
        loss = cifar10.loss(logits, labels)
        # Build a Graph that trains the model with one batch of examples and
        # updates the model parameters.
        train_op = cifar10.train(loss, global_step)
        # Create a saver.
        saver = tf.train.Saver(tf.all_variables())
        # Build the summary operation based on the TF collection of Summaries.
        summary_op = tf.merge_all_summaries()
        # Build an initialization operation to run below.
        init = tf.initialize_all_variables()
        # Start running operations on the Graph.
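        sess = tf.Session(config=tf.ConfigProto(
            log_device_placement=FLAGS.log_device_placement))
        sess.run(init)
        # Start the queue runners.
        tf.train.start_queue_runners(sess=sess)
        summary_writer = tf.train.SummaryWriter(FLAGS.train_dir, sess.graph)
        # Loop for at most FLAGS.max_steps training steps.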
        for step in xrange(FLAGS.max_steps):
          start_time = time.time()
          _, loss_value = sess.run([train_op, loss])
          duration = time.time() - start_time
          assert not np.isnan(loss_value), 'Model diverged with loss = NaN'
          # Every 10 steps, print the step, loss, and timing statistics.
          if step % 10 == 0:
            num_examples_per_step = FLAGS.batch_size
            examples_per_sec = num_examples_per_step / duration
            sec_per_batch = float(duration)
            format_str = ('%s: step %d, loss = %.2f (%.1f examples/sec; %.3f '
                          'sec/batch)')
            print (format_str % (datetime.now(), step, loss_value,
                                 examples_per_sec, sec_per_batch))
          # Every 100 steps, write the state of the network to the summary.
          if step % 100 == 0:
            summary_str = sess.run(summary_op)
            summary_writer.add_summary(summary_str, step)
          # Save the model checkpoint periodically: every 1000 steps, and
          # once more at the final step.
          if step % 1000 == 0 or (step + 1) == FLAGS.max_steps:
            checkpoint_path = os.path.join(FLAGS.train_dir, 'model.ckpt')
            saver.save(sess, checkpoint_path, global_step=step)
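
The bookkeeping schedule inside the loop above (log every 10 steps, summarize every 100, checkpoint every 1000 and at the last step) can be isolated from TensorFlow entirely. The sketch below is only an illustration of that schedule; `actions_for_step` and the action labels are hypothetical names, not part of the tutorial.

```python
def actions_for_step(step, max_steps):
    """Return the bookkeeping actions the training loop performs at `step`."""
    actions = []
    if step % 10 == 0:
        actions.append("log")         # print step, loss and throughput
    if step % 100 == 0:
        actions.append("summary")     # run summary_op and record the result
    if step % 1000 == 0 or (step + 1) == max_steps:
        actions.append("checkpoint")  # save the model variables
    return actions


if __name__ == "__main__":
    for step in (0, 10, 100, 1000, 4999):
        print(step, actions_for_step(step, max_steps=5000))
```

Step 0 triggers all three actions, which is why a checkpoint and a summary exist right after training starts.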


    def distorted_inputs(data_dir, batch_size):
      """Construct distorted input for CIFAR training using the Reader ops.

      Args:
        data_dir: Path to the CIFAR-10 data directory.
        batch_size: Number of images per batch.

      Returns:
        images: Images. 4D tensor of [batch_size, IMAGE_SIZE, IMAGE_SIZE, 3] size.
        labels: Labels. 1D tensor of [batch_size] size.
      """
      filenames = [os.path.join(data_dir, 'data_batch_%d.bin' % i)
                   for i in xrange(1, 6)]
      for f in filenames:
        if not tf.gfile.Exists(f):
          raise ValueError('Failed to find file: ' + f)
      # Create a queue that produces the filenames to read.
      filename_queue = tf.train.string_input_producer(filenames)
      # Read examples from files in the filename queue.
      read_input = read_cifar10(filename_queue)
      reshaped_image = tf.cast(read_input.uint8image, tf.float32)
      height = IMAGE_SIZE
      width = IMAGE_SIZE
      # Image processing for training the network. Note the many random
      # distortions applied to the image.
      # Step 1: randomly crop a [height, width] section of the image.
      distorted_image = tf.random_crop(reshaped_image, [height, width, 3])
      # Step 2: randomly flip the image horizontally (50% probability).
      distorted_image = tf.image.random_flip_left_right(distorted_image)
      # Because these operations are not commutative, consider randomizing
      # the order their operation.
      # Step 3: randomly change the brightness and contrast of the image.
      distorted_image = tf.image.random_brightness(distorted_image,
                                                   max_delta=63)
      distorted_image = tf.image.random_contrast(distorted_image,
                                                 lower=0.2, upper=1.8)
      # Subtract off the mean and divide by the standard deviation of the pixels.
      float_image = tf.image.per_image_whitening(distorted_image)
      # Ensure that the random shuffling has good mixing properties.
      min_fraction_of_examples_in_queue = 0.4
      min_queue_examples = int(NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN *
                               min_fraction_of_examples_in_queue)
      print ('Filling queue with %d CIFAR images before starting to train. '
             'This will take a few minutes.' % min_queue_examples)
      # Generate a batch of images and labels by building up a queue of examples.
      return _generate_image_and_label_batch(float_image, read_input.label,
                                             min_queue_examples, batch_size,
                                             shuffle=True)
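
The first two distortion steps and the queue-size calculation in distorted_inputs() can be mimicked without TensorFlow. This is a rough sketch on nested Python lists, assuming a single-channel toy image; the function names mirror the TensorFlow ops for readability but are plain hypothetical helpers, and the explicit `rng` argument is an addition so the randomness can be seeded.

```python
import random


def random_crop(image, height, width, rng):
    """Cut a random height x width window out of `image` (a list of rows)."""
    top = rng.randint(0, len(image) - height)
    left = rng.randint(0, len(image[0]) - width)
    return [row[left:left + width] for row in image[top:top + height]]


def random_flip_left_right(image, rng):
    """Mirror the image horizontally with probability 0.5."""
    if rng.random() < 0.5:
        return [row[::-1] for row in image]
    return image


def min_queue_examples(num_examples_per_epoch, fraction=0.4):
    """Number of examples buffered before shuffled batches are produced."""
    return int(num_examples_per_epoch * fraction)


if __name__ == "__main__":
    rng = random.Random(0)
    # A toy 4x4 single-channel "image"; real CIFAR-10 images are 32x32x3.
    image = [[r * 4 + c for c in range(4)] for r in range(4)]
    patch = random_flip_left_right(random_crop(image, 3, 3, rng), rng)
    for row in patch:
        print(row)
```

Because each training example is cropped and flipped afresh, the network rarely sees exactly the same pixels twice, which is the point of the distortions.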


    def inputs(eval_data, data_dir, batch_size):
      """Construct input for CIFAR evaluation using the Reader ops.

      Args:
        eval_data: bool, indicating if one should use the train or eval data set.
        data_dir: Path to the CIFAR-10 data directory.
        batch_size: Number of images per batch.

      Returns:
        images: Images. 4D tensor of [batch_size, IMAGE_SIZE, IMAGE_SIZE, 3] size.
        labels: Labels. 1D tensor of [batch_size] size.
      """
      if not eval_data:
        filenames = [os.path.join(data_dir, 'data_batch_%d.bin' % i)
                     for i in xrange(1, 6)]
        num_examples_per_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN
      else:
        filenames = [os.path.join(data_dir, 'test_batch.bin')]
        num_examples_per_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_EVAL
      for f in filenames:
        if not tf.gfile.Exists(f):
          raise ValueError('Failed to find file: ' + f)
      # Create a queue that produces the filenames to read.
      filename_queue = tf.train.string_input_producer(filenames)
      # Read examples from files in the filename queue.
      read_input = read_cifar10(filename_queue)
      reshaped_image = tf.cast(read_input.uint8image, tf.float32)
      height = IMAGE_SIZE
      width = IMAGE_SIZE
      # Image processing for evaluation.
      # Crop the central [height, width] region of the image.
      resized_image = tf.image.resize_image_with_crop_or_pad(reshaped_image,
                                                             width, height)
      # Subtract off the mean and divide by the standard deviation of the
      # pixels, which evens out color differences between images.
      float_image = tf.image.per_image_whitening(resized_image)
      # Ensure that the random shuffling has good mixing properties.
      min_fraction_of_examples_in_queue = 0.4
      min_queue_examples = int(num_examples_per_epoch *
                               min_fraction_of_examples_in_queue)
      # Generate a batch of images and labels by building up a queue of examples.
      return _generate_image_and_label_batch(float_image, read_input.label,
                                             min_queue_examples, batch_size,
                                             shuffle=False)
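
Both input functions finish with per-image whitening. As a rough illustration of what that normalization does, here is a plain-Python sketch on a flat list of pixel values. It follows the documented behaviour of the TensorFlow op (subtract the mean, divide by a standard deviation that is floored at 1/sqrt(number of elements)); the function below is a stand-in written for this post, not the library implementation.

```python
import math


def per_image_whitening(pixels):
    """Scale a flat list of pixel values to zero mean and unit variance."""
    n = len(pixels)
    mean = sum(pixels) / float(n)
    variance = sum((p - mean) ** 2 for p in pixels) / float(n)
    # Floor the standard deviation so a uniform image does not cause a
    # division by zero.
    adjusted_stddev = max(math.sqrt(variance), 1.0 / math.sqrt(n))
    return [(p - mean) / adjusted_stddev for p in pixels]


if __name__ == "__main__":
    print(per_image_whitening([0.0, 2.0, 4.0, 6.0]))
    print(per_image_whitening([5.0, 5.0, 5.0, 5.0]))  # uniform image
```

After whitening, a bright photo and a dark photo of the same object end up on a comparable numeric scale.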



      # The variables below hold all the trainable weights. They are passed an
      # initial value which will be assigned when we call:
      # {tf.initialize_all_variables().run()}
      conv1_weights = tf.Variable(
          tf.truncated_normal([5, 5, NUM_CHANNELS, 32],  # 5x5 filter, depth 32.
                              stddev=0.1,
                              seed=SEED, dtype=data_type()))
      conv1_biases = tf.Variable(tf.zeros([32], dtype=data_type()))
      conv2_weights = tf.Variable(tf.truncated_normal(
          [5, 5, 32, 64], stddev=0.1,
          seed=SEED, dtype=data_type()))
      conv2_biases = tf.Variable(tf.constant(0.1, shape=[64], dtype=data_type()))
      fc1_weights = tf.Variable(  # fully connected, depth 512.
          tf.truncated_normal([IMAGE_SIZE // 4 * IMAGE_SIZE // 4 * 64, 512],
                              stddev=0.1,
                              seed=SEED,
                              dtype=data_type()))
      fc1_biases = tf.Variable(tf.constant(0.1, shape=[512], dtype=data_type()))
      fc2_weights = tf.Variable(tf.truncated_normal([512, NUM_LABELS],
                                                    stddev=0.1,
                                                    seed=SEED,
                                                    dtype=data_type()))
      fc2_biases = tf.Variable(tf.constant(
          0.1, shape=[NUM_LABELS], dtype=data_type()))
      # We will replicate the model structure for the training subgraph, as well
      # as the evaluation subgraphs, while sharing the trainable parameters.
      def model(data, train=False):
        """The Model definition."""
        # 2D convolution, with 'SAME' padding (i.e. the output feature map has
        # the same size as the input). Note that {strides} is a 4D array whose
        # shape matches the data layout: [image index, y, x, depth].
        conv = tf.nn.conv2d(data,
                            conv1_weights,
                            strides=[1, 1, 1, 1],
                            padding='SAME')
        # Bias and rectified linear non-linearity.
        relu = tf.nn.relu(tf.nn.bias_add(conv, conv1_biases))
        # Max pooling. The kernel size spec {ksize} also follows the layout of
        # the data. Here we have a pooling window of 2, and a stride of 2.
        pool = tf.nn.max_pool(relu,
                              ksize=[1, 2, 2, 1],
                              strides=[1, 2, 2, 1],
                              padding='SAME')
        conv = tf.nn.conv2d(pool,
                            conv2_weights,
                            strides=[1, 1, 1, 1],
                            padding='SAME')
        relu = tf.nn.relu(tf.nn.bias_add(conv, conv2_biases))
        pool = tf.nn.max_pool(relu,
                              ksize=[1, 2, 2, 1],
                              strides=[1, 2, 2, 1],
                              padding='SAME')
        # Reshape the feature map cuboid into a 2D matrix to feed it to the
        # fully connected layers.
        pool_shape = pool.get_shape().as_list()
        reshape = tf.reshape(
            pool,
            [pool_shape[0], pool_shape[1] * pool_shape[2] * pool_shape[3]])
        # Fully connected layer. Note that the '+' operation automatically
        # broadcasts the biases.
        hidden = tf.nn.relu(tf.matmul(reshape, fc1_weights) + fc1_biases)
        # Add a 50% dropout during training only. Dropout also scales
        # activations such that no rescaling is needed at evaluation time.
        if train:
          hidden = tf.nn.dropout(hidden, 0.5, seed=SEED)
        return tf.matmul(hidden, fc2_weights) + fc2_biases
      # Training computation: logits + cross-entropy loss.
      logits = model(train_data_node, True)
      loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
          logits, train_labels_node))
      # L2 regularization for the fully connected parameters.
      regularizers = (tf.nn.l2_loss(fc1_weights) + tf.nn.l2_loss(fc1_biases) +
                      tf.nn.l2_loss(fc2_weights) + tf.nn.l2_loss(fc2_biases))
      # Add the regularization term to the loss.
      loss += 5e-4 * regularizers
      # Optimizer: set up a variable that's incremented once per batch and
      # controls the learning rate decay.
      batch = tf.Variable(0, dtype=data_type())
      # Decay once per epoch, using an exponential schedule starting at 0.01.
      learning_rate = tf.train.exponential_decay(
          0.01,                # Base learning rate.
          batch * BATCH_SIZE,  # Current index into the dataset.
          train_size,          # Decay step.
          0.95,                # Decay rate.
          staircase=True)
      # Use simple momentum for the optimization.
      optimizer = tf.train.MomentumOptimizer(learning_rate,
                                             0.9).minimize(loss,
                                                           global_step=batch)
      # Predictions for the current training minibatch.
      train_prediction = tf.nn.softmax(logits)
      # Predictions for the test and validation, which we'll compute less often.
      eval_prediction = tf.nn.softmax(model(eval_data))

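    This MNIST code is straightforward. It first defines the weights and biases for the convolution1, convolution2, fully_connected1 and fully_connected2 layers. Inside the model function, the conv2d, relu, max_pool sequence is applied twice, and the result is reshaped and fed into the fully connected network, i.e. matmul(reshape, fc1_weights) + fc1_biases. The second fully connected layer then produces the logits. After the usual steps of defining the loss and the optimizer, the final predictions come from a softmax. The same logic appears in the CIFAR-10 inference() function below.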
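
The size of fc1_weights, IMAGE_SIZE // 4 * IMAGE_SIZE // 4 * 64, follows from how 'SAME' padding treats spatial dimensions: the output size is the input size divided by the stride, rounded up. The sketch below walks through that arithmetic. It assumes 28x28 MNIST images (IMAGE_SIZE is not defined in the excerpt), and the helper names are hypothetical.

```python
import math


def same_padding_output(size, stride):
    """Spatial output size of a conv/pool layer with 'SAME' padding."""
    return int(math.ceil(size / float(stride)))


def flattened_size(image_size, num_pools, depth):
    """Length of the vector fed to the first fully connected layer."""
    size = image_size
    for _ in range(num_pools):
        # A stride-1 convolution keeps the size; a stride-2 pool halves it.
        size = same_padding_output(same_padding_output(size, 1), 2)
    return size * size * depth


if __name__ == "__main__":
    IMAGE_SIZE = 28  # assumed MNIST image side length
    print(flattened_size(IMAGE_SIZE, num_pools=2, depth=64))
    print(IMAGE_SIZE // 4 * IMAGE_SIZE // 4 * 64)
```

Both printed numbers agree, which is why the reshape before the first matmul lines up with the first dimension of fc1_weights.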

    def inference(images):
      """Build the CIFAR-10 model.

      Args:
        images: Images returned from distorted_inputs() or inputs().

      Returns:
        Logits.
      """
      # We instantiate all variables using tf.get_variable() instead of
      # tf.Variable() in order to share variables across multiple GPU training runs.
      # If we only ran this model on a single GPU, we could simplify this function
      # by replacing all instances of tf.get_variable() with tf.Variable().
      # conv1
      with tf.variable_scope('conv1') as scope:
        # The input images are in color and therefore have 3 channels;
        # conv2d maps them to a feature map with 64 output channels.
        kernel = _variable_with_weight_decay('weights', shape=[5, 5, 3, 64],
                                             stddev=1e-4, wd=0.0)
        conv = tf.nn.conv2d(images, kernel, [1, 1, 1, 1], padding='SAME')
        biases = _variable_on_cpu('biases', [64], tf.constant_initializer(0.0))
        bias = tf.nn.bias_add(conv, biases)
        conv1 = tf.nn.relu(bias, name=scope.name)
      # pool1
      pool1 = tf.nn.max_pool(conv1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1],
                             padding='SAME', name='pool1')
      # norm1
      norm1 = tf.nn.lrn(pool1, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75,
                        name='norm1')
      # conv2
      with tf.variable_scope('conv2') as scope:
        # The previous layer outputs 64 channels, which are the input here,
        # so the kernel has 64 input channels; the output is also set to 64.
        kernel = _variable_with_weight_decay('weights', shape=[5, 5, 64, 64],
                                             stddev=1e-4, wd=0.0)
        conv = tf.nn.conv2d(norm1, kernel, [1, 1, 1, 1], padding='SAME')
        biases = _variable_on_cpu('biases', [64], tf.constant_initializer(0.1))
        bias = tf.nn.bias_add(conv, biases)
        conv2 = tf.nn.relu(bias, name=scope.name)
      # norm2
      norm2 = tf.nn.lrn(conv2, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75,
                        name='norm2')
      # pool2
      pool2 = tf.nn.max_pool(norm2, ksize=[1, 3, 3, 1],
                             strides=[1, 2, 2, 1], padding='SAME', name='pool2')
      # local3
      with tf.variable_scope('local3') as scope:
        # Move everything into depth so we can perform a single matrix multiply.
        reshape = tf.reshape(pool2, [FLAGS.batch_size, -1])
        dim = reshape.get_shape()[1].value
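        # The -1 in the reshape above is inferred from the tensor size once
        # batch_size is fixed, so the second dimension holds everything that
        # belongs to one image. That vector is mapped onto 384 units.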
        weights = _variable_with_weight_decay('weights', shape=[dim, 384],
                                              stddev=0.04, wd=0.004)
        biases = _variable_on_cpu('biases', [384], tf.constant_initializer(0.1))
        local3 = tf.nn.relu(tf.matmul(reshape, weights) + biases, name=scope.name)
      # local4
      with tf.variable_scope('local4') as scope:
        weights = _variable_with_weight_decay('weights', shape=[384, 192],
                                              stddev=0.04, wd=0.004)
        biases = _variable_on_cpu('biases', [192], tf.constant_initializer(0.1))
        local4 = tf.nn.relu(tf.matmul(local3, weights) + biases, name=scope.name)
      # softmax, i.e. softmax(WX + b)
      with tf.variable_scope('softmax_linear') as scope:
        # This is the output layer that feeds the softmax: it maps the 192
        # units to one output per class, e.g. 10 outputs for 10 classes.
        weights = _variable_with_weight_decay('weights', [192, NUM_CLASSES],
                                              stddev=1/192.0, wd=0.0)
        biases = _variable_on_cpu('biases', [NUM_CLASSES],
                                  tf.constant_initializer(0.0))
        softmax_linear = tf.add(tf.matmul(local4, weights), biases, name=scope.name)
      return softmax_linear

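    The logic here is essentially the same as in the earlier model, so what differs? First, the input is now a color image. Anyone who has studied image processing knows that a color image has 3 channels, whereas the MNIST images are single-channel grayscale. In the MNIST model the first feature maps therefore went from 1 channel to 32 (note that NUM_CHANNELS is 1 there). Here the 3 is simply written directly where NUM_CHANNELS used to be, and the first layer outputs 64 channels. In addition, this version uses variable scopes, a convenient way to control when and which variables are shared. They also save us from inventing a distinct name for every weights and biases variable.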
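
To make the channel difference concrete, the sketch below counts the weights in each convolution kernel from its shape [height, width, in_channels, out_channels], and shows what the -1 in tf.reshape(pool2, [FLAGS.batch_size, -1]) works out to. The 6x6 spatial size assumes 24x24 input crops halved by two stride-2 pools, and the batch size of 128 is likewise an assumption; neither is stated in the excerpt. The helper names are hypothetical.

```python
def conv_kernel_params(height, width, in_channels, out_channels):
    """Number of weights in a conv kernel of shape [h, w, in, out]."""
    return height * width * in_channels * out_channels


def infer_flat_dim(batch_size, height, width, channels):
    """What reshape(x, [batch_size, -1]) infers for the -1 dimension."""
    total_elements = batch_size * height * width * channels
    return total_elements // batch_size


if __name__ == "__main__":
    print(conv_kernel_params(5, 5, 1, 32))   # MNIST conv1: grayscale input
    print(conv_kernel_params(5, 5, 3, 64))   # CIFAR-10 conv1: RGB input
    print(conv_kernel_params(5, 5, 64, 64))  # CIFAR-10 conv2
    # Assumption: 24x24 crops, halved by each of the two stride-2 pools.
    print(infer_flat_dim(128, 6, 6, 64))
```

Most of the convolutional weights sit in conv2, since its kernel spans 64 input channels rather than 3.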


    def train(total_loss, global_step):
      """Train CIFAR-10 model.

      Create an optimizer and apply to all trainable variables. Add moving
      average for all trainable variables.

      Args:
        total_loss: Total loss from loss().
        global_step: Integer Variable counting the number of training steps
          processed.

      Returns:
        train_op: op for training.
      """
      # Variables that affect learning rate.
      num_batches_per_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN / FLAGS.batch_size
      decay_steps = int(num_batches_per_epoch * NUM_EPOCHS_PER_DECAY)
      # Decay the learning rate exponentially based on the number of steps.
      lr = tf.train.exponential_decay(INITIAL_LEARNING_RATE,
                                      global_step,
                                      decay_steps,
                                      LEARNING_RATE_DECAY_FACTOR,
                                      staircase=True)
      tf.scalar_summary('learning_rate', lr)
      # Generate moving averages of all losses and associated summaries.
      loss_averages_op = _add_loss_summaries(total_loss)
      # Compute gradients.
      # Use of control dependencies: the gradients are computed only after
      # loss_averages_op has run.
      with tf.control_dependencies([loss_averages_op]):
        opt = tf.train.GradientDescentOptimizer(lr)
        grads = opt.compute_gradients(total_loss)
      # Apply gradients.
      apply_gradient_op = opt.apply_gradients(grads, global_step=global_step)
      # Add histograms for trainable variables.
      for var in tf.trainable_variables():
        tf.histogram_summary(var.op.name, var)
      # Add histograms for gradients.
      for grad, var in grads:
        if grad is not None:
          tf.histogram_summary(var.op.name + '/gradients', grad)
      # Track the moving averages of all trainable variables.
      variable_averages = tf.train.ExponentialMovingAverage(
          MOVING_AVERAGE_DECAY, global_step)
      variables_averages_op = variable_averages.apply(tf.trainable_variables())
      with tf.control_dependencies([apply_gradient_op, variables_averages_op]):
        train_op = tf.no_op(name='train')
      return train_op

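    The extra pieces here collect intermediate results while the network runs, such as loss_averages_op = _add_loss_summaries(total_loss), which tracks all the losses, and the parameter histograms from tf.histogram_summary(var.op.name, var). Note that the function uses control dependencies more than once: ops created inside a tf.control_dependencies block run only after the listed dependency ops have run. This concept matters a great deal in TensorFlow, and this function is a good worked example of it, so it is worth studying further if you are interested. That covers the training side of the image pipeline.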
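
Two numeric pieces of train() are easy to reproduce by hand: the staircase learning-rate decay and the moving-average update applied to the trainable variables. The sketch below is only an approximation of both. The constants (initial rate 0.1, decay factor 0.1, 350 epochs per decay, 50,000 examples, batch size 128) are assumed values for illustration and are not given in the excerpt, and the plain moving-average formula ignores any adjustment TensorFlow makes based on the step count.

```python
def staircase_decay(initial_lr, global_step, decay_steps, decay_rate):
    """Exponential decay with staircase=True: lr drops once per decay_steps."""
    return initial_lr * decay_rate ** (global_step // decay_steps)


def ema_update(shadow, value, decay):
    """One moving-average step: shadow moves a fraction (1 - decay) to value."""
    return decay * shadow + (1.0 - decay) * value


if __name__ == "__main__":
    # Assumed constants, chosen for illustration only.
    num_batches_per_epoch = 50000 / 128.0
    decay_steps = int(num_batches_per_epoch * 350)
    for step in (0, 100000, 200000):
        print(step, staircase_decay(0.1, step, decay_steps, 0.1))
    print(ema_update(shadow=1.0, value=0.0, decay=0.9999))
```

With these assumed constants the first decay happens after more than 100,000 steps, so a run stopped at 100,000 steps would train at the initial learning rate throughout.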



