    Applying the Moving Average Model in TensorFlow

    Purpose

    TensorFlow tutorials that train neural networks with gradient descent often mention a strategy for making the model more robust: the moving average model. This post records my own understanding, based on what I have studied recently.

    Basic Idea

    When training a model with gradient descent, a shadow variable is maintained for each weight and updated after every weight update. As training proceeds, the shadow variable eventually stabilizes near the true weight's value. At prediction time, substituting the shadow values for the real variable values then gives better results.
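
    The update rule behind each shadow variable is a plain exponential moving average. Here is a minimal sketch in pure Python; the weight values and the decay of 0.99 are made up for illustration (the full program below uses 0.9999):

    decay = 0.99  # illustrative; the full program below uses MOVING_AVERAGE_DECAY = 0.9999

    # Made-up weight values produced by successive gradient-descent updates.
    weights = [1.0, 0.8, 1.2, 0.9, 1.1, 1.0]

    # TensorFlow initializes each shadow variable to the variable's current value.
    shadow = weights[0]
    for w in weights:
        # shadow <- decay * shadow + (1 - decay) * weight
        shadow = decay * shadow + (1 - decay) * w
        print(f'weight={w:.2f}, shadow={shadow:.4f}')

    With a decay close to 1, the shadow variable changes slowly, smoothing out the noise of the individual updates.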

    Steps

    1 Training phase: maintain a shadow variable for every trainable weight and keep updating it as the iterations proceed;
    2 Prediction phase: substitute the shadow values for the real variable values when making predictions (see the sketch after this list).
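
    A minimal sketch of the two phases using the TF1 API. The toy variable and the hand-written tf.assign_sub update are stand-ins for a real network and optimizer step; the full program below instead feeds the shadow values into a second inference pass rather than restoring them from a checkpoint:

    import tensorflow as tf

    # --- Training phase ---
    w = tf.Variable(1.0, name='w')
    update_w = tf.assign_sub(w, 0.1)  # stand-in for an optimizer's weight update

    ema = tf.train.ExponentialMovingAverage(decay=0.99)
    ema_op = ema.apply(tf.trainable_variables())  # create/update shadow variables

    # Bundle the weight update and the shadow update into one training op,
    # the same pattern the full program uses below.
    with tf.control_dependencies([update_w, ema_op]):
        train_op = tf.no_op(name='train')

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for _ in range(5):
            sess.run(train_op)
        print(sess.run([w, ema.average(w)]))  # raw weight vs. its shadow

    # --- Prediction phase ---
    # variables_to_restore() maps each shadow variable to the corresponding real
    # variable, so a Saver built from it loads the averaged values into the
    # weights when a checkpoint is restored.
    saver = tf.train.Saver(ema.variables_to_restore())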

    Results

    Without the moving average model

    After 0 training steps, validation accuracy is 0.16740000247955322
    After 1000 training steps, validation accuracy is 0.9747997522354126
    After 2000 training steps, validation accuracy is 0.9775997400283813
    After 3000 training steps, validation accuracy is 0.9811996817588806
    After 4000 training steps, validation accuracy is 0.9805997014045715
    after 5000 steps, test accuracy is 0.9790000915527344
    

    With the moving average model

    After 0 training steps, validation accuracy is 0.16499999165534973
    After 1000 training steps, validation accuracy is 0.9763997197151184
    After 2000 training steps, validation accuracy is 0.9829997420310974
    After 3000 training steps, validation accuracy is 0.9825997352600098
    After 4000 training steps, validation accuracy is 0.9843996167182922
    after 5000 steps, test accuracy is 0.9821001291275024
    

    From the results, using the moving average model improves accuracy on both the validation set and the test set: it rises faster and reaches a higher final value.

    Code

    In the code, the two correct_prediction lines near the end of train() control whether the moving average model is used at prediction time: enabling the commented line that evaluates average_y uses the shadow variables, while the line that evaluates y uses the raw weights.

    import tensorflow as tf
    from tensorflow.examples.tutorials.mnist import input_data
    
    mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
    
    BATCH_SIZE = 100
    INPUT_NODE = 784
    OUTPUT_NODE = 10
    LAYER1_NODE = 500
    
    LEARNING_RATE_BASE = 0.8
    LEARNING_RATE_DECAY = 0.99
    
    REGULARIZATION_RATE = 0.0001
    
    TRAINING_STEPS = 5000
    
    MOVING_AVERAGE_DECAY = 0.9999
    
    
    def inference(input_tensor, avg_class, weights1, biases1, weights2, biases2):
        # No moving-average class supplied: compute with the raw weights.
        if avg_class is None:
            layer1 = tf.nn.relu(tf.matmul(input_tensor, weights1) + biases1)
            return tf.matmul(layer1, weights2) + biases2
        else:
            # Compute with the shadow values maintained by the moving-average class.
            layer1 = tf.nn.relu(tf.matmul(input_tensor, avg_class.average(weights1)) + avg_class.average(biases1))
            return tf.matmul(layer1, avg_class.average(weights2)) + avg_class.average(biases2)
    
    
    def train(mnist):
        x = tf.placeholder(tf.float32, shape=[None, INPUT_NODE], name='input_x')
        y_ = tf.placeholder(tf.float32, shape=[None, OUTPUT_NODE], name='input_y')
    
        weights1 = tf.Variable(tf.truncated_normal(shape=[INPUT_NODE, LAYER1_NODE], stddev=0.1))
        biases1 = tf.Variable(tf.constant(0.1, shape=[LAYER1_NODE]))
    
        weights2 = tf.Variable(tf.truncated_normal(shape=[LAYER1_NODE, OUTPUT_NODE], stddev=0.1))
        biases2 = tf.Variable(tf.constant(0.1, shape=[OUTPUT_NODE]))
    
        y = inference(x, None, weights1=weights1, biases1=biases1, weights2=weights2, biases2=biases2)
    
        global_step = tf.Variable(0, trainable=False)
    
        # With global_step supplied, ExponentialMovingAverage uses the smaller of
        # MOVING_AVERAGE_DECAY and (1 + step) / (10 + step), so the shadow variables
        # track the weights more closely early in training.
        variable_averages = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)
        variables_averages_op = variable_averages.apply(tf.trainable_variables())
    
        average_y = inference(x, variable_averages, weights1, biases1, weights2, biases2)
    
        cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=tf.argmax(y_, 1))
        cross_entropy_mean = tf.reduce_mean(cross_entropy)
    
        regularizer = tf.contrib.layers.l2_regularizer(REGULARIZATION_RATE)
        regularization = regularizer(weights1) + regularizer(weights2)
    
        loss = cross_entropy_mean + regularization
    
        learning_rate = tf.train.exponential_decay(
            LEARNING_RATE_BASE,
            global_step,
            mnist.train.num_examples / BATCH_SIZE,
            LEARNING_RATE_DECAY,
            staircase=True
        )
    
        train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
    
        # Each run of train_op performs the weight update and the shadow update.
        with tf.control_dependencies([train_step, variables_averages_op]):
            train_op = tf.no_op(name='train')
    
        # Toggle between the next two lines: average_y evaluates with the shadow
        # variables (moving average model), y evaluates with the raw weights.
        # correct_prediction = tf.equal(tf.argmax(average_y, 1), tf.argmax(y_, 1))
        correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    
        with tf.Session() as sess:
            tf.global_variables_initializer().run()
    
            validate_feed = {
                x: mnist.validation.images,
                y_: mnist.validation.labels
            }
    
            test_feed = {
                x: mnist.test.images,
                y_: mnist.test.labels
            }
    
            for i in range(TRAINING_STEPS):
                if i % 1000 == 0:
                    validate_acc = sess.run(accuracy, feed_dict=validate_feed)
                    print(f'After {i} training steps, validation accuracy is {validate_acc}')
    
                xs, ys = mnist.train.next_batch(BATCH_SIZE)
                sess.run(train_op, feed_dict={x: xs, y_: ys})
    
            test_acc = sess.run(accuracy, feed_dict=test_feed)
            print(f'after {TRAINING_STEPS} steps, test accuracy is {test_acc}')
    
    
    def main(argv=None):
        train(mnist)
    
    
    if __name__ == '__main__':
        main()
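
    Note that the loss is always computed from y, the output of the raw weights: the shadow variables are never trained directly, they only smooth the trained values. To reproduce the two result tables above, run the script twice, enabling one of the two correct_prediction lines each time.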
    
    
    

    Summary

    1 The moving average model only produced good results with the gradient descent optimizer; other optimizers did not show this effect, and I have not seen a convincing explanation.
    2 There are many optimization techniques; this one can perhaps serve as a final measure for improving robustness.

    Original article: https://www.cnblogs.com/ledao/p/15085685.html