  • Batch normalization (batch_normalization)

    To reduce the vanishing/exploding gradients problem during the early stages of training deep neural networks, Sergey Ioffe and Christian Szegedy proposed batch normalization. The technique adds an operation to the model just before each layer's activation function: it zero-centers and normalizes the layer inputs, then scales and shifts the result using two new parameters per layer (one for scaling, one for shifting). In other words, the operation lets the model learn the optimal scale and mean of each layer's inputs.

    How batch normalization works

    (1) \(\mu_B = \frac{1}{m_B}\sum_{i=1}^{m_B} x^{(i)}\)  # empirical mean, evaluated over the whole mini-batch B

    (2) \(\sigma_B^2 = \frac{1}{m_B}\sum_{i=1}^{m_B}\left(x^{(i)} - \mu_B\right)^2\)  # variance of the whole mini-batch B

    (3) \(\hat{x}^{(i)} = \frac{x^{(i)} - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}\)  # zero-center and normalize

    (4) \(z^{(i)} = \gamma\hat{x}^{(i)} + \beta\)  # scale and shift the normalized input
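
    These four steps translate directly into code. Below is a minimal NumPy sketch of the forward pass (not the TensorFlow implementation), with gamma fixed to 1 and beta to 0 purely for illustration; in a real layer they are learned along with the other weights:

    import numpy as np

    def batch_norm_forward(x, gamma, beta, eps=1e-3):
        mu = x.mean(axis=0)                     # (1) empirical mean over the mini-batch
        var = x.var(axis=0)                     # (2) variance over the mini-batch
        x_hat = (x - mu) / np.sqrt(var + eps)   # (3) zero-center and normalize
        return gamma * x_hat + beta             # (4) scale and shift

    x_batch = np.random.randn(32, 100)          # a mini-batch of 32 examples, 100 features
    z = batch_norm_forward(x_batch, gamma=np.ones(100), beta=np.zeros(100))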

    At test time there is no mini-batch from which to compute the empirical mean and standard deviation, so you can simply use the mean and standard deviation of the whole training set instead; in practice these are estimated efficiently during training with a moving average.
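
    That moving average is just an exponential moving average of the per-batch statistics, which is what the momentum argument of tf.layers.batch_normalization (shown below) controls. A rough sketch, for illustration only, since the real layer maintains these statistics for you:

    import numpy as np

    momentum = 0.99                                    # default of tf.layers.batch_normalization
    moving_mean, moving_var = np.zeros(100), np.ones(100)

    x_batch = np.random.randn(32, 100)                 # one mini-batch of layer inputs
    moving_mean = momentum * moving_mean + (1 - momentum) * x_batch.mean(axis=0)
    moving_var  = momentum * moving_var  + (1 - momentum) * x_batch.var(axis=0)
    # at test time the layer normalizes with moving_mean / moving_var
    # instead of the current batch's statistics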

    However, batch normalization does add some complexity and runtime cost to the model and slows down prediction, so if you need fast predictions you may want to check how plain ELU + He initialization performs before adding batch normalization.
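
    As a baseline for that comparison, a plain hidden layer with ELU activation and He initialization (TF 1.x style, using the same variance_scaling_initializer as the MNIST example later in this post) could look like this sketch:

    import tensorflow as tf

    he_init = tf.contrib.layers.variance_scaling_initializer()      # He initialization
    X = tf.placeholder(tf.float32, shape=(None, 28 * 28), name='X')
    hidden = tf.layers.dense(X, 100, activation=tf.nn.elu,
                             kernel_initializer=he_init, name='hidden1')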

    Using tf.layers.batch_normalization

    Function signature

    def batch_normalization(inputs,
                        axis=-1,
                        momentum=0.99,
                        epsilon=1e-3,
                        center=True,
                        scale=True,
                        beta_initializer=init_ops.zeros_initializer(),
                        gamma_initializer=init_ops.ones_initializer(),
                        moving_mean_initializer=init_ops.zeros_initializer(),
                        moving_variance_initializer=init_ops.ones_initializer(),
                        beta_regularizer=None,
                        gamma_regularizer=None,
                        beta_constraint=None,
                        gamma_constraint=None,
                        training=False,
                        trainable=True,
                        name=None,
                        reuse=None,
                        renorm=False,
                        renorm_clipping=None,
                        renorm_momentum=0.99,
                        fused=None,
                        virtual_batch_size=None,
                        adjustment=None):
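
    Most of these arguments can stay at their defaults; the ones you usually set are momentum, epsilon and, above all, training. The default axis=-1 normalizes over the last dimension, which is the feature axis of a dense layer and the channel axis of an NHWC convolutional layer. As a sketch (the usage notes below spell out the same three-step pattern for dense layers), applying it after a convolutional layer might look like:

    import tensorflow as tf

    X_img = tf.placeholder(tf.float32, shape=(None, 28, 28, 1))        # NHWC images
    training = tf.placeholder_with_default(False, shape=(), name='training')

    conv = tf.layers.conv2d(X_img, filters=32, kernel_size=3, padding='same',
                            activation=None, name='conv1')             # no activation yet
    bn = tf.layers.batch_normalization(conv, axis=-1, momentum=0.9,
                                       training=training)              # normalize over channels
    out = tf.nn.elu(bn)                                                 # activate after BN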
    

    Usage notes

    (1) Using batch_normalization takes three steps:

    a. Build the layer (convolutional or dense) with its activation function set to None
    b. Apply batch_normalization to the layer's output
    c. Apply the activation function to the normalized output
    
    Example:
    inputs = tf.layers.dense(inputs,self.n_neurons,
                             kernel_initializer=self.initializer,
                             name='hidden%d'%(layer+1))
    if self.batch_normal_momentum:
        inputs = tf.layers.batch_normalization(inputs,momentum=self.batch_normal_momentum,training=self._training)

    inputs = self.activation(inputs,name='hidden%d_out'%(layer+1))
    

    (2) At training time set the training argument to True, and at test time set it to False. Also pay special attention to the update_ops collection:

    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    These ops update the moving statistics and must be run at every training step, e.g. with sess.run(update_ops).
    Alternatively:
    with tf.control_dependencies(update_ops):
        train_op = tf.train.AdamOptimizer(learning_rate).minimize(loss)
    

    A simple test on the MNIST dataset

    from tensorflow.examples.tutorials.mnist import input_data
    import tensorflow as tf
    import numpy as np
    
    mnist = input_data.read_data_sets('MNIST_data',one_hot=True)
    x_train,y_train = mnist.train.images,mnist.train.labels
    x_test,y_test = mnist.test.images,mnist.test.labels
    
    Extracting MNIST_data\train-images-idx3-ubyte.gz
    Extracting MNIST_data\train-labels-idx1-ubyte.gz
    Extracting MNIST_data\t10k-images-idx3-ubyte.gz
    Extracting MNIST_data\t10k-labels-idx1-ubyte.gz
    
    he_init = tf.contrib.layers.variance_scaling_initializer()
    def dnn(inputs,n_hiddens=1,n_neurons=100,initializer=he_init,activation=tf.nn.elu,batch_normalization=None,training=None):
        # batch_normalization: the BN momentum to use, or None to skip batch normalization
        # training: bool tensor telling BN whether to use batch statistics or moving averages
        for layer in range(n_hiddens):
            inputs = tf.layers.dense(inputs,n_neurons,kernel_initializer=initializer,name='hidden%d'%(layer+1))
            if batch_normalization is not None:
                inputs = tf.layers.batch_normalization(inputs,momentum=batch_normalization,training=training)
            inputs = activation(inputs,name='hidden%d'%(layer+1))
        return inputs
    
    tf.reset_default_graph()
    n_inputs = 28*28
    n_hidden = 100
    n_outputs = 10
    
    X = tf.placeholder(tf.float32,shape=(None,n_inputs),name='X')
    Y = tf.placeholder(tf.float32,shape=(None,n_outputs),name='Y')
    
    training = tf.placeholder_with_default(False,shape=(),name='training')
    # enable batch normalization in the dnn (momentum=0.9) and tell it whether we are training
    dnn_outputs = dnn(X,batch_normalization=0.9,training=training)
    
    logits = tf.layers.dense(dnn_outputs,n_outputs,kernel_initializer = he_init,name='logits')
    y_proba = tf.nn.softmax(logits,name='y_proba')
    xentropy = tf.nn.softmax_cross_entropy_with_logits(labels=Y,logits=logits)
    loss = tf.reduce_mean(xentropy,name='loss')
    # run the BN moving-average update ops together with each training step
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        train_op = tf.train.AdamOptimizer(learning_rate=0.01).minimize(loss)
    
    correct = tf.equal(tf.argmax(Y,1),tf.argmax(y_proba,1))
    accuracy = tf.reduce_mean(tf.cast(correct,tf.float32))
    
    epoches = 20
    batch_size = 100
    np.random.seed(42)
    
    init = tf.global_variables_initializer()
    rnd_index = np.random.permutation(len(x_train))
    n_batches = len(x_train) // batch_size
    with tf.Session() as sess:
        sess.run(init)
        for epoch in range(epoches):       
            for batch_index in np.array_split(rnd_index,n_batches):
                x_batch,y_batch = x_train[batch_index],y_train[batch_index]
                feed_dict = {X:x_batch,Y:y_batch,training:True}
                sess.run(train_op,feed_dict=feed_dict)
            loss_val,accuracy_val = sess.run([loss,accuracy],feed_dict={X:x_test,Y:y_test,training:False})
            print('epoch:{},loss:{},accuracy:{}'.format(epoch,loss_val,accuracy_val))
    
  • Original article: https://www.cnblogs.com/xiaobingqianrui/p/10770302.html