Suppose we want to minimize the function y = x^2, starting from the initial point x0 = 5.
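The effect of the learning rate on this problem can be worked out by hand. The derivation below is a sketch, assuming plain gradient descent with a constant learning rate, written here as \eta (a symbol not used in the original text):

```latex
\begin{aligned}
x_{t+1} &= x_t - \eta \,\frac{dy}{dx}\Big|_{x = x_t}
         = x_t - 2\eta\, x_t
         = (1 - 2\eta)\, x_t, \\
x_t     &= (1 - 2\eta)^{t}\, x_0 .
\end{aligned}
```

Under this assumption the iterates shrink toward 0 only when |1 - 2\eta| < 1, that is, 0 < \eta < 1. At \eta = 1 the factor is exactly -1, and at \eta = 0.001 it is 0.998, so each step removes only 0.2% of the remaining distance.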
1. With a learning rate of 1, x oscillates between 5 and -5 and never converges.

```python
# Learning rate = 1 (TensorFlow 1.x API)
import tensorflow as tf

training_steps = 10
learning_rate = 1

# Start from x = 5 and minimize y = x^2.
x = tf.Variable(tf.constant(5, dtype=tf.float32), name="x")
y = tf.square(x)

train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(y)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(training_steps):
        sess.run(train_op)
        x_value = sess.run(x)
        print("After %s iteration(s): x%s is %f." % (i + 1, i + 1, x_value))
```

Output:

```
After 1 iteration(s): x1 is -5.000000.
After 2 iteration(s): x2 is 5.000000.
After 3 iteration(s): x3 is -5.000000.
After 4 iteration(s): x4 is 5.000000.
After 5 iteration(s): x5 is -5.000000.
After 6 iteration(s): x6 is 5.000000.
After 7 iteration(s): x7 is -5.000000.
After 8 iteration(s): x8 is 5.000000.
After 9 iteration(s): x9 is -5.000000.
After 10 iteration(s): x10 is 5.000000.
```
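To see why the iterates only flip sign, the same update can be written by hand without TensorFlow. This is a minimal sketch in plain Python, assuming the analytic gradient dy/dx = 2x; the helper name `gradient_descent` is hypothetical and not part of the original code.

```python
def gradient_descent(x0, learning_rate, steps):
    """Minimize y = x**2 by plain gradient descent and return all iterates."""
    xs = [x0]
    x = x0
    for _ in range(steps):
        grad = 2.0 * x                 # dy/dx for y = x**2
        x = x - learning_rate * grad   # standard gradient descent update
        xs.append(x)
    return xs


if __name__ == "__main__":
    # With learning_rate = 1 the update becomes x <- x - 2x = -x,
    # so the iterate only changes sign and never approaches 0.
    for step, x in enumerate(gradient_descent(5.0, 1.0, 10)[1:], start=1):
        print("After %d iteration(s): x is %f." % (step, x))
```

Each step moves exactly twice the distance to the minimum, so x lands on the mirror-image point on the other side of the parabola.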
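2. With a learning rate of 0.001, the descent is too slow: after 901 iterations x has only dropped to 0.823355, still far from the minimum at 0.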

```python
# Learning rate = 0.001 (TensorFlow 1.x API)
import tensorflow as tf

training_steps = 1000
learning_rate = 0.001

x = tf.Variable(tf.constant(5, dtype=tf.float32), name="x")
y = tf.square(x)

train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(y)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(training_steps):
        sess.run(train_op)
        # Report only every 100th step.
        if i % 100 == 0:
            x_value = sess.run(x)
            print("After %s iteration(s): x%s is %f." % (i + 1, i + 1, x_value))
```

Output:

```
After 1 iteration(s): x1 is 4.990000.
After 101 iteration(s): x101 is 4.084646.
After 201 iteration(s): x201 is 3.343555.
After 301 iteration(s): x301 is 2.736923.
After 401 iteration(s): x401 is 2.240355.
After 501 iteration(s): x501 is 1.833880.
After 601 iteration(s): x601 is 1.501153.
After 701 iteration(s): x701 is 1.228794.
After 801 iteration(s): x801 is 1.005850.
After 901 iteration(s): x901 is 0.823355.
```
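How slow this is can be estimated without training. For y = x^2 with a constant learning rate, each step multiplies x by (1 - 2 * learning_rate), so the iterate has a closed form. The sketch below assumes that closed form; the helper names `x_after` and `steps_to_reach` are hypothetical and used only for illustration.

```python
import math


def x_after(steps, x0=5.0, learning_rate=0.001):
    """Closed form of the iterate: x_t = x0 * (1 - 2 * learning_rate) ** t."""
    return x0 * (1.0 - 2.0 * learning_rate) ** steps


def steps_to_reach(target, x0=5.0, learning_rate=0.001):
    """Smallest t with |x_t| <= target; valid for 0 < learning_rate < 0.5."""
    shrink = 1.0 - 2.0 * learning_rate   # per-step shrink factor (0.998 here)
    return math.ceil(math.log(target / x0) / math.log(shrink))


if __name__ == "__main__":
    print("x after 901 steps: %f" % x_after(901))
    print("steps needed to reach |x| <= 0.01: %d" % steps_to_reach(0.01))
```

The value at step 901 should agree with the TensorFlow output above up to float32 rounding, and reaching |x| <= 0.01 takes on the order of three thousand steps at this learning rate.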
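3. With an exponentially decaying learning rate, the descent is fast in the early iterations, so a reasonable degree of convergence is reached in far fewer training steps: x is already down to 0.028727 after 91 iterations.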

```python
# Exponentially decaying learning rate (TensorFlow 1.x API)
import tensorflow as tf

TRAINING_STEPS = 100

# Step counter; incremented by minimize() and excluded from training.
global_step = tf.Variable(0, trainable=False)
# Start at 0.1 and multiply by 0.96 after every step.
LEARNING_RATE = tf.train.exponential_decay(0.1, global_step, 1, 0.96, staircase=True)

x = tf.Variable(tf.constant(5, dtype=tf.float32), name="x")
y = tf.square(x)
train_op = tf.train.GradientDescentOptimizer(LEARNING_RATE).minimize(
    y, global_step=global_step)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(TRAINING_STEPS):
        sess.run(train_op)
        # Report only every 10th step.
        if i % 10 == 0:
            LEARNING_RATE_value = sess.run(LEARNING_RATE)
            x_value = sess.run(x)
            print("After %s iteration(s): x%s is %f, learning rate is %f."
                  % (i + 1, i + 1, x_value, LEARNING_RATE_value))
```

Output:

```
After 1 iteration(s): x1 is 4.000000, learning rate is 0.096000.
After 11 iteration(s): x11 is 0.690561, learning rate is 0.063824.
After 21 iteration(s): x21 is 0.222583, learning rate is 0.042432.
After 31 iteration(s): x31 is 0.106405, learning rate is 0.028210.
After 41 iteration(s): x41 is 0.065548, learning rate is 0.018755.
After 51 iteration(s): x51 is 0.047625, learning rate is 0.012469.
After 61 iteration(s): x61 is 0.038558, learning rate is 0.008290.
After 71 iteration(s): x71 is 0.033523, learning rate is 0.005511.
After 81 iteration(s): x81 is 0.030553, learning rate is 0.003664.
After 91 iteration(s): x91 is 0.028727, learning rate is 0.002436.
```
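The schedule itself is simple enough to reproduce by hand. The following is a rough sketch in plain Python, not TensorFlow's implementation: it assumes the documented formula, learning rate = base rate * decay_rate ^ (global_step / decay_steps) with integer division when `staircase=True`, and it assumes each update uses the rate computed from the step counter before it is incremented, which is consistent with x1 = 4.0 in the output above. The function `descend_with_decay` is a hypothetical helper.

```python
def exponential_decay(base_rate, step, decay_steps, decay_rate, staircase=True):
    """Return base_rate * decay_rate ** (step / decay_steps)."""
    # staircase=True uses integer division, so the rate drops in discrete jumps.
    exponent = step // decay_steps if staircase else step / decay_steps
    return base_rate * decay_rate ** exponent


def descend_with_decay(x0, steps):
    """Minimize y = x**2 with the decaying rate used in the example above."""
    x = x0
    for step in range(steps):
        lr = exponential_decay(0.1, step, 1, 0.96)  # rate for this update
        x = x - lr * 2.0 * x                        # gradient of x**2 is 2x
    return x


if __name__ == "__main__":
    for n in (1, 11, 21):
        print("After %d iteration(s): x is %f, next learning rate is %f."
              % (n, descend_with_decay(5.0, n), exponential_decay(0.1, n, 1, 0.96)))
```

The early steps use a rate close to 0.1 and remove about 20% of x each time, while later steps take progressively smaller moves, which is why x levels off as the rate decays.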