Part I: Preparation
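(Omitted. The code below assumes the setup from that part: import numpy as np, import tensorflow as tf with the TensorFlow 1.x API, and the names device, print_every, train_dset, val_dset and check_accuracy.)
Part II: Barebone TensorFlow
First, implement a flatten function: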
def flatten(x):
    """
    Input:
    - TensorFlow Tensor of shape (N, D1, ..., DM)
    Output:
    - TensorFlow Tensor of shape (N, D1 * ... * DM)
    """
    N = tf.shape(x)[0]
    return tf.reshape(x, (N, -1))
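As a shape check that does not need TensorFlow, the reshape above can be mirrored in plain Python. This is only a sketch; flattened_shape is a hypothetical helper and not part of the assignment code.

```python
import math

def flattened_shape(shape):
    """Map an input shape (N, D1, ..., DM) to (N, D1 * ... * DM)."""
    n, *rest = shape              # keep the batch dimension, collapse the rest
    return (n, math.prod(rest))

# A minibatch of 64 images of size 32 x 32 x 3:
print(flattened_shape((64, 32, 32, 3)))  # (64, 3072)
```

The second dimension, 3072, is the D that the first weight matrix of the two-layer network has to match.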
Implement a two-layer fully-connected network and test it:
def two_layer_fc(x, params):
    """
    A fully-connected neural network; the architecture is:
    fully-connected layer -> ReLU -> fully connected layer.
    Note that we only need to define the forward pass here; TensorFlow will take
    care of computing the gradients for us.

    The input to the network will be a minibatch of data, of shape
    (N, d1, ..., dM) where d1 * ... * dM = D. The hidden layer will have H units,
    and the output layer will produce scores for C classes.

    Inputs:
    - x: A TensorFlow Tensor of shape (N, d1, ..., dM) giving a minibatch of
      input data.
    - params: A list [w1, w2] of TensorFlow Tensors giving weights for the
      network, where w1 has shape (D, H) and w2 has shape (H, C).

    Returns:
    - scores: A TensorFlow Tensor of shape (N, C) giving classification scores
      for the input data x.
    """
    w1, w2 = params                   # Unpack the parameters
    x = flatten(x)                    # Flatten the input; now x has shape (N, D)
    h = tf.nn.relu(tf.matmul(x, w1))  # Hidden layer: h has shape (N, H)
    scores = tf.matmul(h, w2)         # Compute scores of shape (N, C)
    return scores
def two_layer_fc_test():
    # TensorFlow's default computational graph is essentially a hidden global
    # variable. To avoid adding to this default graph when you rerun this cell,
    # we clear the default graph before constructing the graph we care about.
    tf.reset_default_graph()
    hidden_layer_size = 42

    # Scoping our computational graph setup code under a tf.device context
    # manager lets us tell TensorFlow where we want these Tensors to be
    # placed.
    with tf.device(device):
        # Set up a placeholder for the input of the network, and constant
        # zero Tensors for the network weights. Here we declare w1 and w2
        # using tf.zeros instead of tf.placeholder as we've seen before - this
        # means that the values of w1 and w2 will be stored in the computational
        # graph itself and will persist across multiple runs of the graph; in
        # particular this means that we don't have to pass values for w1 and w2
        # using a feed_dict when we eventually run the graph.
        # (Note: because w1 and w2 are created with tf.zeros, no data has to be
        # fed for them.)
        x = tf.placeholder(tf.float32)
        w1 = tf.zeros((32 * 32 * 3, hidden_layer_size))
        w2 = tf.zeros((hidden_layer_size, 10))

        # Call our two_layer_fc function to set up the computational
        # graph for the forward pass of the network.
        scores = two_layer_fc(x, [w1, w2])

    # Use numpy to create some concrete data that we will pass to the
    # computational graph for the x placeholder.
    x_np = np.zeros((64, 32, 32, 3))
    with tf.Session() as sess:
        # tf.zeros creates constant Tensors, so w1 and w2 need no explicit
        # initialization. The call below only has an effect once the graph
        # contains tf.Variable objects; it is kept here for consistency with
        # the training code later on.
        sess.run(tf.global_variables_initializer())

        # Here we actually run the graph, using the feed_dict to pass the
        # value to bind to the placeholder for x; we ask TensorFlow to compute
        # the value of the scores Tensor, which it returns as a numpy array.
        scores_np = sess.run(scores, feed_dict={x: x_np})
        print(scores_np.shape)

two_layer_fc_test()
Implement a three-layer convolutional network and test it.
The network architecture is:
- A convolutional layer (with bias) with channel_1 filters, each with shape KW1 x KH1, and zero-padding of two
- ReLU nonlinearity
- A convolutional layer (with bias) with channel_2 filters, each with shape KW2 x KH2, and zero-padding of one
- ReLU nonlinearity
- Fully-connected layer with bias, producing scores for C classes.
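The zero-padding values in the list above are what keep the spatial size at 32 x 32, and that in turn fixes the input dimension of the fully-connected layer. Here is a minimal sketch of the standard output-size formula, assuming stride 1 and the 5 x 5 and 3 x 3 kernels with 9 output channels used in the test further down (conv_output_size is a hypothetical helper):

```python
def conv_output_size(size, kernel, pad, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * pad - kernel) // stride + 1

h1 = conv_output_size(32, kernel=5, pad=2)   # after the first conv layer
h2 = conv_output_size(h1, kernel=3, pad=1)   # after the second conv layer
channel_2 = 9
fc_in = h2 * h2 * channel_2                  # rows of the fully-connected weight matrix
print(h1, h2, fc_in)  # 32 32 9216
```

With stride 1 and odd kernel sizes, padding='SAME' in tf.nn.conv2d gives the same output size as these explicit paddings.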
def three_layer_convnet(x, params):
    """
    A three-layer convolutional network with the architecture described above.

    Inputs:
    - x: A TensorFlow Tensor of shape (N, H, W, 3) giving a minibatch of images
    - params: A list of TensorFlow Tensors giving the weights and biases for the
      network; should contain the following:
      - conv_w1: TensorFlow Tensor of shape (KH1, KW1, 3, channel_1) giving
        weights for the first convolutional layer.
      - conv_b1: TensorFlow Tensor of shape (channel_1,) giving biases for the
        first convolutional layer.
      - conv_w2: TensorFlow Tensor of shape (KH2, KW2, channel_1, channel_2)
        giving weights for the second convolutional layer
      - conv_b2: TensorFlow Tensor of shape (channel_2,) giving biases for the
        second convolutional layer.
      - fc_w: TensorFlow Tensor giving weights for the fully-connected layer.
        Can you figure out what the shape should be? (H * W * channel_2, 10)
      - fc_b: TensorFlow Tensor giving biases for the fully-connected layer.
        Can you figure out what the shape should be? (10,)
    """
    conv_w1, conv_b1, conv_w2, conv_b2, fc_w, fc_b = params
    scores = None
    ############################################################################
    # TODO: Implement the forward pass for the three-layer ConvNet.            #
    ############################################################################
    # With stride 1, padding='SAME' keeps the spatial size H x W, which matches
    # the zero-padding specified in the architecture for odd kernel sizes.
    h1 = tf.nn.conv2d(input=x, filter=conv_w1, strides=[1, 1, 1, 1],
                      padding='SAME', name='conv1') + conv_b1
    h11 = tf.nn.relu(h1)
    h2 = tf.nn.conv2d(input=h11, filter=conv_w2, strides=[1, 1, 1, 1],
                      padding='SAME', name='conv2') + conv_b2
    h22 = tf.nn.relu(h2)
    h = flatten(h22)                    # shape (N, H * W * channel_2)
    scores = tf.matmul(h, fc_w) + fc_b  # shape (N, C)
    ############################################################################
    #                              END OF YOUR CODE                            #
    ############################################################################
    return scores
def three_layer_convnet_test():
    tf.reset_default_graph()

    with tf.device(device):
        x = tf.placeholder(tf.float32)
        conv_w1 = tf.zeros((5, 5, 3, 6))
        conv_b1 = tf.zeros((6,))
        conv_w2 = tf.zeros((3, 3, 6, 9))
        conv_b2 = tf.zeros((9,))
        fc_w = tf.zeros((32 * 32 * 9, 10))
        fc_b = tf.zeros((10,))
        params = [conv_w1, conv_b1, conv_w2, conv_b2, fc_w, fc_b]
        scores = three_layer_convnet(x, params)

    # Inputs to convolutional layers are 4-dimensional arrays with shape
    # [batch_size, height, width, channels]
    x_np = np.zeros((64, 32, 32, 3))

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        scores_np = sess.run(scores, feed_dict={x: x_np})
        print('scores_np has shape: ', scores_np.shape)

with tf.device('/gpu:0'):
    three_layer_convnet_test()
Implement the training step. A single step does the following:
- Compute the loss
- Compute the gradient of the loss with respect to all network weights
- Make a weight update step using (stochastic) gradient descent.
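The three steps above can be sketched without TensorFlow for a single training example. This is only an illustration: the scores themselves are treated as the parameters being updated, and softmax_cross_entropy and sgd_step are hypothetical helper names.

```python
import math

def softmax_cross_entropy(scores, label):
    """Return the loss and d(loss)/d(scores) for one example."""
    m = max(scores)                                  # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    loss = -math.log(probs[label])
    grad = [p - (1.0 if i == label else 0.0) for i, p in enumerate(probs)]
    return loss, grad

def sgd_step(params, grads, learning_rate):
    """Plain gradient descent: w <- w - learning_rate * dw."""
    return [w - learning_rate * g for w, g in zip(params, grads)]

scores = [1.0, 2.0, 0.5]
loss_before, grad = softmax_cross_entropy(scores, label=0)   # loss and gradient
scores = sgd_step(scores, grad, learning_rate=0.5)           # weight update
loss_after, _ = softmax_cross_entropy(scores, label=0)
print(loss_after < loss_before)  # True
```

In the TensorFlow version, tf.gradients plays the role of the hand-written gradient and tf.assign_sub plays the role of sgd_step.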
def training_step(scores, y, params, learning_rate):
    """
    Set up the part of the computational graph which makes a training step.

    Inputs:
    - scores: TensorFlow Tensor of shape (N, C) giving classification scores for
      the model.
    - y: TensorFlow Tensor of shape (N,) giving ground-truth labels for scores;
      y[i] == c means that c is the correct class for scores[i].
    - params: List of TensorFlow Tensors giving the weights of the model
    - learning_rate: Python scalar giving the learning rate to use for gradient
      descent step.

    Returns:
    - loss: A TensorFlow Tensor of shape () (scalar) giving the loss for this
      batch of data; evaluating the loss also performs a gradient descent step
      on params (see above).
    """
    # First compute the loss; the first line gives losses for each example in
    # the minibatch, and the second averages the losses across the batch
    losses = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=scores)
    loss = tf.reduce_mean(losses)  # compute the loss

    # Compute the gradient of the loss with respect to each parameter of the
    # network. This is a very magical function call: TensorFlow internally
    # traverses the computational graph starting at loss backward to each element
    # of params, and uses backpropagation to figure out how to compute gradients;
    # it then adds new operations to the computational graph which compute the
    # requested gradients, and returns a list of TensorFlow Tensors that will
    # contain the requested gradients when evaluated.
    grad_params = tf.gradients(loss, params)  # compute the gradients

    # Make a gradient descent step on all of the model parameters.
    new_weights = []
    for w, grad_w in zip(params, grad_params):  # update the parameters
        new_w = tf.assign_sub(w, learning_rate * grad_w)
        new_weights.append(new_w)

    # Insert a control dependency so that evaluating the loss causes a weight
    # update to happen; see the discussion above.
    # This ties the weight updates to the loss Tensor.
    with tf.control_dependencies(new_weights):
        return tf.identity(loss)
Implement the training loop:
def train_part2(model_fn, init_fn, learning_rate):
    """
    Train a model on CIFAR-10.

    Inputs:
    - model_fn: A Python function that performs the forward pass of the model
      using TensorFlow; it should have the following signature:
      scores = model_fn(x, params) where x is a TensorFlow Tensor giving a
      minibatch of image data, params is a list of TensorFlow Tensors holding
      the model weights, and scores is a TensorFlow Tensor of shape (N, C)
      giving scores for all elements of x. (This is the network we designed.)
    - init_fn: A Python function that initializes the parameters of the model.
      It should have the signature params = init_fn() where params is a list
      of TensorFlow Tensors holding the (randomly initialized) weights of the
      model. (This is the parameter-initialization function.)
    - learning_rate: Python float giving the learning rate to use for SGD.
    """
    # First clear the default graph
    tf.reset_default_graph()
    is_training = tf.placeholder(tf.bool, name='is_training')
    # Set up the computational graph for performing forward and backward passes,
    # and weight updates.
    with tf.device(device):
        # Set up placeholders for the data and labels
        x = tf.placeholder(tf.float32, [None, 32, 32, 3])
        y = tf.placeholder(tf.int32, [None])
        params = init_fn()            # Initialize the model parameters
        scores = model_fn(x, params)  # Forward pass of the model
        loss = training_step(scores, y, params, learning_rate)

    # Now we actually run the graph many times using the training data
    with tf.Session() as sess:
        # Initialize variables that will live in the graph
        sess.run(tf.global_variables_initializer())
        for t, (x_np, y_np) in enumerate(train_dset):
            # Run the graph on a batch of training data; recall that asking
            # TensorFlow to evaluate loss will cause an SGD step to happen.
            feed_dict = {x: x_np, y: y_np}
            loss_np = sess.run(loss, feed_dict=feed_dict)

            # Periodically print the loss and check accuracy on the val set
            if t % print_every == 0:
                print('Iteration %d, loss = %.4f' % (t, loss_np))
                check_accuracy(sess, val_dset, x, scores, is_training)
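Kaiming normal initialization:
def kaiming_normal(shape):
    if len(shape) == 2:
        # Fully-connected weights of shape (fan_in, fan_out)
        fan_in, fan_out = shape[0], shape[1]
    elif len(shape) == 4:
        # Convolution weights of shape (KH, KW, in_channels, out_channels)
        fan_in, fan_out = np.prod(shape[:3]), shape[3]
    return tf.random_normal(shape) * np.sqrt(2.0 / fan_in)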
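The scaling factor in kaiming_normal depends only on fan_in. Here is a minimal sketch of that calculation in plain Python. kaiming_std is a hypothetical helper that mirrors the shape handling above and, as an added assumption, raises an error for any other rank.

```python
import math

def kaiming_std(shape):
    """Standard deviation sqrt(2 / fan_in) used to scale the random weights."""
    if len(shape) == 2:                       # (fan_in, fan_out)
        fan_in = shape[0]
    elif len(shape) == 4:                     # (KH, KW, in_channels, out_channels)
        fan_in = shape[0] * shape[1] * shape[2]
    else:
        raise ValueError('expected a shape of length 2 or 4')
    return math.sqrt(2.0 / fan_in)

for shape in [(3 * 32 * 32, 4000), (5, 5, 3, 6), (3, 3, 6, 9)]:
    print(shape, round(kaiming_std(shape), 4))
```

Layers with a larger fan_in get a smaller standard deviation, which keeps the scale of the activations roughly constant from layer to layer when ReLU is used.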
Train our two-layer network:
def two_layer_fc_init():
    """
    Initialize the weights of a two-layer network, for use with the
    two_layer_fc function defined above.

    Inputs: None

    Returns: A list of:
    - w1: TensorFlow Variable giving the weights for the first layer
    - w2: TensorFlow Variable giving the weights for the second layer
    """
    hidden_layer_size = 4000
    w1 = tf.Variable(kaiming_normal((3 * 32 * 32, hidden_layer_size)))
    w2 = tf.Variable(kaiming_normal((hidden_layer_size, 10)))
    return [w1, w2]

learning_rate = 1e-2
train_part2(two_layer_fc, two_layer_fc_init, learning_rate)
Iteration 0, loss = 2.8053
Got 134 / 1000 correct (13.40%)
Iteration 100, loss = 1.9526
Got 383 / 1000 correct (38.30%)
Iteration 200, loss = 1.4617
Got 393 / 1000 correct (39.30%)
Iteration 300, loss = 1.7108
Got 372 / 1000 correct (37.20%)
Iteration 400, loss = 1.8420
Got 421 / 1000 correct (42.10%)
Iteration 500, loss = 1.8536
Got 429 / 1000 correct (42.90%)
Iteration 600, loss = 1.8949
Got 413 / 1000 correct (41.30%)
Iteration 700, loss = 1.9321
Got 424 / 1000 correct (42.40%)
Train our three-layer network:
def three_layer_convnet_init():
    """
    Initialize the weights of a Three-Layer ConvNet, for use with the
    three_layer_convnet function defined above.

    Inputs: None

    Returns a list containing:
    - conv_w1: TensorFlow Variable giving weights for the first conv layer
    - conv_b1: TensorFlow Variable giving biases for the first conv layer
    - conv_w2: TensorFlow Variable giving weights for the second conv layer
    - conv_b2: TensorFlow Variable giving biases for the second conv layer
    - fc_w: TensorFlow Variable giving weights for the fully-connected layer
    - fc_b: TensorFlow Variable giving biases for the fully-connected layer
    """
    params = None
    ############################################################################
    # TODO: Initialize the parameters of the three-layer network.              #
    ############################################################################
    # Note: the biases below are also drawn from kaiming_normal, with shape
    # (1, C) so that they broadcast against the layer outputs. Zero-initialized
    # biases of shape (C,), e.g. tf.zeros((6,)), are the more common choice.
    w1 = tf.Variable(kaiming_normal((5, 5, 3, 6)))
    b1 = tf.Variable(kaiming_normal((1, 6)))
    w2 = tf.Variable(kaiming_normal((3, 3, 6, 9)))
    b2 = tf.Variable(kaiming_normal((1, 9)))
    w = tf.Variable(kaiming_normal((32 * 32 * 9, 10)))
    b = tf.Variable(kaiming_normal((1, 10)))
    params = [w1, b1, w2, b2, w, b]
    ############################################################################
    #                             END OF YOUR CODE                             #
    ############################################################################
    return params

learning_rate = 3e-3
train_part2(three_layer_convnet, three_layer_convnet_init, learning_rate)
Iteration 0, loss = 3.4851
Got 96 / 1000 correct (9.60%)
Iteration 100, loss = 1.8512
Got 323 / 1000 correct (32.30%)
Iteration 200, loss = 1.6490
Got 372 / 1000 correct (37.20%)
Iteration 300, loss = 1.8010
Got 360 / 1000 correct (36.00%)
Iteration 400, loss = 1.8237
Got 394 / 1000 correct (39.40%)
Iteration 500, loss = 1.8371
Got 412 / 1000 correct (41.20%)
Iteration 600, loss = 1.7767
Got 428 / 1000 correct (42.80%)
Iteration 700, loss = 1.6171
Got 430 / 1000 correct (43.00%)
Part III: Keras Model API
Build a two-layer fully-connected network with the Keras Model API (model subclassing):
class TwoLayerFC(tf.keras.Model):  # the model is defined as a class
    def __init__(self, hidden_size, num_classes):  # define the layers of the network
        super().__init__()
        initializer = tf.variance_scaling_initializer(scale=2.0)
        # Fully-connected layer with ReLU activation and the initializer above.
        # Note that tf.layers.Dense is a class.
        self.fc1 = tf.layers.Dense(hidden_size, activation=tf.nn.relu,
                                   kernel_initializer=initializer)
        self.fc2 = tf.layers.Dense(num_classes,
                                   kernel_initializer=initializer)

    def call(self, x, training=None):  # forward pass, run when the model is called
        x = tf.layers.flatten(x)  # flatten x to shape (N, D)
        x = self.fc1(x)
        x = self.fc2(x)
        return x
def test_TwoLayerFC():
    """ A small unit test to exercise the TwoLayerFC model above. """
    tf.reset_default_graph()
    input_size, hidden_size, num_classes = 50, 42, 10

    # As usual in TensorFlow, we first need to define our computational graph.
    # To this end we first construct a TwoLayerFC object, then use it to construct
    # the scores Tensor.
    model = TwoLayerFC(hidden_size, num_classes)
    with tf.device(device):
        x = tf.zeros((64, input_size))
        scores = model(x)

    # Now that our computational graph has been defined we can run the graph
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        scores_np = sess.run(scores)
        print(scores_np.shape)

test_TwoLayerFC()
Build a two-layer fully-connected network with the Functional API:
def two_layer_fc_functional(inputs, hidden_size, num_classes):  # the model is defined as a function
    initializer = tf.variance_scaling_initializer(scale=2.0)
    flattened_inputs = tf.layers.flatten(inputs)
    # Note that tf.layers.dense (lowercase) is a function, not a class.
    fc1_output = tf.layers.dense(flattened_inputs, hidden_size, activation=tf.nn.relu,
                                 kernel_initializer=initializer)
    scores = tf.layers.dense(fc1_output, num_classes,
                             kernel_initializer=initializer)
    return scores
def test_two_layer_fc_functional():
    """ A small unit test to exercise the two_layer_fc_functional model above. """
    tf.reset_default_graph()
    input_size, hidden_size, num_classes = 50, 42, 10

    # As usual in TensorFlow, we first need to define our computational graph.
    # To this end we first construct a two layer network graph by calling the
    # two_layer_fc_functional() function. This function constructs the
    # computation graph and outputs the score tensor.
    with tf.device(device):
        x = tf.zeros((64, input_size))
        scores = two_layer_fc_functional(x, hidden_size, num_classes)

    # Now that our computational graph has been defined we can run the graph
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        scores_np = sess.run(scores)
        print(scores_np.shape)

test_two_layer_fc_functional()
Build a three-layer convolutional network with the Keras Model API:
- Convolutional layer with 5 x 5 kernels, with zero-padding of 2
- ReLU nonlinearity
- Convolutional layer with 3 x 3 kernels, with zero-padding of 1
- ReLU nonlinearity
- Fully-connected layer to give class scores
class ThreeLayerConvNet(tf.keras.Model):
    def __init__(self, channel_1, channel_2, num_classes):
        super().__init__()
        ########################################################################
        # TODO: Implement the __init__ method for a three-layer ConvNet. You   #
        # should instantiate layer objects to be used in the forward pass.     #
        ########################################################################
        initializer = tf.variance_scaling_initializer(scale=2.0)
        # 5x5 convolution; SAME padding with stride 1 equals zero-padding of 2.
        self.conv1 = tf.layers.Conv2D(filters=channel_1, kernel_size=[5, 5],
                                      strides=[1, 1], padding='SAME',
                                      activation=tf.nn.relu, use_bias=True,
                                      kernel_initializer=initializer,
                                      bias_initializer=initializer, name='conv1')
        # 3x3 convolution; SAME padding with stride 1 equals zero-padding of 1.
        self.conv2 = tf.layers.Conv2D(filters=channel_2, kernel_size=[3, 3],
                                      strides=[1, 1], padding='SAME',
                                      activation=tf.nn.relu, use_bias=True,
                                      kernel_initializer=initializer,
                                      bias_initializer=initializer, name='conv2')
        self.fc = tf.layers.Dense(units=num_classes, use_bias=True,
                                  kernel_initializer=initializer,
                                  bias_initializer=initializer, name='fc')
        ########################################################################
        #                           END OF YOUR CODE                           #
        ########################################################################

    def call(self, x, training=None):
        scores = None
        ########################################################################
        # TODO: Implement the forward pass for a three-layer ConvNet. You      #
        # should use the layer objects defined in the __init__ method.         #
        ########################################################################
        x = self.conv1(x)         # conv + ReLU
        x = self.conv2(x)         # conv + ReLU
        x = tf.layers.flatten(x)
        scores = self.fc(x)
        ########################################################################
        #                           END OF YOUR CODE                           #
        ########################################################################
        return scores
Keras Model API: Training Loop
def train_part34(model_init_fn, optimizer_init_fn, num_epochs=1):
    """
    Simple training loop for use with models defined using tf.keras. It trains
    a model for num_epochs epochs on the CIFAR-10 training set and periodically
    checks accuracy on the CIFAR-10 validation set.

    Inputs:
    - model_init_fn: A function that takes the input Tensor and the is_training
      placeholder; when called it builds the forward pass of the model we want
      to train and returns the scores: scores = model_init_fn(x, is_training)
    - optimizer_init_fn: A function which takes no parameters; when called it
      constructs the Optimizer object we will use to optimize the model:
      optimizer = optimizer_init_fn()
    - num_epochs: The number of epochs to train for

    Returns: Nothing, but prints progress during training
    """
    tf.reset_default_graph()
    with tf.device(device):
        # Construct the computational graph we will use to train the model. We
        # use the model_init_fn to construct the model, declare placeholders for
        # the data and labels
        x = tf.placeholder(tf.float32, [None, 32, 32, 3])
        y = tf.placeholder(tf.int32, [None])

        # We need a placeholder to explicitly specify if the model is in the training
        # phase or not. This is because a number of layers behave differently in
        # training and in testing, e.g., dropout and batch normalization.
        # We pass this variable to the computation graph through feed_dict as shown below.
        is_training = tf.placeholder(tf.bool, name='is_training')

        # Use the model function to build the forward pass.
        scores = model_init_fn(x, is_training)

        # Compute the loss like we did in Part II
        loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=scores)
        loss = tf.reduce_mean(loss)

        # Use the optimizer_fn to construct an Optimizer, then use the optimizer
        # to set up the training step. Asking TensorFlow to evaluate the
        # train_op returned by optimizer.minimize(loss) will cause us to make a
        # single update step using the current minibatch of data.

        # Note that we use tf.control_dependencies to force the model to run
        # the tf.GraphKeys.UPDATE_OPS at each training step. tf.GraphKeys.UPDATE_OPS
        # holds the operators that update the states of the network.
        # For example, the tf.layers.batch_normalization function adds the running mean
        # and variance update operators to tf.GraphKeys.UPDATE_OPS.
        optimizer = optimizer_init_fn()
        update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
        with tf.control_dependencies(update_ops):
            train_op = optimizer.minimize(loss)

    # Now we can run the computational graph many times to train the model.
    # When we call sess.run we ask it to evaluate train_op, which causes the
    # model to update.
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        t = 0
        for epoch in range(num_epochs):
            print('Starting epoch %d' % epoch)
            for x_np, y_np in train_dset:
                feed_dict = {x: x_np, y: y_np, is_training: True}
                loss_np, _ = sess.run([loss, train_op], feed_dict=feed_dict)
                if t % print_every == 0:
                    print('Iteration %d, loss = %.4f' % (t, loss_np))
                    check_accuracy(sess, val_dset, x, scores, is_training=is_training)
                    print()
                t += 1
Keras Model API: Train a Two-Layer Network
hidden_size, num_classes = 4000, 10
learning_rate = 1e-2
def model_init_fn(inputs, is_training):
    return TwoLayerFC(hidden_size, num_classes)(inputs)

def optimizer_init_fn():
    return tf.train.GradientDescentOptimizer(learning_rate)

train_part34(model_init_fn, optimizer_init_fn)
Starting epoch 0
Iteration 0, loss = 2.9554
Got 147 / 1000 correct (14.70%)
Iteration 100, loss = 1.8660
Got 374 / 1000 correct (37.40%)
Iteration 200, loss = 1.5924
Got 391 / 1000 correct (39.10%)
Iteration 300, loss = 1.8491
Got 390 / 1000 correct (39.00%)
Iteration 400, loss = 1.7189
Got 430 / 1000 correct (43.00%)
Iteration 500, loss = 1.7548
Got 432 / 1000 correct (43.20%)
Iteration 600, loss = 1.8440
Got 418 / 1000 correct (41.80%)
Iteration 700, loss = 1.9507
Got 451 / 1000 correct (45.10%)
Keras Model API: Train a Two-Layer Network (functional API)
hidden_size, num_classes = 4000, 10
learning_rate = 1e-2
def model_init_fn(inputs, is_training):
    return two_layer_fc_functional(inputs, hidden_size, num_classes)

def optimizer_init_fn():
    return tf.train.GradientDescentOptimizer(learning_rate)

train_part34(model_init_fn, optimizer_init_fn)
Starting epoch 0
Iteration 0, loss = 3.2064
Got 113 / 1000 correct (11.30%)
Iteration 100, loss = 1.8935
Got 374 / 1000 correct (37.40%)
Iteration 200, loss = 1.5011
Got 384 / 1000 correct (38.40%)
Iteration 300, loss = 1.9119
Got 359 / 1000 correct (35.90%)
Iteration 400, loss = 1.8919
Got 416 / 1000 correct (41.60%)
Iteration 500, loss = 1.7257
Got 430 / 1000 correct (43.00%)
Iteration 600, loss = 1.9092
Got 414 / 1000 correct (41.40%)
Iteration 700, loss = 2.0570
Got 449 / 1000 correct (44.90%)
Keras Model API: Train a Three-Layer ConvNet
learning_rate = 3e-3
channel_1, channel_2, num_classes = 32, 16, 10
def model_init_fn(inputs, is_training):
    model = None
    ############################################################################
    # TODO: Complete the implementation of model_fn.                           #
    ############################################################################
    model = ThreeLayerConvNet(channel_1, channel_2, num_classes)
    ############################################################################
    #                           END OF YOUR CODE                               #
    ############################################################################
    return model(inputs)

def optimizer_init_fn():
    optimizer = None
    ############################################################################
    # TODO: Complete the implementation of model_fn.                           #
    ############################################################################
    optimizer = tf.train.MomentumOptimizer(learning_rate=learning_rate,
                                           momentum=0.9, use_nesterov=True)
    ############################################################################
    #                           END OF YOUR CODE                               #
    ############################################################################
    return optimizer

train_part34(model_init_fn, optimizer_init_fn)
Starting epoch 0
Iteration 0, loss = 3.5594
Got 81 / 1000 correct (8.10%)
Iteration 100, loss = 1.6427
Got 394 / 1000 correct (39.40%)
Iteration 200, loss = 1.4471
Got 453 / 1000 correct (45.30%)
Iteration 300, loss = 1.4377
Got 472 / 1000 correct (47.20%)
Iteration 400, loss = 1.4059
Got 489 / 1000 correct (48.90%)
Iteration 500, loss = 1.5382
Got 535 / 1000 correct (53.50%)
Iteration 600, loss = 1.3765
Got 525 / 1000 correct (52.50%)
Iteration 700, loss = 1.4015
Got 518 / 1000 correct (51.80%)
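The ConvNet above is trained with tf.train.MomentumOptimizer and use_nesterov=True. As a rough, framework-free illustration of what a Nesterov momentum step does, the sketch below applies the update to a toy one-dimensional objective f(w) = w^2. The update rule follows the commonly documented form of TensorFlow's momentum update and should be read as an assumption rather than a statement about the library internals. The objective, the function names, the learning rate of 0.1 and the step count are all hypothetical.

```python
def nesterov_step(w, accum, grad, learning_rate, momentum):
    """One Nesterov momentum update for a scalar parameter w."""
    accum = momentum * accum + grad                     # velocity accumulator
    w = w - learning_rate * (grad + momentum * accum)   # look-ahead update
    return w, accum

def grad_f(w):
    return 2.0 * w   # gradient of the toy objective f(w) = w**2

w, accum = 5.0, 0.0
for _ in range(200):
    w, accum = nesterov_step(w, accum, grad_f(w), learning_rate=0.1, momentum=0.9)
print(abs(w) < 1e-6)  # True: w has converged to the minimum at 0
```

Compared with plain SGD, the accumulator lets earlier gradients keep contributing to the step. This fits the logs above, where the momentum-trained ConvNet reaches a higher validation accuracy within the same 700 iterations than the SGD-trained two-layer networks, although the architectures differ as well.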
Part IV: Keras Sequential API
Keras Sequential API: Two-Layer Network
learning_rate = 1e-2
def model_init_fn(inputs, is_training):
    input_shape = (32, 32, 3)
    hidden_layer_size, num_classes = 4000, 10
    initializer = tf.variance_scaling_initializer(scale=2.0)
    layers = [  # the first layer must be given input_shape
        tf.layers.Flatten(input_shape=input_shape),
        tf.layers.Dense(hidden_layer_size, activation=tf.nn.relu,
                        kernel_initializer=initializer),
        tf.layers.Dense(num_classes, kernel_initializer=initializer),
    ]
    model = tf.keras.Sequential(layers)
    return model(inputs)

def optimizer_init_fn():
    return tf.train.GradientDescentOptimizer(learning_rate)

train_part34(model_init_fn, optimizer_init_fn)
Starting epoch 0
Iteration 0, loss = 3.0599
Got 138 / 1000 correct (13.80%)
Iteration 100, loss = 1.9839
Got 363 / 1000 correct (36.30%)
Iteration 200, loss = 1.4431
Got 389 / 1000 correct (38.90%)
Iteration 300, loss = 1.8575
Got 375 / 1000 correct (37.50%)
Iteration 400, loss = 1.7719
Got 413 / 1000 correct (41.30%)
Iteration 500, loss = 1.7979
Got 438 / 1000 correct (43.80%)
Iteration 600, loss = 1.8587
Got 418 / 1000 correct (41.80%)
Iteration 700, loss = 1.9053
Got 442 / 1000 correct (44.20%)
Keras Sequential API: Three-Layer ConvNet
- Convolutional layer with 16 5x5 kernels, using zero padding of 2
- ReLU nonlinearity
- Convolutional layer with 32 3x3 kernels, using zero padding of 1
- ReLU nonlinearity
- Fully-connected layer giving class scores
def model_init_fn(inputs, is_training):
    model = None
    ############################################################################
    # TODO: Construct a three-layer ConvNet using tf.keras.Sequential.         #
    ############################################################################
    initializer = tf.variance_scaling_initializer(scale=2.0)
    layers = [
        # 16 filters of size 5x5; SAME padding equals zero-padding of 2.
        tf.layers.Conv2D(input_shape=(32, 32, 3), filters=16, kernel_size=[5, 5],
                         strides=[1, 1], padding='SAME', activation=tf.nn.relu,
                         use_bias=True, kernel_initializer=initializer,
                         bias_initializer=initializer, name='conv1'),
        # 32 filters of size 3x3 (as specified above); SAME padding equals
        # zero-padding of 1.
        tf.layers.Conv2D(filters=32, kernel_size=[3, 3],
                         strides=[1, 1], padding='SAME', activation=tf.nn.relu,
                         use_bias=True, kernel_initializer=initializer,
                         bias_initializer=initializer, name='conv2'),
        tf.layers.Flatten(),
        tf.layers.Dense(units=10, use_bias=True,
                        kernel_initializer=initializer,
                        bias_initializer=initializer, name='fc'),
    ]
    model = tf.keras.Sequential(layers)
    ############################################################################
    #                            END OF YOUR CODE                              #
    ############################################################################
    return model(inputs)

learning_rate = 5e-4
def optimizer_init_fn():
    optimizer = None
    ############################################################################
    # TODO: Complete the implementation of model_fn.                           #
    ############################################################################
    optimizer = tf.train.MomentumOptimizer(learning_rate=learning_rate,
                                           momentum=0.9, use_nesterov=True)
    ############################################################################
    #                           END OF YOUR CODE                               #
    ############################################################################
    return optimizer

train_part34(model_init_fn, optimizer_init_fn)
Starting epoch 0
Iteration 0, loss = 2.5582
Got 103 / 1000 correct (10.30%)
Iteration 100, loss = 1.5996
Got 403 / 1000 correct (40.30%)
Iteration 200, loss = 1.4355
Got 461 / 1000 correct (46.10%)
Iteration 300, loss = 1.5550
Got 493 / 1000 correct (49.30%)
Iteration 400, loss = 1.4755
Got 484 / 1000 correct (48.40%)
Iteration 500, loss = 1.5330
Got 505 / 1000 correct (50.50%)
Iteration 600, loss = 1.5811
Got 523 / 1000 correct (52.30%)
Iteration 700, loss = 1.3541
Got 529 / 1000 correct (52.90%)
Part V: CIFAR-10 open-ended challenge
def model_init_fn(inputs, is_training):
    model = None
    ############################################################################
    # TODO: Construct a model that performs well on CIFAR-10                   #
    ############################################################################
    initializer = tf.variance_scaling_initializer(scale=2.0)

    def conv(filters, name, **kwargs):
        # 3x3 convolution, stride 1, SAME padding, ReLU; used for every conv layer.
        return tf.layers.Conv2D(filters=filters, kernel_size=[3, 3],
                                strides=[1, 1], padding='SAME',
                                activation=tf.nn.relu, use_bias=True,
                                kernel_initializer=initializer,
                                bias_initializer=initializer,
                                name=name, **kwargs)

    def pool(name):
        # 2x2 max pooling with stride 2 halves the spatial size.
        return tf.layers.MaxPooling2D(pool_size=[2, 2], strides=[2, 2], name=name)

    def dense(units, name):
        # No activation is passed, so each fully-connected layer is linear.
        return tf.layers.Dense(units=units, use_bias=True,
                               kernel_initializer=initializer,
                               bias_initializer=initializer, name=name)

    layers = [
        conv(64, 'conv1', input_shape=(32, 32, 3)),
        conv(64, 'conv2'),
        conv(128, 'conv3'),
        pool('pool1'),
        conv(128, 'conv4'),
        conv(256, 'conv5'),
        conv(256, 'conv6'),
        pool('pool2'),
        conv(256, 'conv7'),
        conv(256, 'conv8'),
        conv(256, 'conv9'),
        pool('pool3'),
        conv(256, 'conv10'),
        conv(256, 'conv11'),
        conv(256, 'conv12'),
        conv(256, 'conv13'),
        tf.layers.Flatten(),
        dense(1024, 'fc1'),
        dense(1024, 'fc2'),
        dense(10, 'fc3'),
    ]
    model = tf.keras.Sequential(layers)
    ############################################################################
    #                            END OF YOUR CODE                              #
    ############################################################################
    return model(inputs)

def optimizer_init_fn():
    optimizer = None
    ############################################################################
    # TODO: Construct an optimizer that performs well on CIFAR-10              #
    ############################################################################
    optimizer = tf.train.AdamOptimizer()  # default Adam hyperparameters
    ############################################################################
    #                            END OF YOUR CODE                              #
    ############################################################################
    return optimizer

device = '/gpu:0'
print_every = 700
num_epochs = 10
train_part34(model_init_fn, optimizer_init_fn, num_epochs)
Starting epoch 0
Iteration 0, loss = 3.8694
Got 79 / 1000 correct (7.90%)
Iteration 700, loss = 1.6052
Got 484 / 1000 correct (48.40%)
Starting epoch 1
Iteration 1400, loss = 1.0688
Got 616 / 1000 correct (61.60%)
Starting epoch 2
Iteration 2100, loss = 0.9978
Got 643 / 1000 correct (64.30%)
Starting epoch 3
Iteration 2800, loss = 0.8107
Got 678 / 1000 correct (67.80%)
Starting epoch 4
Iteration 3500, loss = 0.6718
Got 717 / 1000 correct (71.70%)
Starting epoch 5
Iteration 4200, loss = 0.3733
Got 750 / 1000 correct (75.00%)
Starting epoch 6
Iteration 4900, loss = 0.8152
Got 697 / 1000 correct (69.70%)
Starting epoch 7
Iteration 5600, loss = 0.3667
Got 704 / 1000 correct (70.40%)
Starting epoch 8
Iteration 6300, loss = 0.4429
Got 753 / 1000 correct (75.30%)
Starting epoch 9
Iteration 7000, loss = 0.4751
Got 761 / 1000 correct (76.10%)
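The log above interleaves epoch headers with a global iteration counter. Here is a small sketch of how the two relate, assuming 49,000 training images and a batch size of 64. Both numbers are assumptions: neither is stated in this post, although the tests above do use minibatches of 64.

```python
import math

num_train = 49000     # assumed training-set size
batch_size = 64       # assumed minibatch size
print_every = 700     # from the settings above
num_epochs = 10       # from the settings above

iters_per_epoch = math.ceil(num_train / batch_size)

def epoch_of(iteration):
    """Epoch index (starting at 0) that a global iteration counter falls in."""
    return iteration // iters_per_epoch

logged = range(0, num_epochs * iters_per_epoch, print_every)
for t in logged:
    print('Iteration %d is in epoch %d' % (t, epoch_of(t)))
```

Under these assumptions each epoch has 766 iterations, so iterations 0 and 700 both fall in epoch 0 and every later logged iteration falls in a new epoch, which is the pattern seen in the log.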
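This is a 16-layer model: 13 convolutional layers and 3 fully-connected layers, with 3 max-pooling layers in between, trained with Adam.
After 10 epochs the validation accuracy reaches 76.10%, which still leaves a lot of room for improvement.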
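To make the layer count concrete, the sketch below walks through the layer list of model_init_fn and tallies the weight layers, the flattened feature size and the parameter count. It assumes 32 x 32 x 3 inputs, SAME-padded 3 x 3 convolutions and 2 x 2 pooling with stride 2, as in the code above. The tuple encoding of the layers is a hypothetical convenience.

```python
# ('conv', out_channels), ('pool',) or ('fc', units), in the order used above.
layers = ([('conv', 64), ('conv', 64), ('conv', 128), ('pool',),
           ('conv', 128), ('conv', 256), ('conv', 256), ('pool',)]
          + [('conv', 256)] * 3 + [('pool',)]
          + [('conv', 256)] * 4
          + [('fc', 1024), ('fc', 1024), ('fc', 10)])

size, channels = 32, 3   # spatial size and channel count of the input
flat_size = None         # number of features produced by Flatten
fan_in = None            # input width of the next fully-connected layer
weight_layers = 0
num_params = 0
for layer in layers:
    if layer[0] == 'conv':
        out = layer[1]
        num_params += 3 * 3 * channels * out + out   # kernel weights + biases
        channels = out
        weight_layers += 1
    elif layer[0] == 'pool':
        size //= 2                                   # 2x2 pooling, stride 2
    else:
        if fan_in is None:
            flat_size = fan_in = size * size * channels
        units = layer[1]
        num_params += fan_in * units + units         # weights + biases
        fan_in = units
        weight_layers += 1

print(weight_layers, flat_size, num_params)
```

The three pooling layers reduce the 32 x 32 input to 4 x 4, so the first fully-connected layer sees 4 * 4 * 256 = 4096 features.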