The following uses TensorFlow to implement a multilayer perceptron (MLP) and a simple convolutional neural network (CNN), both applied to the MNIST dataset.
All of the code and the dataset file used can be downloaded from the author's GitHub; the Jupyter Notebook provided there contains the code with detailed
comments (what each function used in the code does and what its parameters mean).
import tensorflow as tf
from tensorflow import keras

print(tf.__version__)  # 2.0.0
The TensorFlow version used here is 2.0.0.
First, obtain the dataset:
from tensorflow.keras.datasets import mnist

(train_data, train_label), (test_data, test_label) = mnist.load_data('./mnist.npz')
Note that downloading the dataset may fail with an HTTP connection timeout, in which case a VPN may be needed. Alternatively, download the mnist.npz file
yourself and place it in the C:\Users\Administrator\.keras\datasets folder (the ~/.keras/datasets cache directory).
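If the download keeps failing, the file can also be loaded directly with NumPy instead of going through mnist.load_data. A minimal sketch, assuming mnist.npz has been downloaded to the working directory (the official file stores its arrays under the keys x_train, y_train, x_test and y_test):

import numpy as np

# Load the manually downloaded file and unpack the four arrays
with np.load('./mnist.npz') as f:
    train_data, train_label = f['x_train'], f['y_train']
    test_data, test_label = f['x_test'], f['y_test']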
Implementing the multilayer perceptron:
# Define the model
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])
Model structure:
print(model.summary())
"""
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
flatten (Flatten)            (None, 784)               0
_________________________________________________________________
dense (Dense)                (None, 256)               200960
_________________________________________________________________
dense_1 (Dense)              (None, 10)                2570
=================================================================
Total params: 203,530
Trainable params: 203,530
Non-trainable params: 0
_________________________________________________________________
None
"""
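The parameter counts in the summary can be checked by hand: a Dense layer with n inputs and m units has (n + 1) * m parameters, the +1 being the bias.

# flatten: 28 * 28 = 784 values, no parameters
# dense:   (784 + 1) * 256 = 200960
# dense_1: (256 + 1) * 10  = 2570
# total:   200960 + 2570   = 203530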
Normalize the data, set the model's hyperparameters, and train:
# Normalize the input data to [0, 1]
train_data = train_data / 255.0
test_data = test_data / 255.0

model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.5),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_data, train_label, epochs=5, batch_size=256,
          validation_data=(test_data, test_label), validation_freq=1)
Training results:
Train on 60000 samples, validate on 10000 samples
Epoch 1/5
60000/60000 [==============================] - 16s 259us/sample - loss: 0.3641 - accuracy: 0.8926 - val_loss: 0.2121 - val_accuracy: 0.9351
Epoch 2/5
60000/60000 [==============================] - 4s 63us/sample - loss: 0.1652 - accuracy: 0.9523 - val_loss: 0.1375 - val_accuracy: 0.9580
Epoch 3/5
60000/60000 [==============================] - 4s 63us/sample - loss: 0.1199 - accuracy: 0.9658 - val_loss: 0.1091 - val_accuracy: 0.9674
Epoch 4/5
60000/60000 [==============================] - 5s 85us/sample - loss: 0.0952 - accuracy: 0.9726 - val_loss: 0.1082 - val_accuracy: 0.9658
Epoch 5/5
60000/60000 [==============================] - 4s 70us/sample - loss: 0.0788 - accuracy: 0.9775 - val_loss: 0.0947 - val_accuracy: 0.9702
<tensorflow.python.keras.callbacks.History at 0x23036b99320>
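Once trained, the model can be scored on the test set or used to classify individual images. A minimal sketch using the standard evaluate/predict methods (variable names follow the code above):

import numpy as np

# Average loss and accuracy over the whole test set
test_loss, test_acc = model.evaluate(test_data, test_label)

# Predicted class of the first test image: argmax over the softmax output
pred = np.argmax(model.predict(test_data[:1]), axis=-1)
print(pred, test_label[0])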
In addition, the author experimented with adding more fully connected layers to the MLP above and with changing their sizes, and observed how these changes
affect the training results. Since the purpose of this article is to provide an implementation example of a multilayer perceptron, those experiments are not
expanded on here; the code and results can be found on the author's GitHub.
A simple CNN implementation:
model5 = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(filters=6, kernel_size=5, activation='relu',
                           input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPool2D(pool_size=2, strides=2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])
Model structure:
Model: "sequential_7"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_4 (Conv2D)            (None, 24, 24, 6)         156
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 12, 12, 6)         0
_________________________________________________________________
flatten_4 (Flatten)          (None, 864)               0
_________________________________________________________________
dense_16 (Dense)             (None, 256)               221440
_________________________________________________________________
dense_17 (Dense)             (None, 10)                2570
=================================================================
Total params: 224,166
Trainable params: 224,166
Non-trainable params: 0
_________________________________________________________________
None
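The shapes and parameter counts follow directly from the layer settings: a 5x5 valid convolution shrinks each spatial side by 4, and 2x2 pooling with stride 2 halves it.

# conv2d_4: output 28 - 5 + 1 = 24, params = (5*5*1 + 1) * 6 = 156
# max_pooling2d_4: 24 / 2 = 12
# flatten_4: 12 * 12 * 6 = 864
# dense_16: (864 + 1) * 256 = 221440
# dense_17: (256 + 1) * 10  = 2570
# total: 156 + 221440 + 2570 = 224166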
Before training the model, the data needs to be reshaped:
# (N, 28, 28) -> (N, 28, 28, 1): Conv2D expects an explicit channel dimension
train_data = tf.reshape(train_data, (-1, 28, 28, 1))
test_data = tf.reshape(test_data, (-1, 28, 28, 1))
Set the hyperparameters and train the model:
model5.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
               loss='sparse_categorical_crossentropy',
               metrics=['accuracy'])
model5.fit(train_data, train_label, epochs=5, validation_split=0.1)
Training results:
Train on 54000 samples, validate on 6000 samples
Epoch 1/5
54000/54000 [==============================] - 34s 622us/sample - loss: 0.2047 - accuracy: 0.9400 - val_loss: 0.0763 - val_accuracy: 0.9797
Epoch 2/5
54000/54000 [==============================] - 32s 594us/sample - loss: 0.0688 - accuracy: 0.9792 - val_loss: 0.0605 - val_accuracy: 0.9833
Epoch 3/5
54000/54000 [==============================] - 32s 600us/sample - loss: 0.0479 - accuracy: 0.9846 - val_loss: 0.0476 - val_accuracy: 0.9870
Epoch 4/5
54000/54000 [==============================] - 32s 593us/sample - loss: 0.0338 - accuracy: 0.9892 - val_loss: 0.0566 - val_accuracy: 0.9855
Epoch 5/5
54000/54000 [==============================] - 35s 649us/sample - loss: 0.0258 - accuracy: 0.9916 - val_loss: 0.0522 - val_accuracy: 0.9858
<tensorflow.python.keras.callbacks.History at 0x230380d9518>
Before training model5, the author trained a model with the same structure (model4), but using the SGD optimizer with a learning rate of 0.9. After training,
its accuracy was about 0.1, no better than a randomly initialized, untrained model. The author therefore switched the optimizer to Adam with a learning rate of
0.001, i.e. the model5 configuration above, which reached about 0.98 accuracy after training. The author then changed only the learning rate while keeping SGD
as the optimizer; the result was not quite as accurate as the Adam version, but still reached 0.96. This shows how important the choice of hyperparameters is.
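For reference, a sketch of the failed configuration described above, reconstructed from the text rather than taken from the author's code (the exact learning rate used in the later SGD run is not stated here; see the author's GitHub):

# Same architecture as model5, but compiled with SGD at learning rate 0.9:
# training stalled at ~0.1 accuracy, i.e. random guessing over 10 classes
model4.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.9),
               loss='sparse_categorical_crossentropy',
               metrics=['accuracy'])
# Keeping SGD but lowering the learning rate recovered ~0.96 accuracy,
# while switching to Adam(learning_rate=0.001) reached ~0.98.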