TensorFlow CAPTCHA Recognition
The following material is based on GeekTime (极客时间) course notes.
• Setting Up the Model Development Environment
Third-Party Dependencies

Pillow (PIL Fork)
PIL (Python Imaging Library) adds image-processing capabilities to the Python interpreter. However, after the 1.1.7 release in 2009, the community stopped updating and maintaining it.
Pillow is a fork of PIL developed and maintained by Alex Clark and community contributors. The community remains very active today, and Pillow continues to iterate quickly.
Pillow offers broad file-format support, an efficient internal representation, and fairly powerful image-processing features. The core image library is designed for fast access to data stored in a few basic pixel formats, and it aims to provide a solid foundation for general image-processing tools.
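As a minimal sketch of the Pillow API described above (the image size and colors here are arbitrary choices for illustration, not values from the course):

```python
from PIL import Image

# Create a blank 160x60 RGB image (the size used for CAPTCHAs later)
img = Image.new('RGB', (160, 60), color=(255, 255, 255))

# Convert it to grayscale ('L' mode) - one of Pillow's basic pixel formats
gray = img.convert('L')

print(img.mode, gray.mode, gray.size)  # RGB L (160, 60)
```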
captcha
captcha is an open-source library for generating image and audio CAPTCHAs.

from captcha.image import ImageCaptcha
from captcha.audio import AudioCaptcha

image = ImageCaptcha(fonts=['/path/A.ttf', '/path/B.ttf'])
data = image.generate('1234')
image.write('1234', 'out.png')

audio = AudioCaptcha(voicedir='/path/to/voices')
data = audio.generate('1234')
audio.write('1234', 'out.wav')
pydot
pydot is a GraphViz interface implemented in pure Python. It supports parsing and serializing the DOT graph description language via GraphViz, and depends mainly on two libraries:
pyparsing: used only for loading DOT files; installed automatically with pydot.
GraphViz: renders graphs to PDF, PNG, SVG, and other formats; must be installed separately.
flask
Flask is a Python web application framework built on Werkzeug and Jinja2, released under the BSD license. It implements the framework core in a minimalist way while remaining extensible.
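A minimal Flask application looks like the following sketch (the route name and response are illustrative; the full CAPTCHA-recognition service appears in the deployment section):

```python
from flask import Flask, jsonify

app = Flask(__name__)  # create the Flask application instance


@app.route('/ping')    # register a simple health-check route
def ping():
    return jsonify({'message': 'pong'})


if __name__ == '__main__':
    # Development server; not intended for production use
    app.run(host='0.0.0.0', port=5000)
```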
• Generating the CAPTCHA Dataset
Introduction to CAPTCHAs
A CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is a fully automated public program for distinguishing human users from computers. In a CAPTCHA test, the server automatically generates a question for the user to answer. The question can be generated and graded by a computer, yet must be answerable only by a human. Because a computer cannot answer the CAPTCHA question, a user who answers correctly can be assumed to be human.
A common form of CAPTCHA asks the user to type the letters or digits shown in a distorted image. The distortion exists to defeat automated recognition by programs such as optical character recognition (OCR), which would otherwise render the test ineffective. Because the test is administered by a computer to a human, rather than by a human to a computer as in the standard Turing test, a CAPTCHA is sometimes called a reverse Turing test.
Breaking CAPTCHAs
A number of CAPTCHA systems, past and present, have been broken. These include EZ-Gimpy (an early version of Yahoo's CAPTCHA), the CAPTCHAs used by PayPal, LiveJournal, and phpBB, the online-banking CAPTCHAs used by many financial institutions (mainly banks), and those of many other sites.
In 2006, a Russian hacker group used automated recognition software to break Yahoo's CAPTCHA. The accuracy was only about 15%, but an attacker could make around 100,000 attempts per day, making the attack relatively cheap. In 2008, Google's CAPTCHA was also broken by Russian hackers, who used two different computers to coordinate the attack, possibly with the second machine learning from the first machine's attempts or monitoring their success.
CAPTCHA Evolution

CAPTCHA Generation
Use Pillow (PIL Fork) and the captcha library to generate CAPTCHA images:
PIL.Image.open(fp, mode='r') - open and identify an input image (file)
captcha.image.ImageCaptcha(width, height) - create an ImageCaptcha instance
captcha.image.ImageCaptcha.write('1234', 'out.png') - generate a CAPTCHA and save it
captcha.image.ImageCaptcha.generate('1234') - generate a CAPTCHA image
Implementation:
Creating the CAPTCHA Dataset

Import third-party packages:

from captcha.image import ImageCaptcha

import random
import numpy as np
import tensorflow.gfile as gfile
import matplotlib.pyplot as plt
import PIL.Image as Image

Define constants and the character set:

NUMBER = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
LOWERCASE = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
UPPERCASE = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']

CAPTCHA_CHARSET = NUMBER   # CAPTCHA character set
CAPTCHA_LEN = 4            # CAPTCHA length
CAPTCHA_HEIGHT = 60        # CAPTCHA image height
CAPTCHA_WIDTH = 160        # CAPTCHA image width

TRAIN_DATASET_SIZE = 5000  # training set size
TEST_DATASET_SIZE = 1000   # test set size
TRAIN_DATA_DIR = './train-data/'  # training set directory
TEST_DATA_DIR = './test-data/'    # test set directory

Generate random CAPTCHA text:

def gen_random_text(charset=CAPTCHA_CHARSET, length=CAPTCHA_LEN):
    text = [random.choice(charset) for _ in range(length)]
    return ''.join(text)

Create and save a CAPTCHA dataset:

def create_captcha_dataset(size=100,
                           data_dir='./data/',
                           height=60,
                           width=160,
                           image_format='.png'):
    # Clear data_dir before saving new CAPTCHA images
    if gfile.Exists(data_dir):
        gfile.DeleteRecursively(data_dir)
    gfile.MakeDirs(data_dir)
    # Create an ImageCaptcha instance
    captcha = ImageCaptcha(width=width, height=height)
    for _ in range(size):
        # Generate random CAPTCHA text
        text = gen_random_text(CAPTCHA_CHARSET, CAPTCHA_LEN)
        captcha.write(text, data_dir + text + image_format)
    return None

# Create and save the training set
create_captcha_dataset(TRAIN_DATASET_SIZE, TRAIN_DATA_DIR)
# Create and save the test set
create_captcha_dataset(TEST_DATASET_SIZE, TEST_DATA_DIR)

Generate and return a CAPTCHA dataset in memory:

def gen_captcha_dataset(size=100,
                        height=60,
                        width=160,
                        image_format='.png'):
    # Create an ImageCaptcha instance
    captcha = ImageCaptcha(width=width, height=height)
    # Pre-allocate the image and text arrays
    images, texts = [None]*size, [None]*size
    for i in range(size):
        # Generate random CAPTCHA text
        texts[i] = gen_random_text(CAPTCHA_CHARSET, CAPTCHA_LEN)
        # Open the generated image with PIL.Image.open(), then convert it
        # to a NumPy array of shape (CAPTCHA_HEIGHT, CAPTCHA_WIDTH, 3)
        images[i] = np.array(Image.open(captcha.generate(texts[i])))
    return images, texts

Generate 100 CAPTCHA images and their labels:

images, texts = gen_captcha_dataset()

plt.figure()
for i in range(20):
    plt.subplot(5, 4, i+1)   # show the first 20 CAPTCHAs in a 5x4 grid
    plt.tight_layout()       # auto-fit subplot sizes
    plt.imshow(images[i])
    plt.title("Label: {}".format(texts[i]))  # use the label as the subplot title
    plt.xticks([])           # remove x-axis ticks
    plt.yticks([])           # remove y-axis ticks
plt.show()

• Input and Output Data Processing
Input data processing
Image pipeline: RGB image -> grayscale image -> normalized data

Input data processing
Adapting to the Keras image data format: "channels_first" or "channels_last"

Output data processing
One-hot encoding: CAPTCHA text to vector

Decoding: model output vector back to CAPTCHA text
Implementation:
Data Processing

Import third-party packages:

from PIL import Image
from keras import backend as K

import random
import glob
import numpy as np
import tensorflow.gfile as gfile
import matplotlib.pyplot as plt

Define hyperparameters and the character set:

NUMBER = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
LOWERCASE = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
UPPERCASE = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']

CAPTCHA_CHARSET = NUMBER   # CAPTCHA character set
CAPTCHA_LEN = 4            # CAPTCHA length
CAPTCHA_HEIGHT = 60        # CAPTCHA image height
CAPTCHA_WIDTH = 160        # CAPTCHA image width

TRAIN_DATA_DIR = './train-data/'  # training set directory

Read the first 100 training images and parse each label from its filename:

image = []
text = []
count = 0
for filename in glob.glob(TRAIN_DATA_DIR + '*.png'):
    image.append(np.array(Image.open(filename)))
    # Note: lstrip()/rstrip() strip character *sets*, not prefixes/suffixes;
    # this is safe here only because the labels are purely numeric.
    text.append(filename.lstrip(TRAIN_DATA_DIR).rstrip('.png'))
    count += 1
    if count >= 100:
        break

text[0]
'''
'0005'
'''

Visualize the data:

plt.figure()
for i in range(20):
    plt.subplot(5, 4, i+1)   # show the first 20 CAPTCHAs in a 5x4 grid
    plt.tight_layout()       # auto-fit subplot sizes
    plt.imshow(image[i])
    plt.title("Label: {}".format(text[i]))  # use the label as the subplot title
    plt.xticks([])           # remove x-axis ticks
    plt.yticks([])           # remove y-axis ticks
plt.show()

image = np.array(image, dtype=np.float32)
print(image.shape)
'''
(100, 60, 160, 3)
'''

Convert the RGB CAPTCHA images to grayscale:

def rgb2gray(img):
    # Y' = 0.299 R + 0.587 G + 0.114 B
    # https://en.wikipedia.org/wiki/Grayscale#Converting_color_to_grayscale
    return np.dot(img[..., :3], [0.299, 0.587, 0.114])

image = rgb2gray(image)
print(image.shape)
'''
(100, 60, 160)
'''

image[0]
'''
array([[250.766, 250.766, 250.766, ..., 250.766, 250.766, 250.766],
       [250.766, 250.766, 250.766, ..., 250.766, 250.766, 250.766],
       [250.766, 250.766, 250.766, ..., 250.766, 250.766, 250.766],
       ...,
       [250.766, 250.766, 250.766, ..., 250.766, 250.766, 250.766],
       [250.766, 250.766, 250.766, ..., 250.766, 250.766, 250.766],
       [250.766, 250.766, 250.766, ..., 250.766, 250.766, 250.766]])
'''

plt.figure()
for i in range(20):
    plt.subplot(5, 4, i+1)
    plt.tight_layout()
    plt.imshow(image[i], cmap='Greys')
    plt.title("Label: {}".format(text[i]))
    plt.xticks([])
    plt.yticks([])
plt.show()
Normalize the data:

image = image / 255
image[0]
'''
array([[0.98339608, 0.98339608, 0.98339608, ..., 0.98339608, 0.98339608, 0.98339608],
       [0.98339608, 0.98339608, 0.98339608, ..., 0.98339608, 0.98339608, 0.98339608],
       [0.98339608, 0.98339608, 0.98339608, ..., 0.98339608, 0.98339608, 0.98339608],
       ...,
       [0.98339608, 0.98339608, 0.98339608, ..., 0.98339608, 0.98339608, 0.98339608],
       [0.98339608, 0.98339608, 0.98339608, ..., 0.98339608, 0.98339608, 0.98339608],
       [0.98339608, 0.98339608, 0.98339608, ..., 0.98339608, 0.98339608, 0.98339608]])
'''

image.shape[0]
'''
100
'''

image.shape
'''
(100, 60, 160)
'''

Adapt to the Keras image data format:

def fit_keras_channels(batch, rows=CAPTCHA_HEIGHT, cols=CAPTCHA_WIDTH):
    if K.image_data_format() == 'channels_first':
        batch = batch.reshape(batch.shape[0], 1, rows, cols)
        input_shape = (1, rows, cols)
    else:
        batch = batch.reshape(batch.shape[0], rows, cols, 1)
        input_shape = (rows, cols, 1)
    return batch, input_shape

image, input_shape = fit_keras_channels(image)
print(image.shape)
print(input_shape)
'''
(100, 60, 160, 1)
(60, 160, 1)
'''

type(image)
'''
numpy.ndarray
'''

One-hot encode each character in the CAPTCHA:

def text2vec(text, length=CAPTCHA_LEN, charset=CAPTCHA_CHARSET):
    text_len = len(text)
    # Validate the CAPTCHA length
    if text_len != length:
        raise ValueError('Error: length of captcha should be {}, but got {}'.format(length, text_len))
    # Build a one-dimensional vector of shape (CAPTCHA_LEN * len(CAPTCHA_CHARSET),)
    # e.g. a 4-digit numeric CAPTCHA becomes a vector of shape (4*10,)
    vec = np.zeros(length * len(charset))
    for i in range(length):
        # One-hot encode each character:
        # hot index = charset index + per-position offset
        vec[charset.index(text[i]) + i*len(charset)] = 1
    return vec

text = list(text)
vec = [None]*len(text)
for i in range(len(vec)):
    vec[i] = text2vec(text[i])

vec[0]
'''
array([1., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       1., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       1., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 1., 0., 0., 0., 0.])
'''

text[0]
'''
'0005'
'''

Decode a CAPTCHA vector back to its characters:

def vec2text(vector):
    if not isinstance(vector, np.ndarray):
        vector = np.asarray(vector)
    vector = np.reshape(vector, [CAPTCHA_LEN, -1])
    text = ''
    for item in vector:
        text += CAPTCHA_CHARSET[np.argmax(item)]
    return text

# Model output for a '3935' CAPTCHA
yy_vec = np.array([[2.0792404e-10, 4.3756086e-07, 3.1140310e-10, 9.9823320e-01,
                    5.1135743e-15, 3.7417038e-05, 1.0556480e-08, 9.0933657e-13,
                    2.7573466e-07, 1.7286760e-03, 1.1030550e-07, 1.1852034e-07,
                    7.9457263e-10, 3.4533365e-09, 6.6065012e-14, 2.8996323e-05,
                    7.6345885e-13, 3.1817032e-16, 3.9540555e-05, 9.9993122e-01,
                    5.3814397e-13, 1.2061575e-10, 1.6408040e-03, 9.9833637e-01,
                    6.5149628e-08, 5.2246549e-12, 1.1365444e-08, 9.5700288e-12,
                    2.2725430e-05, 5.2195204e-10, 3.2457771e-13, 2.1413280e-07,
                    7.3547295e-14, 4.4094882e-06, 3.8390007e-07, 9.9230206e-01,
                    6.4467136e-03, 3.9224533e-11, 1.2461344e-03, 1.1253484e-07]],
                  dtype=np.float32)

yy = vec2text(yy_vec)
yy
'''
'3935'
'''

img = rgb2gray(np.array(Image.open('3935.png')))

plt.figure()
plt.imshow(img, cmap='Greys')
plt.title("Label: {}".format(yy))  # use the label as the title
plt.xticks([])  # remove x-axis ticks
plt.yticks([])  # remove y-axis ticks
plt.show()

• Model Architecture Design
Classification problems
Image classification model: AlexNet
Feature extraction with convolutions
Image classification model: VGG-16
CAPTCHA recognition model architecture
CAPTCHA recognition model implementation
• Model Loss Function Design
Cross-Entropy (CE)
We use cross-entropy as the loss function for this model.
Although Categorical CE and Binary CE are the more commonly used loss functions, both are variants of CE.
CE is defined as:
CE = -Σ_{i=1}^{C} t_i · log(s_i)
where C is the number of classes, t_i the ground-truth label, and s_i the predicted score for class i.
For a binary classification problem (C' = 2), CE reduces to:
CE = -t_1 · log(s_1) - (1 - t_1) · log(1 - s_1)
Categorical CE Loss (Softmax Loss)
Commonly used for multi-class classification models whose output is a one-hot vector.
Binary CE Loss (Sigmoid CE Loss)
Unlike Softmax Loss, Binary CE Loss treats each vector component (class) independently: the loss computed for each component is unaffected by the other components.
It is therefore commonly used for multi-label classification models.
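The difference between the two losses can be seen numerically. The sketch below (plain NumPy, with a made-up 4-class one-hot target and prediction) computes both; it is an illustration of the formulas above, not the Keras implementation used later:

```python
import numpy as np

t = np.array([0., 1., 0., 0.])      # one-hot ground truth
s = np.array([0.1, 0.7, 0.1, 0.1])  # predicted probabilities (sum to 1)

# Categorical CE: only the true class contributes to the loss
categorical_ce = -np.sum(t * np.log(s))

# Binary CE: every component contributes independently
binary_ce = -np.mean(t * np.log(s) + (1 - t) * np.log(1 - s))

print(round(categorical_ce, 4))  # 0.3567  (= -log(0.7))
print(round(binary_ce, 4))       # 0.1682
```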
Implementation:
Training the Model

Import third-party packages:

from PIL import Image
from keras import backend as K
from keras.utils import plot_model
from keras.models import *
from keras.layers import *

import glob
import pickle
import numpy as np
import tensorflow.gfile as gfile
import matplotlib.pyplot as plt

Define hyperparameters and the character set:

NUMBER = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
LOWERCASE = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
UPPERCASE = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']

CAPTCHA_CHARSET = NUMBER   # CAPTCHA character set
CAPTCHA_LEN = 4            # CAPTCHA length
CAPTCHA_HEIGHT = 60        # CAPTCHA image height
CAPTCHA_WIDTH = 160        # CAPTCHA image width

TRAIN_DATA_DIR = './train-data/'  # dataset directories
TEST_DATA_DIR = './test-data/'

BATCH_SIZE = 100  # samples per training batch
EPOCHS = 10       # passes over the training set (matches the training log and filenames below)
OPT = 'adam'      # optimizer
LOSS = 'binary_crossentropy'  # model loss function

# Locations for the trained artifacts
MODEL_DIR = './model/train_demo/'
MODEL_FORMAT = '.h5'                   # model save format
HISTORY_DIR = './history/train_demo/'  # training-history records
HISTORY_FORMAT = '.history'            # history file format

# Output filename pattern
filename_str = "{}captcha_{}_{}_bs_{}_epochs_{}{}"

# Model architecture diagram
MODEL_VIS_FILE = 'captcha_classification' + '.png'
# Model file
MODEL_FILE = filename_str.format(MODEL_DIR, OPT, LOSS, str(BATCH_SIZE), str(EPOCHS), MODEL_FORMAT)
# Training-history file
HISTORY_FILE = filename_str.format(HISTORY_DIR, OPT, LOSS, str(BATCH_SIZE), str(EPOCHS), HISTORY_FORMAT)

Convert RGB CAPTCHA images to grayscale:

def rgb2gray(img):
    # Y' = 0.299 R + 0.587 G + 0.114 B
    # https://en.wikipedia.org/wiki/Grayscale#Converting_color_to_grayscale
    return np.dot(img[..., :3], [0.299, 0.587, 0.114])

One-hot encode each character in the CAPTCHA:

def text2vec(text, length=CAPTCHA_LEN, charset=CAPTCHA_CHARSET):
    text_len = len(text)
    # Validate the CAPTCHA length
    if text_len != length:
        raise ValueError('Error: length of captcha should be {}, but got {}'.format(length, text_len))
    # Build a one-dimensional vector of shape (CAPTCHA_LEN * len(CAPTCHA_CHARSET),)
    # e.g. a 4-digit numeric CAPTCHA becomes a vector of shape (4*10,)
    vec = np.zeros(length * len(charset))
    for i in range(length):
        # One-hot encode each character:
        # hot index = charset index + per-position offset
        vec[charset.index(text[i]) + i*len(charset)] = 1
    return vec

Decode a CAPTCHA vector back to its characters:

def vec2text(vector):
    if not isinstance(vector, np.ndarray):
        vector = np.asarray(vector)
    vector = np.reshape(vector, [CAPTCHA_LEN, -1])
    text = ''
    for item in vector:
        text += CAPTCHA_CHARSET[np.argmax(item)]
    return text

Adapt to the Keras image data format:

def fit_keras_channels(batch, rows=CAPTCHA_HEIGHT, cols=CAPTCHA_WIDTH):
    if K.image_data_format() == 'channels_first':
        batch = batch.reshape(batch.shape[0], 1, rows, cols)
        input_shape = (1, rows, cols)
    else:
        batch = batch.reshape(batch.shape[0], rows, cols, 1)
        input_shape = (rows, cols, 1)
    return batch, input_shape

Read the training set:

X_train = []
Y_train = []
for filename in glob.glob(TRAIN_DATA_DIR + '*.png'):
    X_train.append(np.array(Image.open(filename)))
    # lstrip()/rstrip() strip character sets; safe here because labels are numeric
    Y_train.append(filename.lstrip(TRAIN_DATA_DIR).rstrip('.png'))

X_train[0][1][1]
'''
array([253, 249, 254], dtype=uint8)
'''

Y_train[0]
'''
'0005'
'''

Process the training images:

# list -> RGB ndarray
X_train = np.array(X_train, dtype=np.float32)
# RGB -> grayscale
X_train = rgb2gray(X_train)
# normalize
X_train = X_train / 255
# fit Keras channels
X_train, input_shape = fit_keras_channels(X_train)

print(X_train.shape, type(X_train))
print(input_shape)
'''
(3919, 60, 160, 1) <class 'numpy.ndarray'>
(60, 160, 1)
'''

Process the training labels:

Y_train = list(Y_train)
for i in range(len(Y_train)):
    Y_train[i] = text2vec(Y_train[i])
Y_train = np.asarray(Y_train)

print(Y_train.shape, type(Y_train))
'''
(3919, 40) <class 'numpy.ndarray'>
'''

Read the test set and process its images and labels the same way:

X_test = []
Y_test = []
for filename in glob.glob(TEST_DATA_DIR + '*.png'):
    X_test.append(np.array(Image.open(filename)))
    Y_test.append(filename.lstrip(TEST_DATA_DIR).rstrip('.png'))

# list -> rgb -> gray -> normalization -> fit keras
X_test = np.array(X_test, dtype=np.float32)
X_test = rgb2gray(X_test)
X_test = X_test / 255
X_test, _ = fit_keras_channels(X_test)

Y_test = list(Y_test)
for i in range(len(Y_test)):
    Y_test[i] = text2vec(Y_test[i])
Y_test = np.asarray(Y_test)

print(X_test.shape, type(X_test))
print(Y_test.shape, type(Y_test))
'''
(958, 60, 160, 1) <class 'numpy.ndarray'>
(958, 40) <class 'numpy.ndarray'>
'''

Create the CAPTCHA recognition model:

# Input layer
inputs = Input(shape=input_shape, name="inputs")

# Convolutional layer 1
conv1 = Conv2D(32, (3, 3), name="conv1")(inputs)
relu1 = Activation('relu', name="relu1")(conv1)

# Convolutional layer 2
conv2 = Conv2D(32, (3, 3), name="conv2")(relu1)
relu2 = Activation('relu', name="relu2")(conv2)
pool2 = MaxPooling2D(pool_size=(2, 2), padding='same', name="pool2")(relu2)

# Convolutional layer 3
conv3 = Conv2D(64, (3, 3), name="conv3")(pool2)
relu3 = Activation('relu', name="relu3")(conv3)
pool3 = MaxPooling2D(pool_size=(2, 2), padding='same', name="pool3")(relu3)

# Flatten the pooled feature maps and feed them to the fully connected layers
x = Flatten()(pool3)

# Dropout
x = Dropout(0.25)(x)

# Four fully connected layers, each a 10-way classifier for one of the 4 characters
x = [Dense(10, activation='softmax', name='fc%d' % (i+1))(x) for i in range(4)]

# Concatenate the four character vectors so the output matches the label format
outs = Concatenate()(x)

# Define the model's inputs and outputs
model = Model(inputs=inputs, outputs=outs)
model.compile(optimizer=OPT, loss=LOSS, metrics=['accuracy'])

View the model summary:

model.summary()  # print model summary
'''
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
inputs (InputLayer)             (None, 60, 160, 1)   0
__________________________________________________________________________________________________
conv1 (Conv2D)                  (None, 58, 158, 32)  320         inputs[0][0]
__________________________________________________________________________________________________
relu1 (Activation)              (None, 58, 158, 32)  0           conv1[0][0]
__________________________________________________________________________________________________
conv2 (Conv2D)                  (None, 56, 156, 32)  9248        relu1[0][0]
__________________________________________________________________________________________________
relu2 (Activation)              (None, 56, 156, 32)  0           conv2[0][0]
__________________________________________________________________________________________________
pool2 (MaxPooling2D)            (None, 28, 78, 32)   0           relu2[0][0]
__________________________________________________________________________________________________
conv3 (Conv2D)                  (None, 26, 76, 64)   18496       pool2[0][0]
__________________________________________________________________________________________________
relu3 (Activation)              (None, 26, 76, 64)   0           conv3[0][0]
__________________________________________________________________________________________________
pool3 (MaxPooling2D)            (None, 13, 38, 64)   0           relu3[0][0]
__________________________________________________________________________________________________
flatten_1 (Flatten)             (None, 31616)        0           pool3[0][0]
__________________________________________________________________________________________________
dropout_1 (Dropout)             (None, 31616)        0           flatten_1[0][0]
__________________________________________________________________________________________________
fc1 (Dense)                     (None, 10)           316170      dropout_1[0][0]
__________________________________________________________________________________________________
fc2 (Dense)                     (None, 10)           316170      dropout_1[0][0]
__________________________________________________________________________________________________
fc3 (Dense)                     (None, 10)           316170      dropout_1[0][0]
__________________________________________________________________________________________________
fc4 (Dense)                     (None, 10)           316170      dropout_1[0][0]
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 40)           0           fc1[0][0]
                                                                 fc2[0][0]
                                                                 fc3[0][0]
                                                                 fc4[0][0]
==================================================================================================
Total params: 1,292,744
Trainable params: 1,292,744
Non-trainable params: 0
__________________________________________________________________________________________________
'''

Visualize the model:

# On Windows, GraphViz may need to be added to PATH first, e.g.:
# import os
# os.environ["PATH"] += os.pathsep + r'D:\Program Files\Python37\Graphviz2.38\bin'
plot_model(model, to_file=MODEL_VIS_FILE, show_shapes=True, show_layer_names=True)

Train the model:

history = model.fit(X_train,
                    Y_train,
                    batch_size=BATCH_SIZE,
                    epochs=EPOCHS,
                    verbose=2,
                    validation_data=(X_test, Y_test))
'''
Train on 3919 samples, validate on 958 samples
Epoch 1/10
 - 45s - loss: 0.3256 - acc: 0.9000 - val_loss: 0.3249 - val_acc: 0.9000
Epoch 2/10
 - 48s - loss: 0.3242 - acc: 0.9000 - val_loss: 0.3229 - val_acc: 0.9000
Epoch 3/10
 - 47s - loss: 0.3075 - acc: 0.9001 - val_loss: 0.2962 - val_acc: 0.9007
Epoch 4/10
 - 50s - loss: 0.2374 - acc: 0.9126 - val_loss: 0.2463 - val_acc: 0.9120
Epoch 5/10
 - 51s - loss: 0.1729 - acc: 0.9367 - val_loss: 0.2253 - val_acc: 0.9190
Epoch 6/10
 - 48s - loss: 0.1363 - acc: 0.9508 - val_loss: 0.2114 - val_acc: 0.9230
Epoch 7/10
 - 48s - loss: 0.1136 - acc: 0.9589 - val_loss: 0.2175 - val_acc: 0.9236
Epoch 8/10
 - 48s - loss: 0.0943 - acc: 0.9666 - val_loss: 0.2242 - val_acc: 0.9234
Epoch 9/10
 - 48s - loss: 0.0825 - acc: 0.9702 - val_loss: 0.2185 - val_acc: 0.9241
Epoch 10/10
 - 48s - loss: 0.0742 - acc: 0.9735 - val_loss: 0.2321 - val_acc: 0.9245
'''

Sample predictions:

print(vec2text(Y_test[3]))
'''
0030
'''

yy = model.predict(X_test[6].reshape(1, 60, 160, 1))
print(vec2text(yy))
'''
0080
'''

Save the model:

if not gfile.Exists(MODEL_DIR):
    gfile.MakeDirs(MODEL_DIR)
model.save(MODEL_FILE)
print('Saved trained model at %s ' % MODEL_FILE)
'''
Saved trained model at ./model/train_demo/captcha_adam_binary_crossentropy_bs_100_epochs_10.h5
'''

Save the training history:

history.history['acc']
'''
[0.8999999165534973, 0.8999999165534973, 0.9001274999990607, 0.9125924786763339,
 0.9367058057966328, 0.950835685807243, 0.9589372451556645, 0.9666305299953831,
 0.970170951324569, 0.9735455595738245]
'''

history.history['loss']
'''
[0.3255925034649307, 0.3241707802077403, 0.30746189193264445, 0.23740254261567295,
 0.17286433247575106, 0.1362645993939344, 0.11359802067363466, 0.09430851856910565,
 0.08249860131624981, 0.07421532272722199]
'''

history.history.keys()
'''
dict_keys(['val_loss', 'val_acc', 'loss', 'acc'])
'''

if gfile.Exists(HISTORY_DIR) == False:
    gfile.MakeDirs(HISTORY_DIR)
with open(HISTORY_FILE, 'wb') as f:
    pickle.dump(history.history, f)
print(HISTORY_FILE)
'''
./history/train_demo/captcha_adam_binary_crossentropy_bs_100_epochs_10.history
'''
• Analyzing the Training Process
The training process
Learning rate
The learning rate directly affects how the loss changes, i.e. how fast the model converges.
When to increase the learning rate:
• Early in training, the loss barely moves at all.
When to decrease the learning rate:
• Early in training, the loss explodes or becomes NaN.
• The loss drops quickly at first, then plateaus for a long time.
• Late in training, the loss oscillates up and down repeatedly.

Optimizers: SGD (Stochastic Gradient Descent)
Optimizers: SGD-M (Momentum)
SGD tends to oscillate when it encounters ravines. Introducing momentum accelerates SGD along the correct descent direction and dampens the oscillation.
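The momentum update can be sketched in a few lines of plain Python. This is an illustrative toy, minimizing a one-parameter quadratic; the function, learning rate, and momentum coefficient are arbitrary choices for demonstration, not values from the course:

```python
def grad(w):
    # gradient of f(w) = w^2, a stand-in for a real loss surface
    return 2 * w

w, v = 5.0, 0.0          # parameter and velocity
lr, momentum = 0.1, 0.9  # learning rate and momentum coefficient

for _ in range(200):
    # v accumulates an exponentially decaying average of past gradients,
    # so steps along a consistent direction grow while oscillations cancel
    v = momentum * v + lr * grad(w)
    w = w - v

print(abs(w) < 1e-3)  # True: converged close to the minimum at w = 0
```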

Optimizers: Adagrad, RMSprop, Adam
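Adam combines the momentum idea with per-parameter adaptive step sizes. The sketch below applies the standard Adam update rule to the same kind of one-parameter quadratic; the hyperparameters are the commonly cited defaults, and the toy objective is an illustration, not part of the course material:

```python
import math

def grad(w):
    return 2 * w  # gradient of f(w) = w^2

w = 5.0
m, v = 0.0, 0.0                                  # first and second moment estimates
lr, beta1, beta2, eps = 0.1, 0.9, 0.999, 1e-8

for t in range(1, 1001):
    g = grad(w)
    m = beta1 * m + (1 - beta1) * g              # momentum-like average of gradients
    v = beta2 * v + (1 - beta2) * g * g          # RMSprop-like average of squared gradients
    m_hat = m / (1 - beta1 ** t)                 # bias correction for the zero init
    v_hat = v / (1 - beta2 ** t)
    w -= lr * m_hat / (math.sqrt(v_hat) + eps)   # per-parameter adaptive step

print("final w:", w)  # w ends up close to the minimum at 0
```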

Optimizer comparison: saddle points

Optimizer comparison: the CAPTCHA recognition model

Implementation:
Analyzing the Training Process

Import third-party packages:

import glob
import pickle
import numpy as np
import matplotlib.pyplot as plt

Load a training-history record:

history_file = './pre-trained/history/optimizer/binary_ce/captcha_adam_binary_crossentropy_bs_100_epochs_100.history'
with open(history_file, 'rb') as f:
    history = pickle.load(f)

Visualize the training process:

fig = plt.figure()

plt.subplot(2, 1, 1)
plt.plot(history['acc'])
plt.plot(history['val_acc'])
plt.title('Model Accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='lower right')

plt.subplot(2, 1, 2)
plt.plot(history['loss'])
plt.plot(history['val_loss'])
plt.title('Model Loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper right')

plt.tight_layout()
plt.show()

Define a visualization helper:

def plot_training(history=None, metric='acc', title='Model Accuracy', loc='lower right'):
    model_list = []
    fig = plt.figure(figsize=(10, 8))
    for key, val in history.items():
        model_list.append(key.replace(HISTORY_DIR, '').rstrip('.history'))
        plt.plot(val[metric])
    plt.title(title)
    plt.ylabel(metric)
    plt.xlabel('epoch')
    plt.legend(model_list, loc=loc)
    plt.show()

Load the pre-trained model histories:

HISTORY_DIR = './pre-trained/history/optimizer/binary_ce/'
history = {}
for filename in glob.glob(HISTORY_DIR + '*.history'):
    with open(filename, 'rb') as f:
        history[filename] = pickle.load(f)

for key, val in history.items():
    print(key.replace(HISTORY_DIR, '').rstrip('.history'), val.keys())
'''
./pre-trained/history/optimizer/binary_cecaptcha_adadelta_binary_crossentropy_bs_100_epochs_100 dict_keys(['val_loss', 'val_acc', 'loss', 'acc'])
./pre-trained/history/optimizer/binary_cecaptcha_adagrad_binary_crossentropy_bs_100_epochs_100 dict_keys(['val_loss', 'val_acc', 'loss', 'acc'])
./pre-trained/history/optimizer/binary_cecaptcha_adam_binary_crossentropy_bs_100_epochs_100 dict_keys(['val_loss', 'val_acc', 'loss', 'acc'])
./pre-trained/history/optimizer/binary_cecaptcha_rmsprop_binary_crossentropy_bs_100_epochs_100 dict_keys(['val_loss', 'val_acc', 'loss', 'acc'])
'''

Accuracy over time (training set):

plot_training(history)

Loss over time (training set):

plot_training(history, metric='loss', title='Model Loss', loc='upper right')

Accuracy over time (test set):

plot_training(history, metric='val_acc', title='Model Accuracy (val)')

Loss over time (test set):

plot_training(history, metric='val_loss', title='Model Loss (val)', loc='upper right')

• Model Deployment and Demo
The data-model-service pipeline
Building a CAPTCHA recognition service quickly with Flask
Starting the CAPTCHA recognition service with Flask
Calling the CAPTCHA recognition service
app.py
import base64
import numpy as np
import tensorflow as tf

from io import BytesIO
from flask import Flask, request, jsonify
from keras.models import load_model
from PIL import Image

NUMBER = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
LOWERCASE = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
UPPERCASE = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']

CAPTCHA_CHARSET = NUMBER   # CAPTCHA character set
CAPTCHA_LEN = 4            # CAPTCHA length
CAPTCHA_HEIGHT = 60        # CAPTCHA image height
CAPTCHA_WIDTH = 160        # CAPTCHA image width

# Model trained for 10 epochs
MODEL_FILE = './pre-trained/model/captcha_rmsprop_binary_crossentropy_bs_100_epochs_10.h5'

def vec2text(vector):
    if not isinstance(vector, np.ndarray):
        vector = np.asarray(vector)
    vector = np.reshape(vector, [CAPTCHA_LEN, -1])
    text = ''
    for item in vector:
        text += CAPTCHA_CHARSET[np.argmax(item)]
    return text

def rgb2gray(img):
    # Y' = 0.299 R + 0.587 G + 0.114 B
    # https://en.wikipedia.org/wiki/Grayscale#Converting_color_to_grayscale
    return np.dot(img[..., :3], [0.299, 0.587, 0.114])

app = Flask(__name__)  # create the Flask instance

# Health-check URL
@app.route('/ping', methods=['GET', 'POST'])
def hello_world():
    return 'pong'

# CAPTCHA recognition URL
@app.route('/predict', methods=['POST'])
def predict():
    response = {'success': False, 'prediction': '', 'debug': 'error'}
    received_image = False
    if request.method == 'POST':
        if request.files.get('image'):  # raw image file
            image = request.files['image'].read()
            received_image = True
            response['debug'] = 'get image'
        elif request.get_json():  # base64-encoded image
            encoded_image = request.get_json()['image']
            image = base64.b64decode(encoded_image)
            received_image = True
            response['debug'] = 'get json'
        if received_image:
            image = np.array(Image.open(BytesIO(image)))
            image = rgb2gray(image).reshape(1, 60, 160, 1).astype('float32') / 255
            with graph.as_default():
                pred = model.predict(image)
            response['prediction'] = response['prediction'] + vec2text(pred)
            response['success'] = True
            response['debug'] = 'predicted'
    else:
        response['debug'] = 'No Post'
    return jsonify(response)

model = load_model(MODEL_FILE)   # load the trained model
graph = tf.get_default_graph()   # grab the default TensorFlow graph (needed for predict() in Flask threads)