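My company needed image recognition for Chinese-character captchas, and the project has just gone live after a busy stretch. Now that I finally have some free time, I am following up on my previous post, "Configuring Caffe on Windows 10 x64 + Visual Studio 2013 + Python 2.7.12", to record how we built Chinese captcha recognition with Caffe. This post focuses on development and implementation, so CNN theory is not covered here; please look up any unfamiliar concepts and terms yourself. With the model we have trained so far, single-character accuracy is close to 96%, which puts the accuracy on a four-character captcha at roughly 80%. That is good enough for our use. More training samples should push accuracy higher, at the cost of longer training time and heavier hardware requirements. The other avenue is tuning the LeNet network structure, which currently uses only three convolutional layers.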
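As a rough check on these figures: if every character were recognized independently with the same accuracy (an assumption; errors on real captchas may well be correlated), the whole-captcha accuracy would be the per-character accuracy raised to the number of characters. The helper name below is made up for illustration.

```python
def captcha_accuracy(char_accuracy, num_chars=4):
    """Probability that all characters are right, assuming independent errors."""
    return char_accuracy ** num_chars


estimate = captcha_accuracy(0.96)
print("estimated 4-character accuracy: %.3f" % estimate)
```

With 0.96 per character this comes out at about 0.85, in the same range as the roughly 80% quoted above.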
Software environment: Visual Studio 2013 + Python 2.7.12 + Caffe
Hardware environment: Intel Core i7-4790 + GTX 1080 + 32 GB RAM
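Different kinds of captcha need different handling, and these steps are collectively called image preprocessing. There is no universal recipe, and each captcha style needs its own special treatment, but the overall pipeline is always much the same: grayscale conversion, binarization, interference-line removal, segmentation into single characters, and normalization. All of these are simple to implement in Python, so I will skip the details and go straight to the code (it needs import cv2):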

import random
import time

import cv2
import numpy as np

# Output directory for the cropped characters. The original listing does not
# show its value, so this is only a placeholder.
outpath = './chars/'


class PreProcess(object):
    """Preprocessing steps for one specific captcha layout (Python 2.7)."""

    def ConvertToGray(self, Image, filename):
        # cv2.imread returns BGR, so convert BGR -> grayscale
        GrayImage = cv2.cvtColor(Image, cv2.COLOR_BGR2GRAY)
        return GrayImage

    def ConvertTo1Bpp(self, GrayImage, filename):
        # cv2.threshold returns (threshold_used, binary_image)
        _, Bpp = cv2.threshold(GrayImage, 127, 255, cv2.THRESH_BINARY)
        cv2.imwrite('D://' + '1.jpg', Bpp)  # debug dump of the binarized image
        return Bpp  # return the image itself, not the (threshold, image) tuple

    def InterferLine(self, Bpp, filename):
        # The characters of this captcha sit in columns 76..160; whiten the rest
        Bpp[:, :76] = 255
        Bpp[:, 161:] = 255
        m = 1
        n = 1
        # Follow the interference line column by column; m and n hold the rows
        # of the line pixels found in the current column
        for i in range(76, 161):
            while m < Bpp.shape[0] - 1:
                if Bpp[m][i] == 0:
                    if Bpp[m + 1][i] == 0:
                        n = m + 1
                    elif m > 0 and Bpp[m - 1][i] == 0:
                        n = m
                        m = n - 1
                    else:
                        n = m + 1
                    break
                elif m != Bpp.shape[0]:
                    # Count consecutive black pixels starting at row m.
                    # Row m is white in this branch, so as written both scans
                    # stop at once and m simply moves down one row.
                    l = 0
                    k = 0
                    ll = m
                    kk = m
                    while ll > 0:
                        if Bpp[ll][i] == 0:
                            ll = ll - 1
                            l = l + 1
                        else:
                            break
                    while kk > 0:
                        if Bpp[kk][i] == 0:
                            kk = kk - 1
                            k = k + 1
                        else:
                            break
                    if (l <= k and l != 0) or (k == 0 and l != 0):
                        m = m - 1
                    else:
                        m = m + 1
                else:
                    break
            # Keep the pixels if they appear to belong to a character stroke,
            # otherwise whiten them
            if m > 0 and Bpp[m - 1][i] == 0 and Bpp[n - 1][i] == 0:
                continue
            else:
                Bpp[m][i] = 255
                Bpp[n][i] = 255
        return Bpp

    def CutImage(self, Bpp, filename):
        # Column ranges [start, end) of the four characters in this layout
        columns = [(78, 98), (99, 118), (119, 138), (139, 158)]
        # The first four characters of the GBK-encoded file name are the labels
        chars = filename.decode('gbk')
        for idx, (start, end) in enumerate(columns):
            piece = Bpp[:, start:end].copy()
            # File name: label character + '_' + millisecond timestamp + random suffix
            name = (chars[idx].encode('gbk') + '_' + '%d' % (time.time() * 1000)
                    + str(random.randint(1000, 9999)) + '.png')
            cv2.imwrite(outpath + name, piece)

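import os

import cv2

# Input directory holding the raw captcha images. The original listing does
# not show its value, so this is only a placeholder.
inpath = './captcha/'

PP = PreProcess()
for root, dirs, files in os.walk(inpath):
    for filename in files:
        # Gotcha: inpath must not contain Chinese characters, or cv2.imread fails
        Img = cv2.imread(root + '/' + filename)
        GrayImage = PP.ConvertToGray(Img, filename)
        Bpp = PP.ConvertTo1Bpp(GrayImage, filename)
        Bpp_new = PP.InterferLine(Bpp, filename)
        PP.CutImage(Bpp_new, filename)  # writes the four character crops to disk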
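The crops cut out above are 20 or 19 pixels wide, while the network needs every input to have the same size. The normalization step is not shown in this post, so the following is only a sketch of one possible approach: pad each crop with white columns until it reaches a target width. The function name pad_to_width and the target width are assumptions, and images are plain nested lists here so the example runs without OpenCV.

```python
def pad_to_width(image, target_width, fill=255):
    """Center a 2D image (a list of rows) horizontally on a white canvas."""
    padded = []
    for row in image:
        missing = target_width - len(row)
        if missing < 0:
            raise ValueError("row is wider than target_width")
        left = missing // 2          # columns added on the left
        right = missing - left       # remaining columns go on the right
        padded.append([fill] * left + list(row) + [fill] * right)
    return padded


crop = [[0] * 19 for _ in range(3)]  # a 3x19 all-black toy crop
fixed = pad_to_width(crop, 20)
print(len(fixed), len(fixed[0]))
```

Alternatively, convert_imageset accepts the --resize_height and --resize_width flags, which resize every image while the database is being built.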

import os

path = os.getcwd()  # remember the current directory
os.chdir("./caffe-master/caffe-master/Build/x64/Debug")  # switch to the folder containing caffe.exe
# Set the variable in this process so the tools started below inherit it;
# os.system('SET ...') only affects its own short-lived shell
os.environ['GLOG_logtostderr'] = '1'
# Build the training set
os.system('convert_imageset.exe --shuffle '
          './caffe-master/caffe-master/windows/CaptchaTest/dpsample/data/train '
          './caffe-master/caffe-master/windows/CaptchaTest/dpsample/data/train.txt '
          './caffe-master/caffe-master/windows/CaptchaTest/dpsample/data/trainldb 0')
# Build the test set
os.system('convert_imageset.exe --shuffle '
          './caffe-master/caffe-master/windows/CaptchaTest/dpsample/data/val '
          './caffe-master/caffe-master/windows/CaptchaTest/dpsample/data/val.txt '
          './caffe-master/caffe-master/windows/CaptchaTest/dpsample/data/testldb 0')
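convert_imageset reads a list file (train.txt and val.txt above) in which each line holds an image path relative to the root folder, followed by an integer label. This post does not show how those files were produced. The sketch below assumes the class is the part of the file name before the first underscore, as in the crops written by CutImage, and assigns integer ids in sorted order. The function build_list_lines and its return format are hypothetical.

```python
def build_list_lines(filenames):
    """Return (lines, label_ids) for a Caffe image list file."""
    # Everything before the first '_' is treated as the class key
    keys = sorted(set(name.split('_', 1)[0] for name in filenames))
    label_ids = dict((key, idx) for idx, key in enumerate(keys))
    lines = ["%s %d" % (name, label_ids[name.split('_', 1)[0]])
             for name in filenames]
    return lines, label_ids


files = ["b_1001.png", "a_1002.png", "b_1003.png"]
lines, label_ids = build_list_lines(files)
print("\n".join(lines))
```

The label_ids table should be saved alongside the list files, because it is needed again to turn a predicted class index back into a character.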

name: "LeNet"
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TRAIN }
  transform_param { scale: 0.00390625 }
  data_param {
    source: "E:/work/meb/Deeplearning/caffe-master/caffe-master/windows/CaptchaTest/dpsample/data/trainldb"
    batch_size: 64
    backend: LMDB
  }
}
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TEST }
  transform_param { scale: 0.00390625 }
  data_param {
    source: "E:/work/meb/Deeplearning/caffe-master/caffe-master/windows/CaptchaTest/dpsample/data/testldb"
    batch_size: 100
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  convolution_param {
    num_output: 64
    kernel_size: 7
    stride: 1
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 6
    stride: 1
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "conv2"
  top: "conv3"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  convolution_param {
    num_output: 1024
    pad: 1
    kernel_size: 5
    stride: 1
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv3"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool2"
  top: "ip1"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  inner_product_param {
    num_output: 3666
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "relu4"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  inner_product_param {
    num_output: 3666
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  include { phase: TEST }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}
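To see what the fully connected layer ip1 receives, the spatial size can be traced through the layers. Caffe computes a convolution output size as floor((in + 2*pad - kernel) / stride) + 1, and a pooling output size with ceil in place of floor. The 32x32 input used below is an assumption, since the network definition does not state the input size, and the helper names are illustrative.

```python
import math


def conv_out(size, kernel, pad=0, stride=1):
    """Output size of a convolution along one dimension."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, kernel, stride):
    """Output size of an unpadded pooling layer along one dimension."""
    return int(math.ceil((size - kernel) / float(stride))) + 1


size = 32                                   # assumed input height and width
size = conv_out(size, kernel=7)             # conv1
size = pool_out(size, kernel=2, stride=2)   # pool1
size = conv_out(size, kernel=6, pad=1)      # conv2
size = conv_out(size, kernel=5, pad=1)      # conv3
size = pool_out(size, kernel=2, stride=2)   # pool2
ip1_inputs = 1024 * size * size             # conv3 has 1024 output channels
print(size, ip1_inputs)
```

Under that assumption pool2 produces 4x4 feature maps, so ip1 maps 16,384 inputs to 3,666 outputs. That is about 60 million weights, which is where most of the model's parameters would sit.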
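At this point all the preparation is done, and the model can be trained and run from Python (with import caffe). Training speed depends mainly on your GPU. I started with a GTX 650, where 5,000 iterations took half a day. That was unbearable, so I asked the company to buy a GTX 1080. There is simply no comparison: 5,000 iterations now finish in half an hour. The code for training and then calling the model is as follows: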
# Training command
cmd = 'caffe.exe train -solver=./caffe-master/caffe-master/windows/CaptchaTest/dpsample/solver/lenet_solver.prototxt'
os.system(cmd)
os.chdir(path)  # go back to the directory saved earlier
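The training command points at lenet_solver.prototxt, which this post does not show. The sketch below writes out a solver in the style of Caffe's LeNet example. Only max_iter: 5000 and the iterate snapshot prefix are inferred from the model file name iterate_iter_5000.caffemodel used in the next listing. Every other value, the net file name, and the function name are assumptions to be tuned.

```python
def solver_text(net_path, snapshot_prefix, max_iter=5000):
    """Build the text of a Caffe solver definition (values are illustrative)."""
    fields = [
        ('net', '"%s"' % net_path),
        ('test_iter', 100),          # test batches per evaluation
        ('test_interval', 500),      # evaluate every 500 iterations
        ('base_lr', 0.01),
        ('momentum', 0.9),
        ('weight_decay', 0.0005),
        ('lr_policy', '"inv"'),
        ('gamma', 0.0001),
        ('power', 0.75),
        ('display', 100),
        ('max_iter', max_iter),
        ('snapshot', max_iter),      # save one snapshot at the end
        ('snapshot_prefix', '"%s"' % snapshot_prefix),
        ('solver_mode', 'GPU'),
    ]
    return "\n".join("%s: %s" % (key, value) for key, value in fields) + "\n"


text = solver_text("lenet_train_test.prototxt", "iterate")
print(text)
```

Caffe appends _iter_N.caffemodel to the snapshot prefix, which is how a prefix of iterate and 5,000 iterations would lead to a file called iterate_iter_5000.caffemodel.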
import caffe

# Call the model
deploy = './dpsample/solver/lenet_deploy.prototxt'       # deploy file
caffe_model = './dpsample/iterate_iter_5000.caffemodel'  # trained caffemodel
imgtest = './dpsample/data/val/685_363.png'              # an arbitrary test image
net = caffe.Net(deploy, caffe_model, caffe.TEST)
# Use the shape of the input blob, e.g. (1, 3, 32, 32)
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
# Reorder the dimensions of the loaded image from (H, W, C) to (C, H, W)
transformer.set_transpose('data', (2, 0, 1))
# Mean subtraction: the model was trained without it, so it is skipped here
# transformer.set_mean('data', np.load(mean_file).mean(1).mean(1))
# Scaling to [0, 1]: scale is already set in the network, so it is not needed here
# transformer.set_raw_scale('data', 1)
# Swap the channels from RGB to BGR
transformer.set_channel_swap('data', (2, 1, 0))
im = caffe.io.load_image(imgtest)  # load the image
# Apply the preprocessing configured above and copy the result into the blob
net.blobs['data'].data[...] = transformer.preprocess('data', im)
out = net.forward()
# Per-class probabilities from the last (Softmax) layer
prob = net.blobs['prob'].data[0].flatten()
print(prob)
order = prob.argsort()[-1]  # index of the most probable class
print(order)
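The class index printed at the end still has to be mapped back to a character, and a full captcha needs one prediction per character crop. A minimal sketch, assuming a labels list that maps class index to character (the post does not show how that mapping is stored) and using plain Python lists in place of the prob arrays:

```python
def decode_captcha(prob_per_char, labels):
    """Pick the most probable label for each character crop."""
    result = []
    for prob in prob_per_char:
        # Index of the largest probability, the same idea as argsort()[-1]
        best = max(range(len(prob)), key=lambda idx: prob[idx])
        result.append(labels[best])
    return "".join(result)


labels = ["a", "b", "c"]                       # toy label table
probs = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1]]     # one row per character crop
print(decode_captcha(probs, labels))
```

In the real pipeline each row would come from one forward pass over a crop, and labels would hold the 3,666 characters in class-index order.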
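#Final notes# I am a loyal Visual Studio user, and all of this code was written in the VS editor. VS needs the PTVS plug-in (Python Tools for Visual Studio) before it can work with Python. When editing Python code there you have to be very careful about Chinese character encoding, or you will suffer for it. But trust me: whatever other editors can handle, VS can handle too. You just need enough patience, and when you run into a problem, think it through and search for its root cause.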
