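Installing tesseract-ocr on CentOS 7 for CAPTCHA recognition

Summary:

  Installing dependencies on CentOS 7

  tesseract configuration

  Code example

Installing dependencies on CentOS 7

Install the CentOS system dependencies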

yum install -y automake autoconf libtool gcc gcc-c++ 
yum install -y libpng-devel libjpeg-devel libtiff-devel

Install leptonica

wget http://www.leptonica.org/source/leptonica-1.72.tar.gz
tar xvzf leptonica-1.72.tar.gz
cd leptonica-1.72/ 
./configure 
make && make install

Install tesseract-ocr

wget https://github.com/tesseract-ocr/tesseract/archive/3.04.zip
unzip 3.04.zip
cd tesseract-3.04/ 
./autogen.sh
./configure
make && make install 
sudo ldconfig
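
Once the build has finished, a quick sanity check is to ask the binary for its version. The helper below is a minimal sketch, assuming the `tesseract` executable ended up on `PATH`; the function name `tesseract_version` is hypothetical and not part of the original setup. Some builds print the version banner to stderr, so both streams are merged. It returns `None` rather than raising when the binary cannot be found.

```python
import shutil
import subprocess


def tesseract_version(binary="tesseract"):
    """Return the version banner printed by tesseract, or None if unavailable."""
    path = shutil.which(binary)
    if path is None:
        # The executable is not on PATH (install failed or PATH is incomplete).
        return None
    try:
        result = subprocess.run(
            [path, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr: some builds print there
            universal_newlines=True,
            timeout=10,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    return result.stdout.strip() or None


if __name__ == "__main__":
    print(tesseract_version() or "tesseract not found on PATH")
```

If this prints "tesseract not found on PATH" right after `make install`, checking that `/usr/local/bin` is on `PATH` is a reasonable first step.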

Deploy the models
- Download the model file for the language you need from https://github.com/tesseract-ocr/tessdata
- Move the model file to /usr/local/share/tessdata

Install the Python dependencies listed in requirements.txt
```
pip install -r requirements.txt
```
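
Before running any OCR it can help to confirm that the model files really are where tesseract will look for them. The following is a minimal sketch, assuming models are stored as `<lang>.traineddata` in `/usr/local/share/tessdata` as described above; the helper name `missing_models` is hypothetical.

```python
import os


def missing_models(languages, tessdata_dir="/usr/local/share/tessdata"):
    """Return the language codes whose .traineddata file is not in tessdata_dir."""
    return [
        lang
        for lang in languages
        if not os.path.isfile(os.path.join(tessdata_dir, lang + ".traineddata"))
    ]


if __name__ == "__main__":
    missing = missing_models(["eng"])
    print("missing models:", missing if missing else "none")
```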

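tesseract configuration

In /usr/local/share/tessdata, create eng.user-patterns and write the user pattern into it.
The pattern tells tesseract to recognize a 6-character string of letters or digits.

In /usr/local/share/tessdata/configs, create myconfig with the following content:

# recognition whitelist
tessedit_char_whitelist abcdefghijklmnopqrstuvwxyz0123456789
# user pattern matching
user_patterns_suffix user-patterns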
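
The config file is plain text with one `name value` pair per line, so it can also be generated from a script, which avoids typos in the whitelist. The sketch below is one way to do that; the helper names `render_config` and `write_config` are hypothetical, and the demo writes into a temporary directory rather than `/usr/local/share/tessdata/configs` so that it can run without root privileges.

```python
import os
import string
import tempfile


def render_config(whitelist, patterns_suffix="user-patterns"):
    """Build the text of a tesseract config file: one 'name value' pair per line."""
    lines = [
        "tessedit_char_whitelist " + whitelist,
        "user_patterns_suffix " + patterns_suffix,
    ]
    return "\n".join(lines) + "\n"


def write_config(configs_dir, name="myconfig",
                 whitelist=string.ascii_lowercase + string.digits):
    """Write the config file into configs_dir and return its path."""
    path = os.path.join(configs_dir, name)
    with open(path, "w") as handle:
        handle.write(render_config(whitelist))
    return path


if __name__ == "__main__":
    # Demo only: a real deployment would target the tessdata/configs directory.
    demo_dir = tempfile.mkdtemp()
    with open(write_config(demo_dir)) as handle:
        print(handle.read(), end="")
```

Building the whitelist from `string.ascii_lowercase + string.digits` guarantees that every lowercase letter and digit appears exactly once.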

psm parameter reference

-psm N
  Set Tesseract to only run a subset of layout analysis and assume a certain form of image. The options for N are:

  0 = Orientation and script detection (OSD) only.
  1 = Automatic page segmentation with OSD.
  2 = Automatic page segmentation, but no OSD, or OCR.
  3 = Fully automatic page segmentation, but no OSD. (Default)
  4 = Assume a single column of text of variable sizes.
  5 = Assume a single uniform block of vertically aligned text.
  6 = Assume a single uniform block of text.
  7 = Treat the image as a single text line.
  8 = Treat the image as a single word.
  9 = Treat the image as a single word in a circle.
  10 = Treat the image as a single character.
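
For a CAPTCHA that consists of one short string, modes 7 (single text line) and 8 (single word) are the natural candidates. The sketch below builds the option string that can be handed to pytesseract through its `config` argument. It assumes the tesseract 3.x spelling `-psm` shown above (newer releases spell it `--psm`, hence the `flag` parameter) and a config file named `myconfig` in the tessdata `configs` directory; the helper name `build_tesseract_options` is hypothetical.

```python
def build_tesseract_options(psm=7, configfile="myconfig", flag="-psm"):
    """Return extra command-line options for tesseract as a single string."""
    if not 0 <= psm <= 10:
        raise ValueError("psm must be between 0 and 10, got %r" % (psm,))
    parts = [flag, str(psm)]
    if configfile:
        # A bare name is looked up by tesseract in its tessdata/configs directory.
        parts.append(configfile)
    return " ".join(parts)


if __name__ == "__main__":
    print(build_tesseract_options())                         # -psm 7 myconfig
    print(build_tesseract_options(psm=8, configfile=None))   # -psm 8
```

A call such as `pytesseract.image_to_string(image, config=build_tesseract_options(psm=8))` would then run the recognition in single-word mode with the custom config applied.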

Code example

import pytesseract
from PIL import Image

# Load the CAPTCHA image and run OCR on it
image = Image.open('code.png')
code = pytesseract.image_to_string(image)
print(code)
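
OCR output often carries stray whitespace or an occasional wrong character, so it is worth validating the result before submitting it anywhere. The following is a minimal sketch, assuming the CAPTCHA is exactly 6 lowercase letters or digits, matching the whitelist and user pattern configured earlier; the helper name `normalize_code` is hypothetical.

```python
import re


def normalize_code(raw, length=6):
    """Strip whitespace from OCR output and validate it.

    Returns the cleaned string when it consists of exactly `length`
    lowercase letters or digits, otherwise None.
    """
    cleaned = re.sub(r"\s+", "", raw)
    if len(cleaned) == length and re.fullmatch(r"[a-z0-9]+", cleaned):
        return cleaned
    return None


if __name__ == "__main__":
    print(normalize_code(" ab12 cd\n"))  # ab12cd
    print(normalize_code("ab-2cd"))      # None
```

A `None` result is a cheap signal to fetch a fresh CAPTCHA image and retry rather than submit a guess that is known to be malformed.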


Original article: https://www.cnblogs.com/arachis/p/OCR.html