  • Installing tesseract-ocr on CentOS 7 for CAPTCHA recognition

    Outline:

      Installing dependencies on CentOS 7

      Tesseract configuration

      Code example

    Installing dependencies on CentOS 7

    • Install CentOS system packages

      yum install -y automake autoconf libtool gcc gcc-c++ 
      yum install -y libpng-devel libjpeg-devel libtiff-devel
    • Install Leptonica

      wget http://www.leptonica.org/source/leptonica-1.72.tar.gz
      tar xvzf leptonica-1.72.tar.gz
      cd leptonica-1.72/ 
      ./configure 
      make && make install
    • Install tesseract-ocr

      wget https://github.com/tesseract-ocr/tesseract/archive/3.04.zip
      unzip 3.04.zip
      cd tesseract-3.04/
      ./autogen.sh
      ./configure
      make && make install 
      sudo ldconfig
    • Deploy the model

    • Install the Python dependencies listed in requirements.txt

      pip install -r requirements.txt
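Before moving on to configuration, it can help to confirm that the freshly built binary is on PATH and that its shared libraries resolve after `ldconfig`. A minimal sketch using only the Python standard library; the helper name `tesseract_version` is hypothetical, and reading both output streams assumes that older 3.x builds print their version banner to stderr while newer releases use stdout:

```python
import shutil
import subprocess


def tesseract_version(cmd="tesseract"):
    """Return the first line of `<cmd> --version`, or None if cmd is not on PATH.

    Both stdout and stderr are read because (assumption) older Tesseract 3.x
    builds write the version banner to stderr, newer ones to stdout.
    """
    path = shutil.which(cmd)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    output = (result.stdout or "") + (result.stderr or "")
    lines = output.strip().splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    version = tesseract_version()
    if version is None:
        print("tesseract not found on PATH")
    else:
        print(version)
```

If the binary is found but its libraries are not, the first line will show the dynamic loader error instead of a version, which usually points back at the `make install` and `ldconfig` steps.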

    Tesseract configuration

    • Create eng.user-patterns in /usr/local/share/tessdata with the following content:

      
      
      
      
      
      
      

      This tells Tesseract to expect a 6-character code (letters or digits).

    • Create myconfig in /usr/local/share/tessdata/configs with the following content:

      # character whitelist
      tessedit_char_whitelist abcdefghijklmnopqrstuvwxyz0123456789
      # user pattern matching (reads eng.user-patterns)
      user_patterns_suffix user-patterns
    • psm parameter reference (Tesseract 3.x syntax; 4.x and later spell the option --psm)

      -psm N
        Set Tesseract to only run a subset of layout analysis and assume a certain form of image. The options for N are:
      
        0 = Orientation and script detection (OSD) only.
        1 = Automatic page segmentation with OSD.
        2 = Automatic page segmentation, but no OSD, or OCR.
        3 = Fully automatic page segmentation, but no OSD. (Default)
        4 = Assume a single column of text of variable sizes.
        5 = Assume a single uniform block of vertically aligned text.
        6 = Assume a single uniform block of text.
        7 = Treat the image as a single text line.
        8 = Treat the image as a single word.
        9 = Treat the image as a single word in a circle.
        10 = Treat the image as a single character.
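The two configuration files described above can also be written from a script, for example when provisioning several machines. This is a sketch, not the author's method: the helper `write_tess_config` is hypothetical, and because the original pattern lines did not survive, the six `\n` escapes (the user-patterns class for a letter or digit, per Tesseract's user-patterns documentation) are an assumed stand-in for a 6-character code.

```python
from pathlib import Path

# tessdata location for a source build installed under /usr/local, as in the
# steps above. Writing there requires root.
TESSDATA = Path("/usr/local/share/tessdata")

# Same whitelist and user-patterns suffix as the myconfig file above.
MYCONFIG = (
    "# character whitelist\n"
    "tessedit_char_whitelist abcdefghijklmnopqrstuvwxyz0123456789\n"
    "# user pattern matching (reads eng.user-patterns)\n"
    "user_patterns_suffix user-patterns\n"
)

# ASSUMPTION: the original pattern was lost. Each "\\n" below is written to the
# file as the two characters backslash + n, which in a user-patterns file
# matches one letter or digit; six of them form one 6-character pattern.
USER_PATTERNS = "\\n\\n\\n\\n\\n\\n\n"


def write_tess_config(tessdata=TESSDATA, lang="eng"):
    """Write <lang>.user-patterns and configs/myconfig under tessdata."""
    tessdata = Path(tessdata)
    configs = tessdata / "configs"
    configs.mkdir(parents=True, exist_ok=True)
    (tessdata / (lang + ".user-patterns")).write_text(USER_PATTERNS)
    (configs / "myconfig").write_text(MYCONFIG)
    return tessdata
```

Run `write_tess_config()` as root to target /usr/local/share/tessdata, or pass a scratch directory first to inspect what gets written.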

    Code example

    import pytesseract
    from PIL import Image

    # Run Tesseract on the CAPTCHA image with its default settings
    image = Image.open('code.png')
    code = pytesseract.image_to_string(image)
    print(code)
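The example above calls Tesseract with default settings, so neither myconfig nor a page segmentation mode is applied. A hedged sketch that passes both through pytesseract's `config` argument and then checks the result against the expected 6-character format; `build_config`, `clean_code` and `recognize` are hypothetical helpers, and psm 7 (single text line) is an assumption about the CAPTCHA layout:

```python
import re

# Mirrors the setup above: 6 characters from the lowercase + digit whitelist.
CODE_RE = re.compile(r"[a-z0-9]{6}")


def build_config(psm=7, config_name="myconfig", tesseract_major=3):
    """Build the extra command-line arguments pytesseract passes to tesseract.

    Tesseract 3.x spells the option -psm; 4.x and later use --psm.
    """
    flag = "-psm" if tesseract_major < 4 else "--psm"
    return "{} {} {}".format(flag, psm, config_name)


def clean_code(raw):
    """Remove all whitespace from the OCR output and return it if it looks
    like a 6-character code, otherwise None."""
    text = "".join(raw.split())
    return text if CODE_RE.fullmatch(text) else None


def recognize(path):
    # Imported here so the helpers above can be used without pytesseract.
    import pytesseract
    from PIL import Image

    raw = pytesseract.image_to_string(Image.open(path), config=build_config())
    return clean_code(raw)
```

`recognize('code.png')` returns the code or None. Stripping all whitespace is a choice made here because Tesseract may insert spaces between CAPTCHA glyphs and may end its output with a newline or form feed.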
  • Original post: https://www.cnblogs.com/arachis/p/OCR.html