
Notes on using tesseract for the first time

Introduction

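I had heard about this Google OCR project for a long time. After trying it, I found it really is impressive. Here is how I set it up and used it for the first time.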

Installing tesseract

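Google's open-source OCR project.
I downloaded the two files below. chi_sim.traineddata is the extra language data needed to recognize Simplified Chinese. Only the .exe has to be installed. The .traineddata file just gets copied into the tessdata folder of the installation directory. After that, add the installation directory to the PATH environment variable.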

chi_sim.traineddata
tesseract-ocr-w64-setup-v4.1.0.20190314.exe
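`-l chi_sim` only works once chi_sim.traineddata sits in the tessdata folder. The check below is a minimal stdlib sketch that lists which language files a tessdata folder contains. The install path and the `installed_languages` helper are hypothetical, so adjust the path to wherever the installer actually put Tesseract-OCR.

```python
from pathlib import Path

# Hypothetical install location (assumption); point this at your own Tesseract-OCR folder.
TESSDATA = Path(r"C:\Program Files\Tesseract-OCR") / "tessdata"


def installed_languages(tessdata_dir):
    """Return the language codes of all *.traineddata files in tessdata_dir."""
    folder = Path(tessdata_dir)
    if not folder.is_dir():
        return []  # folder missing: nothing installed there
    # "chi_sim.traineddata" -> "chi_sim"
    return sorted(p.stem for p in folder.glob("*.traineddata"))


if __name__ == "__main__":
    langs = installed_languages(TESSDATA)
    print(langs)
    if "chi_sim" not in langs:
        print("chi_sim.traineddata is missing from", TESSDATA)
```

`tesseract --list-langs` answers the same question from the engine's side.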

Installation succeeded. Running tesseract with no arguments prints its usage:

C:\Users\27569>tesseract
Usage:
  tesseract --help | --help-extra | --version
  tesseract --list-langs
  tesseract imagename outputbase [options...] [configfile...]

OCR options:
  -l LANG[+LANG]        Specify language(s) used for OCR.
NOTE: These options must occur before any configfile.

Single options:
  --help                Show this help message.
  --help-extra          Show extra help for advanced users.
  --version             Show version information.
  --list-langs          List available languages for tesseract engine.
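The usage text above can also be driven from Python through the standard library. The sketch below assembles an argument list of the form `tesseract imagename outputbase [options...]` with `-l LANG[+LANG]` and parses `--list-langs` output by skipping its header line. The helper names are hypothetical. It also adds `--psm`, which tesseract 4.x accepts although it is not shown in the excerpt above. Depending on the build, the language list may arrive on stdout or stderr, so the two streams are merged.

```python
import shutil
import subprocess


def build_ocr_command(image, output_base, langs=("eng",), psm=None):
    """Argument list for `tesseract imagename outputbase [options...]`."""
    cmd = ["tesseract", image, output_base, "-l", "+".join(langs)]
    if psm is not None:
        cmd += ["--psm", str(psm)]  # page segmentation mode, e.g. 7 = single text line
    return cmd


def parse_list_langs(output):
    """Language codes from `tesseract --list-langs` output (first line is a header)."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return lines[1:]


if __name__ == "__main__" and shutil.which("tesseract"):
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge streams; builds differ in where they print
        text=True,
    )
    print(parse_list_langs(result.stdout))
    print(build_ocr_command("0.jpg", "out", ("eng", "chi_sim"), psm=7))
```

Running the built command writes the recognized text to `out.txt`, because tesseract appends `.txt` to the output base.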

Recognizing images from Python

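I tested it by calling it from Python on Windows. As I recall, the program failed on the first run, and it only worked after I changed a path in the source of the tesseract wrapper.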
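The post does not say exactly which path was changed. Instead of editing pytesseract's source, a common alternative is to set the module attribute `pytesseract.pytesseract.tesseract_cmd` to the full path of the executable. The sketch below only locates that path. It looks on PATH first, then falls back to candidate locations. The default Windows path and the `resolve_tesseract_cmd` helper are assumptions, not something the original post specifies.

```python
import os
import shutil

# Assumed default location of the 64-bit Windows installer; adjust to your install directory.
DEFAULT_WINDOWS_PATH = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def resolve_tesseract_cmd(extra_candidates=()):
    """Return a usable path to the tesseract executable, or None if none is found."""
    found = shutil.which("tesseract")  # works when the install dir is on PATH
    if found:
        return found
    for path in (*extra_candidates, DEFAULT_WINDOWS_PATH):
        if os.path.isfile(path):
            return path
    return None
```

With a non-None result, assign it before the first OCR call, e.g. `pytesseract.pytesseract.tesseract_cmd = resolve_tesseract_cmd()`.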

`requirements.txt`

pillow
pytesseract

`run.py`

import io
import re
import pytesseract
from PIL import Image


class Ocr:

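    def __init__(self):
        # Date/time patterns; not used elsewhere in this script
        self.day_re = re.compile(r'(\d{4}-\d{2}-\d{2})')             # e.g. 2019-04-25
        self.daytime_re1 = re.compile(r'(\d{2}:\d{2})')              # e.g. 08:30
        self.daytime_re2 = re.compile(r'(\d{2}:\d{2}-\d{2}:\d{2})')  # e.g. 08:30-09:15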

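    def prepare_img(self, img):
        """Preprocess the image to improve the recognition rate."""
        img = img.convert('L')  # convert to 8-bit greyscale
        threshold = 200  # tune for your images; 127 is another common choice
        table = []
        for i in range(256):
            if i < threshold:
                table.append(0)  # darker than the threshold -> black
            else:
                table.append(1)  # at or above the threshold -> white
        return img.point(table, '1')  # binarize via the lookup table into a 1-bit image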

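    def ocr(self, img):
        """Run OCR on a PIL image."""
        img = self.prepare_img(img)
        # lang: 'eng' = English, 'chi_sim' = Simplified Chinese (needs chi_sim.traineddata)
        # --psm 7: treat the image as a single line of text (tesseract 4 option syntax)
        return pytesseract.image_to_string(img, lang='eng', config='--psm 7')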


if __name__ == '__main__':
    c = Ocr()

    with open('0.jpg', 'rb') as f:
        image_binary = f.read()
    byte_arr = io.BytesIO(image_binary)

    # First way to use Image.open(): from an in-memory bytes buffer
    img = Image.open(byte_arr)
    print(c.ocr(img))

    # Second way to use Image.open(): from a file path
    img = Image.open('0.jpg')
    print(c.ocr(img))
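prepare_img uses a fixed threshold (200) that has to be tuned per image. The original post does not do this, but one common way to pick the cut-off automatically is Otsu's method. It chooses the grey level that maximizes the between-class variance of the image histogram. The pure-Python sketch below takes a 256-bin histogram, which with Pillow would come from `img.convert('L').histogram()`. `threshold_table` rebuilds the same 0/1 lookup table that prepare_img passes to `img.point()`. Both function names are hypothetical.

```python
def otsu_threshold(histogram):
    """Return T such that grey levels < T go to black (0), chosen by Otsu's method.

    `histogram` is a 256-entry list of pixel counts for a greyscale image.
    """
    total = sum(histogram)
    sum_all = sum(level * count for level, count in enumerate(histogram))
    weight_bg = 0  # number of pixels at or below the candidate level
    sum_bg = 0     # sum of their grey levels
    best_level, best_variance = 0, -1.0
    for level, count in enumerate(histogram):
        weight_bg += count
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        sum_bg += level * count
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance, up to a constant factor of 1 / total**2
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_level, best_variance = level, variance
    # Levels <= best_level form the dark class, so the cut-off is one above it
    return best_level + 1


def threshold_table(threshold):
    """256-entry lookup table like prepare_img builds: 0 below threshold, 1 at/above."""
    return [0 if level < threshold else 1 for level in range(256)]


if __name__ == "__main__":
    # Synthetic histogram: 100 dark pixels at level 10, 100 light pixels at level 200
    hist = [0] * 256
    hist[10] = 100
    hist[200] = 100
    t = otsu_threshold(hist)
    table = threshold_table(t)
    print(t, table[10], table[200])  # 11 0 1
```

In prepare_img, the hard-coded `threshold = 200` could then become `threshold = otsu_threshold(img.histogram())` after the `'L'` conversion. Whether that recognizes better than a hand-tuned value depends on the images.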

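The Ocr class compiles three date/time regular expressions, but the script never uses them. Presumably they were meant for pulling dates and times out of the recognized text. The sketch below shows that use with the same patterns. The `extract_fields` helper and the sample string are made up for illustration. Real OCR output may contain stray spaces or misread digits that these strict patterns will not match.

```python
import re

# Same patterns as in Ocr.__init__
DAY_RE = re.compile(r'(\d{4}-\d{2}-\d{2})')             # e.g. 2019-04-25
TIME_RE = re.compile(r'(\d{2}:\d{2})')                  # e.g. 08:30
RANGE_RE = re.compile(r'(\d{2}:\d{2}-\d{2}:\d{2})')     # e.g. 08:30-09:15


def extract_fields(text):
    """Collect dates, time ranges and single times found in OCR output text."""
    return {
        "days": DAY_RE.findall(text),
        "ranges": RANGE_RE.findall(text),
        "times": TIME_RE.findall(text),  # also matches both ends of each range
    }


if __name__ == "__main__":
    print(extract_fields("2019-04-25 08:30-09:15 meeting"))
```

Because TIME_RE also matches inside ranges, check the ranges first if single times and ranges need to be kept apart.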


Original article: https://www.cnblogs.com/haoabcd2010/p/10769686.html