  • Notes on using Tesseract for the first time

    Introduction

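    I had heard about this Google recognition project long ago, and after trying it I found that it really is impressive. Here are my notes on a first, simple use.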

    Installing Tesseract

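    Tesseract is Google's open-source OCR project.
    I downloaded the two files below. chi_sim is the extra language data needed to recognize Chinese. Only the .exe has to be installed; after that, configure the environment variable.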

    • chi_sim.traineddata
    • tesseract-ocr-w64-setup-v4.1.0.20190314.exe
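A quick way to confirm that the environment variable took effect and that the Chinese data is visible is to ask Tesseract itself. The sketch below is only an illustration: the helper names `tesseract_path`, `parse_langs` and `installed_languages` are made up for this example, and it assumes that `chi_sim.traineddata` has been copied into the `tessdata` folder of the installation, which is where Tesseract normally looks for language files.

```python
import shutil
import subprocess


def tesseract_path(name="tesseract"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(name)


def parse_langs(output):
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    lines = [line.strip() for line in output.splitlines()]
    # Language codes contain no spaces; the header line and any warnings do.
    return [line for line in lines if line and " " not in line]


def installed_languages(exe):
    """Run `tesseract --list-langs` and return the reported language codes."""
    result = subprocess.run(
        [exe, "--list-langs"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return parse_langs(result.stdout)


if __name__ == "__main__":
    exe = tesseract_path()
    if exe is None:
        print("tesseract is not on PATH; check the environment variable")
    else:
        print("found:", exe)
        print("chi_sim available:", "chi_sim" in installed_languages(exe))
```

If `chi_sim available` prints `False`, the `.traineddata` file is probably not in the folder Tesseract is reading.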

    Installation succeeded

    C:\Users\27569>tesseract
    Usage:
      tesseract --help | --help-extra | --version
      tesseract --list-langs
      tesseract imagename outputbase [options...] [configfile...]
    
    OCR options:
      -l LANG[+LANG]        Specify language(s) used for OCR.
    NOTE: These options must occur before any configfile.
    
    Single options:
      --help                Show this help message.
      --help-extra          Show extra help for advanced users.
      --version             Show version information.
      --list-langs          List available languages for tesseract engine.
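The usage line `tesseract imagename outputbase [options...]` can also be driven from a script. The following is a minimal sketch rather than a recommended wrapper: `build_command` and `run_ocr` are hypothetical helper names, and it assumes the default text output, where Tesseract writes the result to `outputbase` plus a `.txt` extension.

```python
import subprocess


def build_command(image, outputbase, langs=("eng",), exe="tesseract"):
    """Assemble `tesseract imagename outputbase -l LANG[+LANG]` as an argument list."""
    return [exe, image, outputbase, "-l", "+".join(langs)]


def run_ocr(image, outputbase, langs=("eng",)):
    """Run tesseract and return the text it wrote to `outputbase`.txt."""
    subprocess.run(build_command(image, outputbase, langs), check=True)
    with open(outputbase + ".txt", encoding="utf-8") as f:
        return f.read()
```

For example, `run_ocr("0.jpg", "out", ("eng", "chi_sim"))` would run `tesseract 0.jpg out -l eng+chi_sim` and read back `out.txt`.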
    
    

    Recognizing images with Python

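    This is a test of calling Tesseract from Python on Windows. As I recall, my program did not work the first time; it only ran after I changed a path in the source code of the tesseract file.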
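The post does not say which path was edited. One common cause of this symptom on Windows is that pytesseract cannot find the `tesseract` executable, and for that case the module exposes `pytesseract.pytesseract.tesseract_cmd`, which can be set from your own script without touching library source. The sketch below assumes that this is the problem; the install path in `CANDIDATES` and the helper `find_tesseract` are hypothetical and should be adjusted to your machine.

```python
import os
import shutil

# Hypothetical install location; adjust it to wherever the installer put Tesseract.
CANDIDATES = [r"C:\Program Files\Tesseract-OCR\tesseract.exe"]


def find_tesseract(candidates=CANDIDATES):
    """Return a usable tesseract executable path, or None if none is found."""
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


exe = find_tesseract()
try:
    import pytesseract
except ImportError:
    pytesseract = None  # pytesseract is not installed in this environment

if exe and pytesseract is not None:
    # Tell pytesseract which executable to call instead of patching its source.
    pytesseract.pytesseract.tesseract_cmd = exe
```

Setting the attribute once at start-up keeps the fix in your own code, so it survives a reinstall or upgrade of pytesseract.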

    requirements.txt

    pillow
    pytesseract
    

    run.py

    import io
    import re
    import pytesseract
    from PIL import Image
    
    
    class Ocr:
    
        def __init__(self):
            self.day_re = re.compile(r'(\d{4}-\d{2}-\d{2})')
            self.daytime_re1 = re.compile(r'(\d{2}:\d{2})')
            self.daytime_re2 = re.compile(r'(\d{2}:\d{2}-\d{2}:\d{2})')
    
        def prepare_img(self, img):
            """Preprocess the image to improve recognition accuracy."""
            img = img.convert('L')
            threshold = 200  # adjust to the image; 127 is another option
            table = []
            for i in range(256):
                if i < threshold:
                    table.append(0)
                else:
                    table.append(1)
            return img.point(table, '1')
    
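        def ocr(self, img):
            """Run OCR on the image."""
            img = self.prepare_img(img)
            # lang: eng for English, chi_sim for Chinese (requires the trained data file)
            # --psm 7 treats the image as a single line of text
            return pytesseract.image_to_string(img, lang='eng', config='--psm 7')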
    
    
    if __name__ == '__main__':
        c = Ocr()
    
        with open('0.jpg', 'rb') as f:
            image_binary = f.read()
        byte_arr = io.BytesIO(image_binary)
    
        # First way to open the image with Image.open(): from an in-memory byte stream
        img = Image.open(byte_arr)
        print(c.ocr(img))
    
        # Second way to open the image with Image.open(): from a file path
        img = Image.open('0.jpg')
        print(c.ocr(img))
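The three patterns compiled in `Ocr.__init__` are never applied in `run.py`. Presumably they are meant to pull a date and times out of the recognized text. The sketch below shows one way that could look; `extract_fields` and the sample string are made up for illustration and are not from the original post.

```python
import re

DAY_RE = re.compile(r'(\d{4}-\d{2}-\d{2})')           # e.g. 2019-04-25
TIME_RE = re.compile(r'(\d{2}:\d{2})')                # e.g. 09:00
RANGE_RE = re.compile(r'(\d{2}:\d{2}-\d{2}:\d{2})')   # e.g. 09:00-10:30


def extract_fields(text):
    """Pull the date, the time range and the individual times out of OCR text."""
    day = DAY_RE.search(text)
    time_range = RANGE_RE.search(text)
    return {
        "day": day.group(1) if day else None,
        "range": time_range.group(1) if time_range else None,
        "times": TIME_RE.findall(text),
    }


if __name__ == "__main__":
    # Hypothetical OCR output, used only to exercise the patterns.
    print(extract_fields("2019-04-25 09:00-10:30\n"))
```

Note that the backslashes in `\d` matter: without them, a pattern such as `d{4}` matches four literal `d` characters rather than four digits.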
    
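The effect of the `threshold` value in `prepare_img` is easier to see on a handful of gray levels than on a whole image. This sketch rebuilds the same 256-entry lookup table in plain Python and applies it to a list of pixel values; `build_table` and `binarize` are illustrative names, and no image library is involved.

```python
def build_table(threshold):
    """Map each gray level 0..255 to 0 (below threshold) or 1 (at or above it)."""
    return [0 if level < threshold else 1 for level in range(256)]


def binarize(pixels, threshold):
    """Apply the lookup table to a sequence of gray levels."""
    table = build_table(threshold)
    return [table[p] for p in pixels]


if __name__ == "__main__":
    sample = [0, 127, 199, 200, 255]
    print(binarize(sample, 200))
    print(binarize(sample, 127))
```

A higher threshold turns more mid-gray pixels black, which helps with faint text on a light background but can also merge noise into the characters, so the value is worth tuning per image source.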
    
  • Original article: https://www.cnblogs.com/haoabcd2010/p/10769686.html