  • Compiling and installing tesseract-ocr 4.0 on Ubuntu Linux 16.04

     

    This post mainly follows the official build instructions and walks through the whole process.

    Linux

    The build instructions for Linux also apply to other UNIX like operating systems.

    Dependencies

    • A compiler for C and C++: GCC or Clang
    • GNU Autotools: autoconf, automake, libtool
    • autoconf-archive
    • pkg-config
    • Leptonica
    • libpng, libjpeg, libtiff

    Ubuntu

    If they are not already installed, you need the following libraries (Ubuntu 16.04/14.04):

    1. Install the dependencies:

    sudo apt-get install g++ autoconf automake libtool autoconf-archive pkg-config libpng12-dev libjpeg8-dev libtiff5-dev zlib1g-dev  libleptonica-dev -y

    Or copy the commands one at a time:
    sudo apt-get install g++ # or clang++ (presumably)
    sudo apt-get install autoconf automake libtool
    sudo apt-get install autoconf-archive
    sudo apt-get install pkg-config
    sudo apt-get install libpng12-dev
    sudo apt-get install libjpeg8-dev
    sudo apt-get install libtiff5-dev
    sudo apt-get install zlib1g-dev
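
Before moving on, it can help to confirm that the build tools actually ended up on the PATH. The following is a minimal sketch, not part of the official instructions; the helper name missing_tools is made up for illustration, and it only checks executables, not the -dev library packages.

```shell
#!/bin/sh
# Print the name of every command in the argument list that is not on PATH.
# (missing_tools is a hypothetical helper, not part of any package.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Build tools named in the dependency list. libtoolize is checked instead of
# libtool on the assumption that the Ubuntu libtool package may not ship a
# libtool binary; adjust for your release.
missing=$(missing_tools g++ autoconf automake libtoolize pkg-config)
if [ -n "$missing" ]; then
    echo "Missing build tools:" $missing
else
    echo "All build tools found"
fi
```

Any name printed after "Missing build tools:" points at a package from the list above that still needs to be installed.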
    

    If you plan to install the training tools, you also need the following libraries:

    Install the libraries that the training tools depend on:
    sudo apt-get install libicu-dev libpango1.0-dev  libcairo2-dev

    Or, one at a time:
    sudo apt-get install libicu-dev
    sudo apt-get install libpango1.0-dev
    sudo apt-get install libcairo2-dev

    Leptonica

    You also need to install Leptonica. Ensure that the development headers for Leptonica are installed before compiling Tesseract.

    Tesseract versions and the minimum version of Leptonica required:

    2. Install Leptonica

    Tesseract depends on this library; if it is missing, configure stops with an error.

    The latest Tesseract 4.0, as well as 3.05, needs a Leptonica built from source.

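    git clone https://github.com/DanBloomberg/leptonica.git

    cd leptonica

    (A fresh git checkout may not include the configure script; if so, generate it first as described in the Leptonica README.)

    ./configure

    make -j8 && sudo make install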

    Tesseract  Leptonica  Ubuntu
    4.00       1.74.2     Must build from source
    3.05       1.74.0     Must build from source
    3.04       1.71       Ubuntu 16.04
    3.03       1.70       Ubuntu 14.04
    3.02       1.69       Ubuntu 12.04
    3.01       1.67
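
The version requirement in the table can be checked against whatever Leptonica is currently installed. This is a rough sketch under two assumptions: that Leptonica registers itself with pkg-config under the module name lept, and that GNU sort with the -V option is available. The helper name version_ge is made up for illustration.

```shell
#!/bin/sh
# True when version $1 is greater than or equal to version $2.
# Relies on GNU sort -V for version ordering (assumption: GNU coreutils).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

required=1.74.2   # minimum Leptonica for Tesseract 4.00, from the table

# Assumption: the installed Leptonica provides a pkg-config module named "lept".
if command -v pkg-config >/dev/null 2>&1 &&
   installed=$(pkg-config --modversion lept 2>/dev/null); then
    if version_ge "$installed" "$required"; then
        echo "Leptonica $installed is new enough"
    else
        echo "Leptonica $installed is older than $required; build it from source"
    fi
else
    echo "Leptonica was not found through pkg-config"
fi
```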

    One option is to install the distro's Leptonica package:

    sudo apt-get install libleptonica-dev
    

    but if you are using an oldish version of Linux, the Leptonica version may be too old, so you will need to build from source.

    The sources are at https://github.com/DanBloomberg/leptonica . The instructions for building are given in Leptonica README.

    Note that if building Leptonica from source, you may need to ensure that /usr/local/lib is in your library path. This is a standard Linux bug, and the information at Stackoverflow is very helpful.
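
One way to handle the library path for the current shell session is sketched here. The helper name add_lib_dir is made up for illustration. The commented-out system-wide variant uses a hypothetical file name under /etc/ld.so.conf.d; on many systems /usr/local/lib is already listed there and running sudo ldconfig after the install is enough.

```shell
#!/bin/sh
# Prepend a directory to LD_LIBRARY_PATH unless it is already listed.
# (add_lib_dir is a hypothetical helper.)
add_lib_dir() {
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$1:"*) ;;   # already present, nothing to do
        *) LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}

add_lib_dir /usr/local/lib
echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"

# System-wide alternative (needs root; the .conf file name is an assumption):
#   echo /usr/local/lib | sudo tee /etc/ld.so.conf.d/usr-local.conf
#   sudo ldconfig
```

The session-only variant has to be repeated in every new shell, for example from ~/.profile, whereas the ldconfig route persists.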

    Installing Tesseract from Git

    Please follow instructions in https://github.com/tesseract-ocr/tesseract/wiki/Compiling--GitInstallation

    Also read Install Instructions

    3. Compile Tesseract


    Clone the source code and generate the build files:
    git clone https://github.com/tesseract-ocr/tesseract.git tesseract-ocr
    cd tesseract-ocr
    ./autogen.sh
    autoreconf -i
    ./configure
    When configure finishes, it prints:
    Configuration is done.
    You can now build and install tesseract by running:

    $ make
    $ sudo make install

    Training tools can be built and installed with:

    $ make training
    $ sudo make training-install

    Continue the build: compile and install Tesseract first, then build and install the training tools.
    make
    sudo make install

    make training
    sudo make training-install

    sudo ldconfig

    That completes the whole build. Typing tesseract on the command line now prints its usage.
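
A quick sanity check of the installed binary might look like the following sketch. The function name verify_tesseract is made up for illustration; --version and --list-langs are existing tesseract options, though the language list stays empty or reports an error until the language data has been set up.

```shell
#!/bin/sh
# Report the installed Tesseract version and the languages it can find.
# (verify_tesseract is a hypothetical helper.)
verify_tesseract() {
    if ! command -v tesseract >/dev/null 2>&1; then
        echo "tesseract not found on PATH"
        return 1
    fi
    tesseract --version
    tesseract --list-langs
}

verify_tesseract || true
```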



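    4. Set up the language data
    tesseract-ocr/tessdata is a configuration directory; it can serve as the base directory that holds every language pack you use.
    cd <parent directory of tesseract-ocr>
    cp -r tesseract-ocr/tessdata/ tessdata/
    Download the language packs you need from https://github.com/tesseract-ocr/tessdata_best , which hosts trained data for many languages. For Simplified Chinese, download chi_sim.traineddata and chi_sim_vert.traineddata.

    Put the downloaded language packs in the tessdata directory.

    Set the TESSDATA_PREFIX environment variable to the parent directory of tessdata, for example: export TESSDATA_PREFIX=/media/sf_E_DRIVE/src-test/tesseract_all/tesseract_linux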
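
To confirm that the language packs landed where Tesseract will look for them, a check along these lines may help. It is a sketch that assumes, as this article does, that TESSDATA_PREFIX names the parent of the tessdata directory; some later Tesseract releases are believed to expect the tessdata directory itself, so check the behaviour of your version. The helper name missing_langs and the fallback path are made up for illustration.

```shell
#!/bin/sh
# Print each language whose <lang>.traineddata is missing under <prefix>/tessdata.
# Usage: missing_langs PREFIX LANG...   (hypothetical helper)
missing_langs() {
    prefix=$1
    shift
    for lang in "$@"; do
        [ -f "$prefix/tessdata/$lang.traineddata" ] || echo "$lang"
    done
}

# The fallback path is only a placeholder for when TESSDATA_PREFIX is unset.
prefix=${TESSDATA_PREFIX:-$HOME/tesseract_linux}
missing=$(missing_langs "$prefix" chi_sim chi_sim_vert)
if [ -n "$missing" ]; then
    echo "Missing language data under $prefix/tessdata:" $missing
else
    echo "chi_sim and chi_sim_vert are in place"
fi
```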

     

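    5. Use Tesseract
    For details, see the Tesseract usage documentation.

    tesseract /home/app/1.png output -l chi_sim
    This recognizes the image /home/app/1.png with the chi_sim language data and writes the result to output.txt. Do not append .traineddata to the language name; it is added automatically.
    cat output.txt shows the recognized text.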
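
The same command extends naturally to a whole directory of images. The loop here is a sketch, not something from the official documentation; the function name ocr_dir is made up, and it only handles files ending in .png.

```shell
#!/bin/sh
# OCR every PNG in a directory; each result is written next to its image
# (a.png -> a.txt). The language defaults to chi_sim.
# (ocr_dir is a hypothetical helper.)
ocr_dir() {
    dir=$1
    lang=${2:-chi_sim}
    for img in "$dir"/*.png; do
        [ -e "$img" ] || continue   # the glob matched nothing
        # tesseract appends .txt to the output base name itself
        tesseract "$img" "${img%.png}" -l "$lang"
    done
}

# Example call, using the directory from the command above:
# ocr_dir /home/app chi_sim
```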


    Install elsewhere / without root

    Tesseract can be configured to install anywhere, which makes it possible to install it without root access.

    To install it in $HOME/local:

    ./autogen.sh
    ./configure --prefix=$HOME/local/
    make install
    

    To install it in $HOME/local using Leptonica libraries also installed in $HOME/local:

    ./autogen.sh
    LIBLEPT_HEADERSDIR=$HOME/local/include ./configure \
      --prefix=$HOME/local/ --with-extra-libraries=$HOME/local/lib
    make install
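
After a non-root install, the shell still has to be told where the new binaries and libraries live. The environment setup here is a sketch that assumes the $HOME/local prefix used above. PKG_CONFIG_PATH only matters for builds that locate Leptonica through pkg-config, which is an assumption about your Tesseract version.

```shell
#!/bin/sh
# Make a $HOME/local installation visible to the current shell session.
prefix="$HOME/local"

# Find the tesseract binary installed under the prefix.
export PATH="$prefix/bin:$PATH"

# Let the dynamic linker find libtesseract and Leptonica at run time.
export LD_LIBRARY_PATH="$prefix/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Let pkg-config-based builds find libraries installed under the prefix.
export PKG_CONFIG_PATH="$prefix/lib/pkgconfig${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"
```

Adding these lines to ~/.profile or a similar startup file keeps them in effect for later sessions.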
    

    Video representation of the Compiling process for Tesseract 4.0 and Leptonica 1.7.4 on Ubuntu 16.xx

    Language Data

    You can also use:

    export TESSDATA_PREFIX=/some/path/to/tessdata
    

    to point to your tessdata directory (example: if your tessdata path is '/usr/local/share/tessdata', you have to use 'export TESSDATA_PREFIX=/usr/local/share/').


  • Original article: https://www.cnblogs.com/zhishuai/p/7851977.html