  • Building your own tesseract Docker environment image (hands-on)

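      When you ship OCR (image-to-text recognition) on Linux, the tesseract environment has to be installed. The information online is scattered, and the Dockerfiles built on various Linux distributions behave so differently that it is hard to tell what works. What follows is the result of a day or two of experimenting; it deploys the environment reliably, and I hope it is useful to others. The process has three stages: (1) build a base image with the tesseract environment installed; (2) upload the tessdata language packs to the server, which tesseract consults during recognition; (3) build the application image, mount the tessdata directory at /usr/local/share/tessdata, and set the container's TESSDATA_PREFIX environment variable.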

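    1. Prepare the Dockerfile for the base image. It needs two resource files, tesseract-4.1.1.tar.gz and leptonica-1.80.0.tar.gz, placed in the build context next to the Dockerfile: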

    https://github.com/tesseract-ocr/tesseract/releases/tag/4.1.1

    http://www.leptonica.org/source/leptonica-1.80.0.tar.gz

    FROM mamohr/centos-java
    LABEL AUTHOR="siman(214382122@qq.com)" VERSION="1.0.0" BUILD_DATE="2020-09-01" \
          RESOURCES="https://github.com/tesseract-ocr/tesseract http://www.leptonica.org/index.html https://github.com/tesseract-ocr/tessdata" \
          DESCRIPTION="This image integrates the runtime environment of tesseract-4.1.1 and leptonica-1.80.0 \
          on a CentOS base. On top of this base image you can run your own tess4j jar application"
    
    # Environment variables (tesseract)
    ENV LD_LIBRARY_PATH="/usr/local/lib" \
        LIBLEPT_HEADERSDIR="/usr/local/include" \
        PKG_CONFIG_PATH="/usr/local/lib/pkgconfig"
    # Install the tesseract environment (ADD unpacks local tar archives automatically)
    ADD   tesseract-4.1.1.tar.gz /
    ADD   leptonica-1.80.0.tar.gz /
    
    RUN   yum -y install file automake libicu-devel pango-devel cairo-devel libjpeg-devel libpng-devel libtiff-devel zlib-devel libtool gcc-c++ make \
          && cd /leptonica-1.80.0 && ./configure && make && make install \
          && cd /tesseract-4.1.1 && ./autogen.sh && ./configure && make && make install \
          && rm -rf /leptonica-1.80.0 /tesseract-4.1.1
    # Set the time zone
    RUN ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
    RUN echo 'Asia/Shanghai' >/etc/timezone

    2. Build the base image

    docker build -t tess/centos-java:v1.0 . 
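
    Once the build finishes, a quick smoke test can confirm that the tesseract binary actually runs inside the image. The sketch below is one way to do that and is not part of the original steps: the function name smoke_test_image and the DOCKER override variable are hypothetical, added so the command can be dry-run. It overrides the entrypoint because the default entrypoint of the base image is not known here.

```shell
#!/bin/sh
# Command used to talk to Docker. Overridable (an assumption of this sketch)
# so the function can be dry-run, for example with DOCKER=echo.
DOCKER=${DOCKER:-docker}

# smoke_test_image IMAGE
# Start a throwaway container from IMAGE and run `tesseract --version` in it.
# --entrypoint replaces whatever entrypoint the image defines; --rm removes
# the container when the command exits.
smoke_test_image() {
    image=$1
    "$DOCKER" run --rm --entrypoint tesseract "$image" --version
}
```

    Calling smoke_test_image tess/centos-java:v1.0 should print the tesseract version banner if the build succeeded; a non-zero exit status suggests the binary or one of its shared libraries is missing.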

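    3. Install the tessdata package

     Download the language packs and upload them to the server; the run command in step 5 mounts them from /usr/tessdata. Link: https://pan.baidu.com/s/1XAvPkTdUXuFq-q2InDREhQ Extraction code: 6vjp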
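
    Before mounting the directory into a container, it can help to check that the expected .traineddata files really landed on the server. The following is a minimal sketch under stated assumptions: the function name check_tessdata is hypothetical, and the language codes in the usage note (eng, chi_sim) are only examples, since the article does not say which languages the archive contains.

```shell
#!/bin/sh
# check_tessdata DIR LANG...
# Succeed only if DIR contains LANG.traineddata for every LANG given.
# Each missing file is reported on stderr.
check_tessdata() {
    dir=$1
    shift
    missing=0
    for lang in "$@"; do
        if [ ! -f "$dir/$lang.traineddata" ]; then
            echo "missing: $dir/$lang.traineddata" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

    For example, check_tessdata /usr/tessdata eng chi_sim returns 0 only when both eng.traineddata and chi_sim.traineddata are present in /usr/tessdata.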

    4. Build your own springboot-ocr service image and set the TESSDATA_PREFIX environment variable

    FROM tess/centos-java:v1.0
    LABEL AUTHOR="siman(214382122@qq.com)" VERSION="1.0.0" BUILD_DATE="2020-09-01"
    VOLUME /tmp
    ADD simm-framework-test-1.0.jar app.jar
    EXPOSE 8080
    ENV  TESSDATA_PREFIX="/usr/local/share/tessdata"
    # Entry point
    ENTRYPOINT ["java","-jar","/app.jar"]

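     5. Start the container with the tessdata directory mounted (the image from step 4 must first be built and tagged ocr-api:v1.0, for example with docker build -t ocr-api:v1.0 .)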

    docker run -it -v /usr/tessdata:/usr/local/share/tessdata -p 8080:8080 --name="ocr-api" ocr-api:v1.0
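
    With the container running, one way to confirm that the mount and TESSDATA_PREFIX work together is to ask tesseract inside the container which languages it can see. The helper below is a sketch, not part of the original article: has_lang is a hypothetical name, chi_sim in the usage note is an example language, and the output format it parses (a header line followed by one language code per line) is assumed from tesseract 4.x behaviour.

```shell
#!/bin/sh
# has_lang LANG
# Read the output of `tesseract --list-langs` on stdin and succeed if LANG
# appears on a line of its own. -x matches whole lines only, so the header
# line and longer codes that merely contain LANG do not match.
has_lang() {
    grep -qx -- "$1"
}
```

    A possible invocation is docker exec ocr-api tesseract --list-langs 2>&1 | has_lang chi_sim, which exits with status 0 when chi_sim is listed. If it fails, the usual suspects are an empty host directory or a TESSDATA_PREFIX that does not match the mount target.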
  • Original article: https://www.cnblogs.com/MrSi/p/13601294.html