  • OCR Survey

    OCR development trends

    • Scene text detection
    • Scene text recognition
    • End-to-end scene text recognition

    Scene text detection

    Example methods:

    • Regression-based methods

      • Gupta et al, CVPR 2016; Tian et al, ECCV 2016;
      • Shi, Bai, et al, ICCV 2017; Liu et al, CVPR 2017;
      • Liao et al, AAAI 2017; Hu et al, ICCV 2017 ...
    • Segmentation-based methods

      • Zhong et al, CVPR 2016; Zhou et al, CVPR 2017;
      • Wu et al, ICCV 2017; Deng et al, AAAI 2018;
      • X Li, CVPR 2019; W Wang, et al, CVPR 2019 ...
    • Hybrid methods (segmentation + regression)

      • He et al, ICCV 2017; Lyu et al, CVPR 2018;
      • Liao et al, CVPR 2018; Long et al, ECCV 2018;
      • Liu et al, IJCAI 2019 ...

    Trends:

    Horizontal rectangular boxes → multi-oriented rectangles → multi-oriented quadrilaterals → curved text → arbitrary shapes
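As a rough illustration of why the box representation matters as detectors move from horizontal boxes toward oriented ones, the sketch below converts a rotated rectangle into a quadrilateral and then into its enclosing horizontal box. The function names and the (cx, cy, w, h, angle) parameterization are assumptions made for this example; they are not taken from any of the cited papers.

```python
import math


def rotated_rect_to_quad(cx, cy, w, h, angle_deg):
    """Return the four corners of a rotated rectangle as (x, y) tuples.

    Image coordinates are assumed (y grows downward); corners are listed
    starting from the box's own top-left corner.
    """
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    half_w, half_h = w / 2.0, h / 2.0
    # Corners in the rectangle's local frame, before rotation.
    local = [(-half_w, -half_h), (half_w, -half_h),
             (half_w, half_h), (-half_w, half_h)]
    # Rotate each corner by theta and translate it to the box center.
    return [(cx + x * cos_t - y * sin_t, cy + x * sin_t + y * cos_t)
            for x, y in local]


def quad_to_horizontal_box(quad):
    """Return the axis-aligned box (x1, y1, x2, y2) enclosing a quadrilateral."""
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    return (min(xs), min(ys), max(xs), max(ys))


# A thin text line (80 x 10) tilted by 30 degrees.
quad = rotated_rect_to_quad(100, 50, 80, 10, 30)
hbox = quad_to_horizontal_box(quad)
text_area = 80 * 10
hbox_area = (hbox[2] - hbox[0]) * (hbox[3] - hbox[1])
print(f"text area: {text_area}, enclosing horizontal box area: {hbox_area:.1f}")
```

For tilted text the enclosing horizontal box covers far more area than the text itself, which is one motivation for the oriented and quadrilateral representations in the trend above.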

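    Notes:

    • Segmentation-based methods have difficulty accurately separating adjacent or overlapping text
    • Regression-based methods tend to detect long text only partially
      • Bounding box regression methods require well-chosen anchor parameters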

    Anchor & RPN parameter-tuning issues:

    Examples of anchor-free regression methods:

    • Segmentation based methods
    • C.He et al, Direct Regression..., ICCV 2017, TIP 2018.
    • Z Zhong et al, An Anchor-Free Region Proposal Network..., IJDAR 2019.
    • Zhi Tian, Chunhua Shen, et al, FCOS, CVPR, 2019.
    • Chenchen Zhu, Yihui He, et al, FSAF, CVPR, 2019.
    • Tao Kong, Fuchun Sun et al, FoveaBox, arXiv 2019.
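To make the anchor-free idea concrete, here is a minimal sketch in the style of FCOS: every location inside a ground-truth box directly regresses its distances to the four box sides, so no anchor shapes need to be configured. The function names are hypothetical, and the code only shows target encoding and decoding, not a full detector.

```python
import math


def encode_ltrb(px, py, box):
    """Distances from point (px, py) to the left, top, right, bottom box sides."""
    x1, y1, x2, y2 = box
    return (px - x1, py - y1, x2 - px, y2 - py)


def decode_ltrb(px, py, ltrb):
    """Recover the box (x1, y1, x2, y2) from a point and its four distances."""
    left, top, right, bottom = ltrb
    return (px - left, py - top, px + right, py + bottom)


def centerness(ltrb):
    """FCOS-style centerness: 1.0 at the box center, smaller near the border.

    Assumes the point lies strictly inside the box, so all distances are > 0.
    """
    left, top, right, bottom = ltrb
    return math.sqrt((min(left, right) / max(left, right)) *
                     (min(top, bottom) / max(top, bottom)))


text_box = (10.0, 40.0, 210.0, 60.0)       # a long horizontal text line
center_target = encode_ltrb(110.0, 50.0, text_box)
edge_target = encode_ltrb(30.0, 50.0, text_box)
print(center_target, centerness(center_target))
print(edge_target, centerness(edge_target))
```

Locations near the ends of a long text line have one very large regression target, which is consistent with the note above that regression-based methods tend to detect long text only partially.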

    Why anchor free?
    Most RPN regression methods require well-chosen anchor parameters
    E.g.: SSD → TextBoxes (AAAI 2017)
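The sensitivity to anchor parameters can be sketched with a small experiment: generate anchors at one location for two sets of aspect ratios and compare their best IoU with a long text line. The helper names, the scale of 64, and both ratio lists are illustrative assumptions chosen for this example, not settings quoted from the cited papers.

```python
import math


def make_anchors(cx, cy, scale, aspect_ratios):
    """Anchors (x1, y1, x2, y2) centered at (cx, cy) with area scale**2.

    Each aspect ratio is width / height.
    """
    anchors = []
    for ratio in aspect_ratios:
        w = scale * math.sqrt(ratio)
        h = scale / math.sqrt(ratio)
        anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


text_line = (10.0, 40.0, 210.0, 60.0)   # 200 x 20 line, width / height = 10
cx, cy = 110.0, 50.0                    # anchor center placed on the line center

generic = make_anchors(cx, cy, 64, [0.5, 1, 2])          # object-style ratios
wide = make_anchors(cx, cy, 64, [1, 2, 3, 5, 7, 10])     # text-style ratios

best_generic = max(iou(a, text_line) for a in generic)
best_wide = max(iou(a, text_line) for a in wide)
print(f"best IoU, generic ratios: {best_generic:.2f}")
print(f"best IoU, wide ratios:    {best_wide:.2f}")
```

With generic ratios no anchor overlaps the text line well enough to count as a positive match under a typical IoU threshold, whereas the wide ratios fit it closely. This hand tuning is the kind of work anchor-free methods try to avoid.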

    Alternative anchor design?
    Lele Xie, Yuliang Liu, Lianwen Jin, Zecheng Xie, DeRPN: Taking a further step toward more general object detection, AAAI 2019.

    Scene text recognition

    Scene text recognition methods:

    • CTC-based methods

      • P.He et al, AAAI 2016 (DTRN: CNN+RNN+CTC)
      • B. Shi et al, TPAMI 2017 (CRNN: CNN+RNN+CTC)
      • F Yin, et al, arXiv 2017 (CNN+CTC)
      • Y Wu, et al, arXiv 2018 (CNN+CTC)
      • Y Liu et al, ECCV 2018 (GAN+CTC)
    • Attention-based methods

      • C Lee et al, CVPR 2016; B Shi et al, CVPR 2016
      • X Yang et al, IJCAI 2017
      • Bai et al, CVPR 2018; Liu et al, AAAI 2018
      • Shi et al, TPAMI 2018 (ASTER)
      • Luo et al, PR 2019 (MORAN)

    Trends:

    Regular text → irregular text recognition
    CTC → Attention (1D, 2D)
    Detection + recognition → end-to-end detection and recognition

    Attention or CTC?

    CTC works better for long text; attention works better for short text

    Limitations of Attention and CTC

    CTC:

    • Can hardly be applied directly to 2D prediction
    • Heavy computation for long sequences
    • Performance degradation for repeated patterns
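A minimal sketch of CTC greedy (best-path) decoding helps show where repeated patterns become delicate: consecutive identical labels are merged, so a genuinely doubled character only survives if a blank is predicted between its two occurrences. The blank symbol "-" and the function name are assumptions for this example, and reading the repeated-pattern limitation this way is one plausible interpretation rather than something the source states.

```python
BLANK = "-"  # assumed blank symbol for this sketch


def ctc_greedy_decode(frame_labels, blank=BLANK):
    """Collapse a per-frame label sequence using the CTC rule.

    1. Merge consecutive duplicate labels.
    2. Remove blank labels.
    """
    decoded = []
    previous = None
    for label in frame_labels:
        # Keep a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return "".join(decoded)


with_blank = ctc_greedy_decode(list("hh-e-ll-lo-"))
without_blank = ctc_greedy_decode(list("hheelloo"))
print(with_blank)
print(without_blank)
```

In the second call the two "l" characters are merged because no blank separates them, so the doubled letter is lost.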

    Attention:

    • Misalignment problem (attention drift)
    • More memory required
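For intuition about both limitations, the sketch below computes one attention decoding step over a sequence of encoder feature columns. Dot-product scoring is an assumption made to keep the example short (the cited recognizers use their own scoring functions), and all names and values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_step(query, encoder_features):
    """One decoding step: weights over positions and the weighted context vector."""
    # Score every encoder position against the decoder query (dot product).
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in encoder_features]
    weights = softmax(scores)
    dim = len(query)
    # Context vector: attention-weighted sum of the encoder features.
    context = [sum(w * feat[d] for w, feat in zip(weights, encoder_features))
               for d in range(dim)]
    return weights, context


# Four feature columns of dimension 2; the third one matches the query best.
encoder_features = [[1.0, 0.0], [0.0, 1.0], [0.0, 5.0], [1.0, 0.0]]
query = [0.0, 2.0]
weights, context = attention_step(query, encoder_features)
focus = weights.index(max(weights))
print(focus, [round(w, 4) for w in weights])
```

Each output character needs its own weight vector over all encoder positions, which relates to the memory cost, and attention drift corresponds to the peak of `weights` landing on a column that does not belong to the character being decoded.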

    Why End2End?

    • Prevents errors from accumulating
      • errors can accumulate in a cascade of detection + recognition, which may lead to a large fraction of garbage predictions
    • Joint optimization helps improve overall performance
    • Easier to maintain and adapt to new domains
      • maintaining a cascaded pipeline with data and model dependencies requires substantial engineering effort
    • Faster, Smaller, Stronger

    Some new techniques to bridge the detector and the recognizer

    • RoI Rotate (multi-oriented e2e)
      • X Liu, et al, FOTS, CVPR 2018
    • Tailored RoI pooling (resampling that preserves the aspect ratio)
      • H Li et al. Towards End-to-End Text Spotting in Natural Scenes, arXiv 20190617 (extension of "H Li et al ICCV 2017")
    • RoI Masking (arbitrary-shape e2e)
      • S Qin, A Bissacco, et al (Google AI), Towards Unconstrained End-to-End Text Spotting, ICCV 2019
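The general idea behind an operator like RoI Rotate can be sketched as follows: sample an oriented text region from the shared feature map into an axis-aligned grid of fixed height, choosing the output width so the aspect ratio is preserved. This is a simplified sketch under stated assumptions, not the FOTS implementation: it uses nearest-neighbor sampling on a plain 2D list, and the function name and parameters are hypothetical.

```python
import math


def roi_rotate(feature_map, cx, cy, w, h, angle_deg, out_h=8):
    """Resample an oriented box from a 2D feature map into an upright grid.

    feature_map is a list of rows; (cx, cy, w, h, angle_deg) describes the
    oriented box. The output has out_h rows and a width that keeps w / h.
    Samples falling outside the feature map are filled with 0.0.
    """
    out_w = max(1, round(out_h * w / h))
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    rows, cols = len(feature_map), len(feature_map[0])
    output = []
    for i in range(out_h):
        # Offset of this output row from the box center, along the box height.
        v = ((i + 0.5) / out_h - 0.5) * h
        row = []
        for j in range(out_w):
            # Offset of this output column from the box center, along the box width.
            u = ((j + 0.5) / out_w - 0.5) * w
            # Rotate the offset into feature-map coordinates.
            sx = cx + u * cos_t - v * sin_t
            sy = cy + u * sin_t + v * cos_t
            xi, yi = math.floor(sx), math.floor(sy)
            inside = 0 <= xi < cols and 0 <= yi < rows
            row.append(feature_map[yi][xi] if inside else 0.0)
        output.append(row)
    return output


# Toy feature map (10 rows x 20 columns) whose value equals the column index.
feature_map = [[float(x) for x in range(20)] for _ in range(10)]
upright = roi_rotate(feature_map, cx=10, cy=5, w=8, h=4, angle_deg=0, out_h=2)
tilted = roi_rotate(feature_map, cx=10, cy=5, w=12, h=3, angle_deg=20, out_h=4)
print(upright)
print(len(tilted), len(tilted[0]))
```

Because the output height is fixed and the width follows the aspect ratio, the recognizer always receives an upright, undistorted strip regardless of the orientation of the detected text.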
  • Original article: https://www.cnblogs.com/larkiisready/p/11696276.html