OCR Survey

OCR development trends
- Scene text detection
- Scene text recognition
- End-to-end scene text recognition
Scene text detection

Example methods:
- Regression-based methods
  
  Gupta et al, CVPR 2016; Tian et al, ECCV 2016;
  
  Shi, Bai, et al, ICCV 2017; Liu et al, CVPR 2017;
  
  Liao et al, AAAI 2017; Hu et al, ICCV 2017 ...
- Segmentation-based methods
  
  Zhong et al, CVPR 2016; Zhou et al, CVPR 2017;
  
  Wu et al, ICCV 2017; Deng et al, AAAI 2018;
  
  X Li, CVPR 2019; W Wang, et al, CVPR 2019 ...
- Hybrid methods (segmentation + regression)
  
  He et al, ICCV 2017; Lyu et al, CVPR 2018;
  
  Liao et al, CVPR 2018; Long et al, ECCV 2018;
  
  Liu et al, IJCAI 2019 ...
Development trend:

Horizontal rectangular boxes → multi-oriented rectangular boxes → multi-oriented quadrilaterals → curved text → arbitrary shapes

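Notes:
- Segmentation-based methods have difficulty accurately separating adjacent or overlapping text
- Regression-based methods tend to miss parts of long text
  
  Bounding box regression methods require well-chosen anchor parameters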
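
The first note can be illustrated with a toy example. The sketch below assumes a segmentation pipeline that groups foreground pixels into text instances by 4-connected components, which is one common post-processing choice and not necessarily what any of the cited papers does; `count_components` and the tiny masks are hypothetical.

```python
def count_components(grid):
    """Count 4-connected foreground regions in a binary mask (list of rows)."""
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if grid[y][x] and not seen[y][x]:
                count += 1
                stack = [(y, x)]
                seen[y][x] = True
                # Flood fill from this seed pixel.
                while stack:
                    cy, cx = stack.pop()
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                        if 0 <= ny < h and 0 <= nx < w and grid[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return count


# Two text lines separated by a clean background row: two instances.
separated = [
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]

# The same two lines, but one predicted pixel bridges the gap:
# they collapse into a single instance.
touching = [
    [1, 1, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
]

n_separated = count_components(separated)
n_touching = count_components(touching)
```

A single bridging pixel is enough to merge the two lines here, which is why segmentation-based detectors typically need extra cues to keep neighboring instances apart.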
Anchor & RPN parameter tuning issues:

Examples of anchor-free regression methods:
- Segmentation based methods
- C.He et al, Direct Regression..., ICCV 2017, TIP 2018.
- Z Zhong et al, An Anchor-Free Region Proposal Network..., IJDAR 2019.
- Zhi Tian, Chunhua Shen, et al, FCOS, CVPR, 2019.
- Chenchen Zhu, Yihui He, et al, FSAF, CVPR, 2019.
- Tao Kong, Fuchun Sun et al, FoveaBox, arXiv 2019.
Why anchor free?
Most RPN regression methods require well-chosen anchor parameters
E.g.: SSD → TextBoxes (AAAI 2017)
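
To make the anchor-tuning issue concrete, here is a minimal sketch that compares two anchor sets against one long text line. The box coordinates, the anchor area, and both aspect-ratio lists are illustrative assumptions, not settings taken from SSD or TextBoxes; `iou`, `make_anchors`, and `best_iou` are hypothetical helpers.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def make_anchors(cx, cy, area, aspect_ratios):
    """Anchors centered at (cx, cy) with a fixed area and width/height ratios."""
    anchors = []
    for r in aspect_ratios:
        w = (area * r) ** 0.5
        h = (area / r) ** 0.5
        anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


def best_iou(gt, anchors):
    """Highest IoU between a ground-truth box and any anchor."""
    return max(iou(gt, a) for a in anchors)


# Hypothetical text line: 200 px wide, 20 px tall, centered at (100, 50).
gt_box = (0.0, 40.0, 200.0, 60.0)

# Assumed generic object-detection ratios vs. assumed elongated text ratios.
generic_anchors = make_anchors(100.0, 50.0, 4000.0, [0.5, 1.0, 2.0])
text_anchors = make_anchors(100.0, 50.0, 4000.0, [1.0, 2.0, 3.0, 5.0, 7.0, 10.0])

generic_best = best_iou(gt_box, generic_anchors)
text_best = best_iou(gt_box, text_anchors)
```

With the generic ratios, no anchor reaches an IoU of 0.5 with the text line, so under a typical matching threshold it would get no positive anchor. The elongated set contains a 10:1 anchor that matches it exactly. Anchor-free methods avoid having to hand-pick such ratios.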

Alternative anchor design?
Lele Xie, Yuliang Liu, Lianwen Jin, Zecheng Xie, DeRPN: Taking a further step toward more general object detection, AAAI 2019.

Scene text recognition

Scene text recognition methods:
- CTC-based methods
  
  P.He et al, AAAI 2016 (DTRN: CNN+RNN+CTC)
  
  B. Shi et al, TPAMI 2017 (CRNN: CNN+RNN+CTC)
  
  F Yin, et al, arXiv 2017 (CNN+CTC)
  
  Y Wu, et al, arXiv 2018 (CNN+CTC)
  
  Y Liu et al, ECCV 2018 (GAN+CTC)
- Attention-based methods
  
  C Lee et al, CVPR 2016; B Shi et al, CVPR 2016
  
  X Yang et al, IJCAI 2017
  
  Bai et al, CVPR 2018; Liu et al, AAAI 2018
  
  Shi et al, TPAMI 2018 (ASTER)
  
  Luo et al, PR 2019 (MORAN)
Development trends:

Regular text → irregular text recognition
CTC → Attention (1D, 2D)
Detection + recognition → end-to-end detection and recognition

Attention or CTC?

CTC works better for long text; attention works better for short text

Limitation of Attention and CTC

CTC:
- Can hardly be directly applied to 2D prediction
- Heavy computation for long sequences
- Performance degradation on repeated patterns
Attention:
- Misalignment problem (attention drift)
- Requires more memory
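
As background for the CTC points above, the sketch below shows the standard CTC collapsing rule: merge consecutive repeats, then drop blanks. Using `-` as the blank symbol and the helper names `ctc_collapse` and `min_path_length` are assumptions made for illustration. It also shows why repeated characters need special handling, which may be one factor behind the repeated-pattern issue listed above.

```python
BLANK = "-"  # assumed blank symbol for this sketch


def ctc_collapse(path, blank=BLANK):
    """Map a frame-level CTC path to a label: merge repeats, then drop blanks."""
    out = []
    prev = None
    for symbol in path:
        # Emit a symbol only when it differs from the previous frame
        # and is not the blank.
        if symbol != prev and symbol != blank:
            out.append(symbol)
        prev = symbol
    return "".join(out)


def min_path_length(label):
    """Fewest frames needed to emit `label`: one per character, plus one
    blank between every pair of identical adjacent characters."""
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats


with_blank = ctc_collapse("hh-e-ll-lo")   # blank separates the two l's
without_blank = ctc_collapse("heelloo")   # the double l is merged away
frames_hello = min_path_length("hello")
frames_aaaa = min_path_length("aaaa")
```

The first path decodes to `hello`, while the second decodes to `helo`, because nothing separates the two `l` frames. A label such as `aaaa` needs at least 7 frames, so text with many repeated characters needs a longer frame sequence than its length suggests.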
Why End2End?
- Prevents errors from accumulating
  
  Errors can accumulate in a cascade of detection + recognition, which may lead to a large fraction of garbage predictions
- Joint optimization helps improve overall performance
- Easier to maintain and adapt to new domains
  
  Maintaining a cascaded pipeline with data and model dependencies requires substantial engineering effort
- Faster, Smaller, Stronger
Some new techniques to bridge the detector and the recognizer
- RoI Rotate (multi-oriented e2e)
  
  X Liu, et al, FOTS, CVPR 2018
- Tailored RoI pooling (aspect-ratio-preserving resampling)
  
  H Li et al. Towards End-to-End Text Spotting in Natural Scenes, arXiv 20190617 (extension of "H Li et al ICCV 2017")
- RoI Masking (arbitrary-shape e2e)
  
  S Qin, A Bissacco, et al (Google AI), Towards Unconstrained End-to-End Text Spotting, ICCV 2019
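
As a rough illustration of the RoI masking idea in the last item, the sketch below multiplies a cropped feature map element-wise by a binary instance mask, so that features outside the text region are zeroed before recognition. This is a simplified sketch on plain Python lists with a single channel; `roi_mask` and the sample values are hypothetical and do not reproduce the paper's implementation.

```python
def roi_mask(features, mask):
    """Element-wise product of an H x W feature crop and an H x W binary mask."""
    same_shape = len(features) == len(mask) and all(
        len(f_row) == len(m_row) for f_row, m_row in zip(features, mask)
    )
    if not same_shape:
        raise ValueError("features and mask must have the same shape")
    return [
        [f * m for f, m in zip(f_row, m_row)]
        for f_row, m_row in zip(features, mask)
    ]


# Hypothetical 2 x 3 single-channel feature crop for one detected box.
crop = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
]

# Hypothetical instance mask: 1 inside the text shape, 0 for background
# or a neighboring text instance that falls inside the same box.
instance_mask = [
    [1, 1, 0],
    [0, 1, 1],
]

masked = roi_mask(crop, instance_mask)
```

Because the mask can follow any shape, the same operation applies to curved or otherwise irregular text, where an axis-aligned or rotated box would include background or neighboring text.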

Original article: https://www.cnblogs.com/larkiisready/p/11696276.html