  • OCR (Optical Character Recognition): Testing STN-OCR

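    1. As the article recommends, use an isolated Python environment on Ubuntu. It really works well.

    Reference: http://blog.topspeedsnail.com/archives/5618
    Activate the virtual environment:
    source env/bin/activate
    Leave the virtual environment:
    deactivate
    Note: every step below must be carried out inside the isolated environment.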
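    Since every later step depends on the environment being active, a quick check from Python can help. The sketch below is only an illustration: the helper name `in_virtualenv` is made up here, and it assumes the environment was created with `venv` or `virtualenv`.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter appears to run inside a virtual environment."""
    # venv (and recent virtualenv releases) make sys.prefix differ from sys.base_prefix.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    # Older virtualenv releases set sys.real_prefix instead.
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```

    If this prints False after `source env/bin/activate`, the shell is probably still picking up the system interpreter.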
    2. Set up the virtual environment
    pip install -r requirements.txt    (this should install every package listed in the requirements file)
    pip install Cython==0.26
    sudo apt-get install python3-dev
    pip install editdistance==0.3.1
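
    To confirm that the pinned versions actually ended up in the environment, something like the following may be useful. It is a rough sketch, not part of STN-OCR: `check_pins` is a hypothetical helper, and it assumes Python 3.8 or newer for `importlib.metadata`.

```python
from importlib import metadata  # standard library, Python 3.8+


def check_pins(pins):
    """Map each package name to None if the pinned version is installed, else a message."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
            continue
        if installed == wanted:
            report[name] = None
        else:
            report[name] = "found {}, wanted {}".format(installed, wanted)
    return report


if __name__ == "__main__":
    # Cython 0.26 is the pin from the notes above; add further pins as needed.
    for name, problem in check_pins({"Cython": "0.26"}).items():
        print(name, "OK" if problem is None else problem)
```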

    3. Build Baidu's warp-ctc

    Reference: http://blog.csdn.net/amds123/article/details/73433926
    git clone https://github.com/baidu-research/warp-ctc.git

    cd warp-ctc
    mkdir build
    cd build
    cmake ..
    make
    sudo make install

    Then carry out the step from the STN-OCR instructions:
    go to `mxnet/metrics/ctc` and run `python setup.py build_ext --inplace`

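    4. Build MXNet

    First attempt, using the version the article asks for:
    git clone --recursive https://github.com/dmlc/mxnet.git mxnet
    cd mxnet
    git tag
    git checkout v0.9.3
    Building this the way the paper describes failed, so the only option was to download a newer version and build that instead.
    Build steps for the newer version: https://www.bbsmax.com/A/A7zgqGk54n/
    Install the dependencies:
    $ sudo apt-get install -y build-essential git

    $ sudo apt-get install -y libopenblas-dev

    $ sudo apt-get install -y libopencv-dev

    git clone --recursive https://github.com/dmlc/mxnet.git
    cd mxnet
    cp make/config.mk ./    (the build options file)
    vim config.mk    (edit the build options as needed; the article requires enabling warpctc)
    https://mxnet.incubator.apache.org/tutorials/speech_recognition/baidu_warp_ctc.html
    make -j4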
    5. Build the Python interface

    Reference: http://blog.csdn.net/zziahgf/article/details/72729883
    Build the MXNet Python API. First install the required packages:
    sudo apt-get install -y python-dev python-setuptools python-numpy
    cd python
    sudo python setup.py install
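
    After the install it is worth checking that the interpreter in use can actually import the freshly built package. A minimal sketch, assuming nothing beyond the standard library (`installed_version` is a hypothetical helper name):

```python
import importlib


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except (ImportError, OSError):
        # OSError covers a shared library that was built but cannot be loaded.
        return None
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    version = installed_version("mxnet")
    if version is None:
        print("mxnet is not importable from this interpreter")
    else:
        print("mxnet version:", version)
```

    Run it with the same interpreter that will later run the test script, so that a mismatch between the system Python and the virtual environment shows up early.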

    6. Download the STN-OCR network: https://github.com/Bartzi/stn-ocr
    Using this network feels much like using FCN, so it should not need any extra steps.

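    7. Download the model

    From https://bartzi.de/research/stn-ocr, the text recognition download contains a model folder and a test dataset.
    The model folder holds two files:
    *.params is the model (weights) file, and *.json should be the network description file.
    The test dataset contains an image folder, a gt (ground truth) file, and one more file whose purpose I do not know.
    One more file is needed later: a char_map file, which should be in the text recognition part of the data folder in the STN-OCR repository.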
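
    To make the roles of the gt file and the char_map file concrete, here is a small self-contained sketch of how the test script further down uses them. The miniature `char_map` and the sample rows are invented for illustration; the real map is much larger. The layout (string indices mapping to integer code points, with 9250 as the blank symbol) and the `;` delimiter are taken from that script.

```python
import csv
import io

BLANK = 9250  # code point of the blank symbol (U+2422) used by the test script
MAX_LEN = 23  # label length used in the test script

# Hypothetical miniature char map; json.load yields string keys and integer code points.
char_map = {"0": BLANK, "1": ord("a"), "2": ord("b"), "3": ord("c")}
# Invert it so that a code point looks up its class index.
reverse_char_map = {code: index for index, code in char_map.items()}


def encode_label(word, max_len=MAX_LEN):
    """Turn a word into class indices, padded with the blank index up to max_len."""
    blank_index = reverse_char_map[BLANK]
    # Characters missing from the map fall back to the blank index.
    label = [reverse_char_map.get(ord(ch), blank_index) for ch in word.lower()]
    return label + [blank_index] * (max_len - len(label))


def read_ground_truth(text):
    """Parse 'image path;word' rows into (path, word) pairs."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    return [(row[0], row[1].strip()) for row in reader if len(row) >= 2]


if __name__ == "__main__":
    sample = "images/1.png;Cab\nimages/2.png;abc\n"
    for path, word in read_ground_truth(sample):
        print(path, word, encode_label(word, max_len=5))
```

    The indices stay strings here because they come from JSON keys; the test script passes them to `mx.nd.array`, which converts them to numbers.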
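    The prediction code is the eval Python script in the STN-OCR folder, as its name suggests. However, because I had built a newer MXNet version than the one the article uses, that script did not run successfully here, so I wrote a simple test script modeled on it: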

    import matplotlib.pyplot as plt
    
    import argparse
    import csv
    import json
    import os
    from collections import namedtuple
    
    from PIL import Image
    
    import editdistance
    import mxnet as mx
    import numpy as np
    
    from callbacks.save_bboxes import BBOXPlotter
    from metrics.ctc_metrics import strip_prediction
    from networks.text_rec import SVHNMultiLineCTCNetwork
    from operations.disable_shearing import *
    from utils.datatypes import Size
    
    Batch = namedtuple('Batch', ['data'])
    
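    # Pass only the prefix and the epoch: load_checkpoint appends the suffixes itself
    # and reads two files, model-symbol.json and model-0002.params
    sym,arg_params,aux_params = mx.model.load_checkpoint('./testxt/model/model',2)
    # arg_params should hold the trained parameters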
    #print(arg_params)
    net, loc, transformed_output, size_params = SVHNMultiLineCTCNetwork.get_network((1,1,64,200),Size(50,50),46,2,23)
    output = mx.sym.Group([loc, transformed_output, net])
    
    # Open question: with the inputs declared up front like this, what happens to the softmax layer?
    mod = mx.mod.Module(output, context=mx.cpu(), data_names=[
        'data',
        'softmax_label',
        'l0_forward_init_h_state',
        'l0_forward_init_c_state_cell',
        'l1_forward_init_h_state',
        'l1_forward_init_c_state_cell',
    ], label_names=[])
    mod.bind(for_training=False, grad_req='null', data_shapes=[
        ('data', (1, 1, 64, 200)),
        ('softmax_label', (1, 23)),
        ('l0_forward_init_h_state', (1, 1, 256)),
        ('l0_forward_init_c_state_cell', (1, 1, 256)),
        ('l1_forward_init_h_state', (1, 1, 256)),
        ('l1_forward_init_c_state_cell', (1, 1, 256)),
    ])
    arg_params['l0_forward_init_h_state'] = mx.nd.zeros((1, 1, 256))
    arg_params['l0_forward_init_c_state_cell'] = mx.nd.zeros((1, 1, 256))
    arg_params['l1_forward_init_h_state'] = mx.nd.zeros((1, 1, 256))
    arg_params['l1_forward_init_c_state_cell'] = mx.nd.zeros((1, 1, 256))
    mod.set_params(arg_params, aux_params)
    
    # Load the label mapping
    # The char map plays the same role as a label file in Caffe; it is used in the loop below
    with open('/home/lbk/python-env/stn-ocr/mxnet/testxt/ctc_char_map.json') as char_map_file:
        char_map = json.load(char_map_file)
    reverse_char_map = {v: k for k, v in char_map.items()}
    print(len(reverse_char_map))
    
    with open('/home/lbk/python-env/stn-ocr/mxnet/testxt/icdar2013_eval/one_gt.txt') as eval_gt:
        reader = csv.reader(eval_gt,delimiter=';')
        for idx,line in enumerate(reader):
            file_name = line[0]
            label = line[1].strip()
            gt_word = label.lower()
            print(gt_word)
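            # Encode the ground-truth word as class indices; characters missing from
            # the map fall back to the blank symbol (code point 9250)
            # dict.get(key, default) returns default when the key is absent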
            label = [reverse_char_map.get(ord(char.lower()),reverse_char_map[9250]) for char in gt_word]
            label+=[reverse_char_map[9250]]*(23-len(label))
            #print(label)
            the_image = Image.open(file_name)
            the_image = the_image.convert('L')
            # Image.LANCZOS is the same filter as Image.ANTIALIAS, which Pillow 10 removed
            the_image = the_image.resize((200,64), Image.LANCZOS)
            image = np.asarray(the_image, dtype=np.float32)[np.newaxis, np.newaxis, ...]
            image/=255
            temp = mx.nd.zeros((1,1,256))
            label = mx.nd.array([label])
            image = mx.nd.array(image)
            print(type(temp),type(label))
            input_batch = Batch(data=[image,label,temp,temp,temp,temp])
    
            mod.forward(input_batch,is_train=False)
            print(len(mod.get_outputs()))
            print('0000',mod.get_outputs()[2])
            predictions = mod.get_outputs()[2].asnumpy()
            predicted_classes = np.argmax(predictions,axis=1)
            print(len(predicted_classes))
            print(predicted_classes)
    
            predicted_classes = strip_prediction(predicted_classes, int(reverse_char_map[9250]))
            predicted_word = ''.join([chr(char_map[str(p)]) for p in predicted_classes]).replace(' ', '')
            print(predicted_word)
    
            distance = editdistance.eval(gt_word, predicted_word)
            print("{} - {}\t\t{}: {}".format(idx, gt_word, predicted_word, distance))
    
            results = [prediction == label for prediction, label in zip(predicted_word, gt_word)]
            print(results)
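
    The last part of the script turns the per-step class predictions into a word and scores it against the ground truth. The sketch below shows the usual way this is done with CTC output, in plain Python. It is an assumption that `strip_prediction` behaves like the `ctc_greedy_decode` function here (collapse repeats, then drop blanks); the STN-OCR helper may differ in detail. `levenshtein` is a stand-in for `editdistance.eval`, written out only to show what the reported distance means.

```python
def ctc_greedy_decode(classes, blank):
    """Collapse consecutive repeats, then drop blanks (standard greedy CTC decoding)."""
    decoded = []
    previous = None
    for current in classes:
        if current != previous and current != blank:
            decoded.append(current)
        previous = current
    return decoded


def levenshtein(a, b):
    """Edit distance between two strings, computed with a single rolling row."""
    row = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        # diagonal holds the value up and to the left of the cell being filled.
        diagonal, row[0] = row[0], i
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            diagonal, row[j] = row[j], min(row[j] + 1, row[j - 1] + 1, diagonal + cost)
    return row[-1]


if __name__ == "__main__":
    # Hypothetical miniature char map: class index (as a string) to code point.
    char_map = {"0": 9250, "1": ord("a"), "2": ord("b"), "3": ord("c")}
    raw = [0, 3, 3, 0, 1, 1, 2, 0, 0]  # made-up per-step argmax output
    classes = ctc_greedy_decode(raw, blank=0)
    word = "".join(chr(char_map[str(c)]) for c in classes)
    print(word, levenshtein("cab", word))
```

    A blank between two identical classes keeps them as separate characters, which is why the blank has to be removed after collapsing repeats and not before.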
    

      

    Addendum:
    Resources for learning MXNet:
    http://www.infoq.com/cn/articles/an-introduction-to-the-mxnet-api-part04
    http://blog.csdn.net/yiweibian/article/details/72678020
    http://ysfalo.github.io/2016/04/01/mxnet%E4%B9%8Bfine-tune/
    http://shuokay.com/2016/01/01/mxnet-memo/

  • Original article: https://www.cnblogs.com/kanuore/p/7522500.html