[Repost] Mat to PIX when integrating OpenCV with Tesseract

2013-01-29

Reposted from: http://hepeng421.blog.163.com/blog/static/11948517201302911311745/

The following question came up while integrating OpenCV with Tesseract:

I'm using OpenCV to extract a subimage of a scanned document and would like to use tesseract to perform OCR over this subimage.

I found out that I can use two methods for text recognition in tesseract, but so far I wasn't able to find a working solution.

A.) How can I convert a cv::Mat into a PIX*? (PIX is a data type from Leptonica.)

Based on vasile's code below, this is essentially my current code:

    cv::Mat image = cv::imread("c:/image.png");
    cv::Mat subImage = image(cv::Rect(50, 200, 300, 100));

    int depth;
    if(subImage.depth() == CV_8U)
        depth = 8;
    //other cases not considered yet

    PIX* pix = pixCreateHeader(subImage.size().width, subImage.size().height, depth);
    pix->data = (l_uint32*) subImage.data;

    tesseract::TessBaseAPI tess;
    STRING text;
    if(tess.ProcessPage(pix, 0, 0, &text))
    {
        std::cout << text.string();
    }

While it doesn't crash, the OCR result is still wrong. It should recognize one word of my sample image, but instead it returns some unreadable characters.

The method PIX_HEADER doesn't exist, so I used pixCreateHeader, but it doesn't take the number of channels as an argument. So how can I set the number of channels?
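One likely reason the pixCreateHeader approach returns garbage, offered as a sketch rather than a confirmed diagnosis: Leptonica stores each raster line as a whole number of 32-bit words and packs pixels from the most significant byte down, so on a little-endian machine the bytes of a cv::Mat row cannot simply be aliased as PIX data. The standard-library-only function below shows that repacking for an 8-bit single-channel buffer with an explicit row stride. The name packGrayToWords is hypothetical; it is not a Leptonica or OpenCV call.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of Leptonica or OpenCV): repack an 8-bit,
// single-channel image into 32-bit words, one padded run of words per row,
// with the leftmost pixel in the most significant byte of each word.
// `strideBytes` is the distance between row starts in `src`
// (what cv::Mat::step reports for a Mat).
std::vector<std::uint32_t> packGrayToWords(const std::uint8_t* src,
                                           int width, int height,
                                           std::size_t strideBytes) {
    const int wordsPerLine = (width * 8 + 31) / 32;  // rows padded to 32 bits
    std::vector<std::uint32_t> words(
        static_cast<std::size_t>(wordsPerLine) * height, 0u);
    for (int y = 0; y < height; ++y) {
        const std::uint8_t* row = src + static_cast<std::size_t>(y) * strideBytes;
        std::uint32_t* line =
            words.data() + static_cast<std::size_t>(y) * wordsPerLine;
        for (int x = 0; x < width; ++x) {
            const int shift = 24 - 8 * (x % 4);  // pixel 0 -> bits 31..24
            line[x / 4] |= static_cast<std::uint32_t>(row[x]) << shift;
        }
    }
    return words;
}
```

In practice it is usually simpler to let a library do this, for example by creating the image with pixCreate and filling it with pixSetPixel, or by handing the raw buffer to TessBaseAPI::SetImage as in section c below.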

B.) How can I use a cv::Mat with TesseractRect()?

Tesseract offers another method for text recognition with this signature:

    char * TessBaseAPI::TesseractRect (
        const UINT8 * imagedata,
        int bytes_per_pixel,
        int bytes_per_line,
        int left,
        int top,
        int width,
        int height
    )

Currently I am using the following code, but it also returns unreadable characters (although different ones than the code above produces).

    char* cr = tess.TesseractRect(
        subImage.data,
        subImage.channels(),
        subImage.channels() * subImage.size().width,
        0,
        0,
        subImage.size().width,
        subImage.size().height);
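A possible cause of the garbage from this call, stated as an assumption: subImage is a view into image, so consecutive rows are separated by the parent's row stride (subImage.step), not by channels * width of the sub-rectangle. The sketch below uses plain arithmetic only. pixelOffset and viewStride are hypothetical helpers that mirror how a bytes-per-pixel and bytes-per-line pair addresses a pixel, and viewStride assumes the parent image has no row padding.

```cpp
#include <cstddef>

// Hypothetical helper: byte offset of pixel (x, y) from the first byte of a
// raster whose rows start bytesPerLine bytes apart.
std::size_t pixelOffset(int x, int y,
                        std::size_t bytesPerPixel,
                        std::size_t bytesPerLine) {
    return static_cast<std::size_t>(y) * bytesPerLine
         + static_cast<std::size_t>(x) * bytesPerPixel;
}

// Stride of a view cut out of a parent image (assuming no row padding):
// rows of the view keep the parent's stride, whatever the view's width is.
std::size_t viewStride(int parentWidth, std::size_t bytesPerPixel) {
    return static_cast<std::size_t>(parentWidth) * bytesPerPixel;
}
```

Taking an assumed 640-pixel-wide, 3-channel parent as an example, row 1 of a 300-pixel-wide view starts 1920 bytes after row 0, whereas channels * width would look only 900 bytes in, which is still inside the same row of the parent. Passing subImage.step as bytes_per_line, or cloning the view first, would be the corresponding change to the call above.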

C.) Sample code:

    tesseract::TessBaseAPI tess;
    cv::Mat sub = image(cv::Rect(50, 200, 300, 100));

    // Pass the row stride of the view, not width * channels
    tess.SetImage((uchar*)sub.data, sub.size().width, sub.size().height, sub.channels(), sub.step1());
    tess.Recognize(0);

    // The returned buffer must be freed by the caller with delete []
    const char* out = tess.GetUTF8Text();

D.) The solution in detail:

First, make a deep copy of your subImage, so that it is stored in a contiguous memory block:

    cv::Mat subImage = image(cv::Rect(50, 200, 300, 100)).clone();

Then, initialize a PIX header (I don't know how) with the correct parameters.

    // ???? Put your own constructor here.
    PIX* pix = new PIX_HEADER(width, height, channels, depth);

Or create it manually:

    PIX pix;
    pix.width = subImage.width;
    ...

Then set the pix data pointer to the subImage data pointer:

    pix.data = subImage.data;

Finally, make sure your subImage object does not go out of scope before you finish your work with pix.
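To illustrate why the deep copy in the first step matters, here is a minimal standard-library sketch of what cloning a region does conceptually: it gathers rows that are spread across the parent's buffer into one contiguous block. copyRegion and its parameters are hypothetical names for this illustration, not OpenCV functions.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: copy a width x height region starting at (left, top)
// out of a parent raster into a tightly packed buffer whose row stride is
// exactly width * bytesPerPixel.
std::vector<std::uint8_t> copyRegion(const std::uint8_t* parent,
                                     std::size_t parentStride,
                                     std::size_t bytesPerPixel,
                                     int left, int top,
                                     int width, int height) {
    const std::size_t rowBytes = static_cast<std::size_t>(width) * bytesPerPixel;
    std::vector<std::uint8_t> out(rowBytes * static_cast<std::size_t>(height));
    if (out.empty()) return out;  // nothing to copy for an empty region
    for (int y = 0; y < height; ++y) {
        // Source rows are parentStride apart; destination rows are rowBytes apart.
        const std::uint8_t* srcRow = parent
            + static_cast<std::size_t>(top + y) * parentStride
            + static_cast<std::size_t>(left) * bytesPerPixel;
        std::memcpy(out.data() + static_cast<std::size_t>(y) * rowBytes,
                    srcRow, rowBytes);
    }
    return out;
}
```

After such a copy, the stride equals width times bytes per pixel, so code that assumes tightly packed rows reads the right pixels. The copy also owns its memory, which is why it has to stay alive for as long as anything points into it.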

E.) A solution found on the internet:

The solution above has not worked for me yet, because linking the two (Tesseract OCR and OpenCV) has become a challenge.

So I have attached an image that gives a clear idea of the kind of input involved.

The numbers are marked only for ease of understanding; they simply indicate where the text is located.

Suppose it is an input image where I have to read the contents within the box.

If the image contains nothing but the rectangle with text, you can pass the whole image to Tesseract.

If there is something around the rectangle with text (e.g. you want to ignore everything outside the rectangle), you need to:

1. identify the rectangle coordinates (with some OpenCV function, or maybe with GetConnectedComponents from the Tesseract API)

2. call SetRectangle after SetImage from the Tesseract API
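The two steps above leave open how to guard the rectangle before it reaches SetRectangle. As a hedged sketch, the helper below clips a (left, top, width, height) rectangle to the image bounds. The Box struct and clipToImage are hypothetical names, and the only assumption about Tesseract is that SetRectangle takes its arguments in that order.

```cpp
#include <algorithm>

// Hypothetical rectangle type using the (left, top, width, height) layout
// of the SetRectangle arguments.
struct Box {
    int left;
    int top;
    int width;
    int height;
};

// Clip `box` to an image of imageWidth x imageHeight pixels. The result has
// zero width or height when the box lies entirely outside the image.
Box clipToImage(const Box& box, int imageWidth, int imageHeight) {
    const int x0 = std::max(box.left, 0);
    const int y0 = std::max(box.top, 0);
    const int x1 = std::min(box.left + box.width, imageWidth);
    const int y1 = std::min(box.top + box.height, imageHeight);
    return Box{x0, y0, std::max(x1 - x0, 0), std::max(y1 - y0, 0)};
}
```

The clipped values would then be passed as SetRectangle(box.left, box.top, box.width, box.height) after SetImage, skipping recognition when the width or height comes back as zero.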

Then how do I code that? I am not using any of the Python Tesseract libraries, since the programming language in my case is C++.

It would be of great help if somebody could come up with a snippet to achieve this.

Do you mean something like this?

    #include <cstdio>
    #include <tesseract/baseapi.h>
    #include <leptonica/allheaders.h>
    #include <opencv2/opencv.hpp>

    int main() {
        tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
        // Init returns 0 on success
        if (api->Init("/usr/src/tesseract-ocr/", "eng")) {
            fprintf(stderr, "Could not initialize tesseract.\n");
            return 1;
        }

        // Legacy OpenCV C API (IplImage)
        IplImage *img = cvLoadImage("/home/user/sampleimage.png");
        if ( img == 0 ) {
            fprintf(stderr, "Cannot load input file!\n");
            return 1;
        }

        // widthStep is the row stride in bytes
        api->SetImage((unsigned char*)img->imageData, img->width,
                      img->height, img->nChannels, img->widthStep);

        // be aware of tesseract coord systems starting at left top corner!
        // arguments: left, top, width, height
        api->SetRectangle(129, 184, 484, 108);

        char* outText = api->GetUTF8Text();
        printf("OCR output: ");
        printf("%s", outText);  // do not pass OCR text as the format string

        api->Clear();
        api->End();
        delete [] outText;
        delete api;
        cvReleaseImage(&img);
        return 0;
    }

Original URL: https://www.cnblogs.com/Crysaty/p/6529669.html