Converting cv::Mat to PIX when integrating OpenCV with Tesseract
2013-01-29
Reposted from: http://hepeng421.blog.163.com/blog/static/11948517201302911311745/
While integrating OpenCV with Tesseract, I ran into the following:
I'm using OpenCV to extract a subimage of a scanned document and would like to use tesseract to perform OCR over this subimage.
I found out that I can use two methods for text recognition in tesseract, but so far I wasn't able to find a working solution.
A.) How can I convert a cv::Mat into a PIX*? (PIX* is a datatype of leptonica)
Based on vasile's code below, this is essentially my current code:
cv::Mat image = cv::imread("c:/image.png");
cv::Mat subImage = image(cv::Rect(50, 200, 300, 100));
int depth;
if(subImage.depth() == CV_8U)
depth = 8;
//other cases not considered yet
PIX* pix = pixCreateHeader(subImage.size().width, subImage.size().height, depth);
pix->data = (l_uint32*) subImage.data;
tesseract::TessBaseAPI tess;
STRING text;
if(tess.ProcessPage(pix, 0, 0, &text))
{
std::cout << text.string();
}
While it doesn't crash, the OCR result is still wrong. It should recognize one word in my sample image, but instead it returns some unreadable characters.
The method PIX_HEADER doesn't exist, so I used pixCreateHeader, but it doesn't take the number of channels as an argument. So how can I set the number of channels?
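One likely reason for the unreadable output, offered here as an assumption rather than a confirmed diagnosis: Leptonica does not lay pixels out the way cv::Mat does. As far as I know, a PIX stores each row as a whole number of 32-bit words, with the leftmost pixel in the most significant byte, so pointing pix->data at the Mat buffer makes Leptonica expect padding and a byte order that are not there. The sketch below uses neither library; packGray8ToWords is a hypothetical helper that only illustrates that packing for an 8-bit, single-channel image.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of Leptonica or OpenCV.
// Packs an 8-bit, single-channel image whose rows are `stride` bytes apart
// into 32-bit words: leftmost pixel in the most significant byte, each row
// padded with zeros to a whole number of words.
std::vector<std::uint32_t> packGray8ToWords(const std::uint8_t* src,
                                            int width, int height,
                                            std::size_t stride) {
    assert(src != nullptr && width > 0 && height > 0);
    assert(stride >= static_cast<std::size_t>(width));

    // Words per line, rounded up so that every row starts on a word boundary.
    const std::size_t wordsPerLine = (static_cast<std::size_t>(width) + 3) / 4;
    std::vector<std::uint32_t> words(wordsPerLine * height, 0u);

    for (int y = 0; y < height; ++y) {
        const std::uint8_t* row = src + y * stride;
        std::uint32_t* line = words.data() + y * wordsPerLine;
        for (int x = 0; x < width; ++x) {
            // Pixel 0 goes into bits 31..24, pixel 1 into bits 23..16, and so on.
            const int shift = 8 * (3 - (x % 4));
            line[x / 4] |= static_cast<std::uint32_t>(row[x]) << shift;
        }
    }
    return words;
}
```

For a 5-pixel-wide row holding 1, 2, 3, 4, 5 this yields the two words 0x01020304 and 0x05000000. In real code it is probably simpler to allocate the image with pixCreate and copy the pixel values in, rather than aliasing the two buffers.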
B.) How can I use cv::Mat for TesseractRect()?
Tesseract offers another method for text recognition with this signature:
char * TessBaseAPI::TesseractRect (
const UINT8 * imagedata,
int bytes_per_pixel,
int bytes_per_line,
int left,
int top,
int width,
int height
)
Currently I am using the following code, but it also returns unreadable characters (although different ones than the code above).
char* cr = tess.TesseractRect(
subImage.data,
subImage.channels(),
subImage.channels() * subImage.size().width,
0,
0,
subImage.size().width,
subImage.size().height);
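A possible cause of the wrong result, stated as an assumption: subImage is only a view into image, so consecutive rows are still as far apart in memory as the rows of the full image, and channels * width understates bytes_per_line. In OpenCV that distance is available as subImage.step. The sketch below is plain C++ only; ByteView, makeSubView and byteAt are hypothetical names that model such a view to show why the parent's stride has to be carried along.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an image view: pointer to the first pixel,
// size in pixels, bytes per pixel, and distance between rows in bytes.
struct ByteView {
    const std::uint8_t* data;
    int width;
    int height;
    int bytesPerPixel;
    std::size_t bytesPerLine;
};

// Returns a view on a rectangle of `parent` without copying any pixels.
// Only the start pointer and the size change; the row stride stays the same.
ByteView makeSubView(const ByteView& parent, int left, int top,
                     int width, int height) {
    assert(left >= 0 && top >= 0 && width > 0 && height > 0);
    assert(left + width <= parent.width && top + height <= parent.height);
    ByteView sub = parent;
    sub.data = parent.data
             + static_cast<std::size_t>(top) * parent.bytesPerLine
             + static_cast<std::size_t>(left) * parent.bytesPerPixel;
    sub.width = width;
    sub.height = height;
    return sub;
}

// Reads one byte of pixel (x, y) using the stride, not the view width.
std::uint8_t byteAt(const ByteView& v, int x, int y, int channel = 0) {
    assert(x >= 0 && x < v.width && y >= 0 && y < v.height);
    assert(channel >= 0 && channel < v.bytesPerPixel);
    return v.data[static_cast<std::size_t>(y) * v.bytesPerLine
                  + static_cast<std::size_t>(x) * v.bytesPerPixel
                  + channel];
}
```

Applied to the call above, that would mean passing subImage.step as the third argument of TesseractRect instead of subImage.channels() * subImage.size().width. I have not verified that this alone fixes the recognition.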
C.) Sample code:
tesseract::TessBaseAPI tess;
cv::Mat sub = image(cv::Rect(50, 200, 300, 100));
tess.SetImage((uchar*)sub.data, sub.size().width, sub.size().height, sub.channels(), sub.step1());
tess.Recognize(0);
const char* out = tess.GetUTF8Text();
D.) The solution in detail:
First, make a deep copy of your subImage, so that it will be stored in a continuous memory block:
cv::Mat subImage = image(cv::Rect(50, 200, 300, 100)).clone();
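To make the effect of the deep copy concrete, here is a minimal sketch of copying a rectangle out of a larger image into one contiguous block, so that the copy's bytes per line equal width times bytes per pixel. It is plain C++ and does not claim to reproduce how clone() is implemented; copyRegionContiguous and its parameters are hypothetical names chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: copies a width x height rectangle starting at
// (left, top) out of an image whose rows are srcBytesPerLine bytes apart.
// The result has no gaps between rows.
std::vector<std::uint8_t> copyRegionContiguous(const std::uint8_t* src,
                                               std::size_t srcBytesPerLine,
                                               int bytesPerPixel,
                                               int left, int top,
                                               int width, int height) {
    assert(src != nullptr && bytesPerPixel > 0);
    assert(left >= 0 && top >= 0 && width > 0 && height > 0);
    const std::size_t rowBytes = static_cast<std::size_t>(width) * bytesPerPixel;
    const std::size_t leftBytes = static_cast<std::size_t>(left) * bytesPerPixel;
    assert(leftBytes + rowBytes <= srcBytesPerLine);

    std::vector<std::uint8_t> out(rowBytes * height);
    for (int y = 0; y < height; ++y) {
        // Source rows are addressed with the source stride,
        // destination rows are packed back to back.
        const std::uint8_t* row =
            src + (static_cast<std::size_t>(top) + y) * srcBytesPerLine + leftBytes;
        std::copy(row, row + rowBytes, out.data() + y * rowBytes);
    }
    return out;
}
```

With OpenCV itself, subImage.isContinuous() can be used to check whether a Mat already has this property.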
Then, initialize a PIX header (I don't know how) with the correct parameters.
// ???? Put your own constructor here.
PIX* pix = new PIX_HEADER(width, height, channels, depth);
OR, create it manually:
PIX pix;
pix.width = subImage.width;
...
Then set the pix data pointer to the subImage data pointer
pix.data = subImage.data;
Finally, make sure your subImage object does not go out of scope before you finish your work with pix.
I have not yet managed to apply the solution provided, because linking the two (Tesseract OCR and OpenCV) has become a challenge.
So I have attached an image that gives a clear idea of the kind of input involved.
The numbers are marked only for ease of understanding; they simply indicate where the text is located.
Suppose it is an input image in which I have to read the contents within the box.
If there is nothing but the rectangle with text, you can pass the image to Tesseract directly.
If there is something around the rectangle with text (e.g. you want to ignore everything outside the rectangle), you need to:
identify the rectangle coordinates (with some OpenCV function, or maybe with GetConnectedComponents from the Tesseract API)
call SetRectangle after SetImage from the Tesseract API
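SetRectangle takes the region as left, top, width and height in image coordinates. If the detection step yields two corner points instead, they need converting, and should probably be clipped to the image first. A minimal sketch in plain C++, where OcrRect and rectFromCorners are hypothetical names and no Tesseract call is made:

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical value type holding the four arguments SetRectangle expects.
struct OcrRect {
    int left;
    int top;
    int width;
    int height;
};

// Hypothetical helper: converts two opposite corners (in any order) into
// left/top/width/height, clipped to an imageWidth x imageHeight image.
OcrRect rectFromCorners(int x0, int y0, int x1, int y1,
                        int imageWidth, int imageHeight) {
    assert(imageWidth > 0 && imageHeight > 0);
    // Order the corners, then clip each edge to the image bounds.
    const int left   = std::max(0, std::min(std::min(x0, x1), imageWidth));
    const int right  = std::max(0, std::min(std::max(x0, x1), imageWidth));
    const int top    = std::max(0, std::min(std::min(y0, y1), imageHeight));
    const int bottom = std::max(0, std::min(std::max(y0, y1), imageHeight));
    return OcrRect{left, top, right - left, bottom - top};
}
```

The four fields would then be passed to SetRectangle on the TessBaseAPI object, after SetImage has been called.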
Then how do I code that? I am not using any of the Python Tesseract libraries, as the programming language in my case is C++.
It would be a great help if somebody could come up with a snippet to achieve this.
Do you mean something like this?
#include <cstdio>
#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>
#include <opencv2/opencv.hpp>

int main() {
    tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
    // Init returns non-zero on failure.
    if (api->Init("/usr/src/tesseract-ocr/", "eng")) {
        fprintf(stderr, "Could not initialize tesseract.\n");
        delete api;
        return 1;
    }
    IplImage *img = cvLoadImage("/home/user/sampleimage.png");
    if (img == 0) {
        fprintf(stderr, "Cannot load input file!\n");
        api->End();
        delete api;
        return 1;
    }
    // Pass the raw pixels together with bytes per pixel and bytes per line.
    api->SetImage((unsigned char*)img->imageData, img->width,
                  img->height, img->nChannels, img->widthStep);
    // be aware of tesseract coord systems starting at left top corner!
    // Arguments are left, top, width, height.
    api->SetRectangle(129, 184, 484, 108);
    char* outText = api->GetUTF8Text();
    printf("OCR output:\n");
    // Explicit format string, so a '%' in the OCR text is printed literally.
    printf("%s", outText);
    api->Clear();
    api->End();
    delete [] outText;
    delete api;
    cvReleaseImage(&img);
    return 0;
}