
Re: [Bug-ocrad] Not recognizing obvious text


From: Antonio Diaz Diaz
Subject: Re: [Bug-ocrad] Not recognizing obvious text
Date: Tue, 24 Jan 2006 16:18:02 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i586; en-US; rv:1.7.12) Gecko/20050923

Tony Maro wrote:
> There's not a way to limit the area of
> the page you're doing OCR on is there?  Like a zone ocr?

Ocrad can do layout analysis and process only one of the resulting blocks, but this is likely to be slower and less reliable than cropping part of the page, as you are doing now.

I have a "crop" option on the to-do list for ocrad, and I am in fact so impressed with what you are doing that I think I am going to implement it in the next version (0.15).

The syntax could be like this:
`ocrad file.pbm --crop left,top,right,bottom'

and the meaning of left, top, right and bottom could be:
  - between 0.0 and 1.0: a fraction of the whole page
  - greater than 1: a pixel coordinate.

So `ocrad file.pbm --crop 0.0,0.0,0.5,0.5' would process the upper left quadrant, and `ocrad file.pbm --crop 0,0,500,500' would process the upper left 500x500 pixel square.
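The proposed fraction-or-coordinate rule can be sketched as a small parser. This is a minimal Python sketch under stated assumptions, not ocrad's code: the function name `resolve_crop` is hypothetical, a value of exactly 1 is treated as a fraction (the whole page), fractions are truncated to whole pixels, and pixel values beyond the page edge are clamped to it.

```python
from typing import Tuple


def resolve_crop(spec: str, width: int, height: int) -> Tuple[int, int, int, int]:
    """Turn a 'left,top,right,bottom' spec into pixel coordinates.

    Assumed rule (from the proposal): a value in [0.0, 1.0] is a fraction
    of the page width (left/right) or height (top/bottom); a value > 1 is
    an absolute pixel coordinate.
    """
    parts = spec.split(",")
    if len(parts) != 4:
        raise ValueError("expected left,top,right,bottom")
    values = [float(p) for p in parts]
    sizes = (width, height, width, height)
    coords = []
    for value, size in zip(values, sizes):
        if value < 0:
            raise ValueError(f"negative crop value: {value}")
        if value <= 1.0:
            # Fraction of the page, truncated to whole pixels.
            coords.append(int(value * size))
        else:
            # Absolute pixel coordinate, clamped to the page edge (an assumption).
            coords.append(min(int(value), size))
    left, top, right, bottom = coords
    if left >= right or top >= bottom:
        raise ValueError("crop rectangle is empty")
    return left, top, right, bottom


if __name__ == "__main__":
    # Example page size: A4 scanned at 300 dpi.
    print(resolve_crop("0.0,0.0,0.5,0.5", 2480, 3508))  # (0, 0, 1240, 1754)
    print(resolve_crop("0,0,500,500", 2480, 3508))      # (0, 0, 500, 500)
```

Note that in the second spec the zeros are read as fractions and the 500s as pixels; both interpretations give the same result for 0, so mixing the two forms in one spec is harmless under this reading.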


> Bet you guys never thought ocrad would be used for that, eh? ;-)

Honestly, no. :-)


> So, anyone have an idea that might speed up the process?

I think the proposed "crop" option would speed up the process a lot, because ocrad could crop first and then rotate, all without creating intermediate files.
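To illustrate why cropping before rotating saves work, here is a hypothetical in-memory pipeline in Python. It assumes a 90-degree clockwise rotation (the thread does not say which rotation is applied) and a bitmap held as a list of rows; the helpers `crop`, `rotate90_cw` and `crop_then_rotate` are illustrative names, not part of ocrad. Because the rotation only visits the pixels that survive the crop, its cost scales with the cropped area rather than the whole page, and no intermediate file is written.

```python
from typing import List, Tuple

# A bitmap is a list of rows; each row is a list of pixel values
# (0/1 for a PBM page; small integers below just make positions visible).
Bitmap = List[List[int]]
Box = Tuple[int, int, int, int]  # left, top, right, bottom (right/bottom exclusive)


def crop(bitmap: Bitmap, left: int, top: int, right: int, bottom: int) -> Bitmap:
    """Return the pixels inside the box as a new bitmap."""
    return [row[left:right] for row in bitmap[top:bottom]]


def rotate90_cw(bitmap: Bitmap) -> Bitmap:
    """Rotate 90 degrees clockwise: reverse the row order, then transpose."""
    return [list(column) for column in zip(*bitmap[::-1])]


def crop_then_rotate(bitmap: Bitmap, box: Box) -> Bitmap:
    """Crop first so the rotation only touches the pixels that are kept."""
    return rotate90_cw(crop(bitmap, *box))


if __name__ == "__main__":
    page = [
        [1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
    ]
    print(crop_then_rotate(page, (1, 0, 3, 2)))  # [[6, 2], [7, 3]]
```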


> Yes, you read that right.  I'll be processing as much as
> 150,000 pages per day on one server, and am designing this process so it
> could be clustered to handle more.

Some day you have to tell me who you are working for. ;-)


Regards,
Antonio.



