Re: [Bug-ocrad] Not recognizing obvious text
From: Antonio Diaz Diaz
Subject: Re: [Bug-ocrad] Not recognizing obvious text
Date: Tue, 24 Jan 2006 16:18:02 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i586; en-US; rv:1.7.12) Gecko/20050923
Tony Maro wrote:
> There's not a way to limit the area of the page you're doing OCR on,
> is there? Like a zone OCR?
Ocrad is able to do layout analysis and process only one of the
resulting blocks. But this is likely to be slower and less reliable than
cropping a part of the page as you are doing now.
I have a "crop" option on the to-do list for ocrad, and I am in fact so
impressed with what you are doing that I think I am going to implement
it in the next version (0.15).
The syntax could be like this:
`ocrad file.pbm --crop left,top,right,bottom'
and the meaning of left, top, right, bottom could be:
  - between 0.0 and 1.0: a fraction of the whole page
  - greater than 1: a pixel coordinate
So `ocrad file.pbm --crop 0.0,0.0,0.5,0.5' would process the upper left
quadrant, and `ocrad file.pbm --crop 0,0,500,500' would process the
upper left 500x500 pixel square.
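
To make the two readings of the values concrete, here is a minimal sketch in Python of how such a crop specification might be turned into pixel bounds. It is not ocrad code: the function name `crop_to_pixels` is hypothetical, and treating a value of exactly 1 as a fraction, clamping coordinates to the page size, and rejecting empty rectangles are all assumptions that the proposal above leaves open.

```python
def crop_to_pixels(spec, width, height):
    """Convert 'left,top,right,bottom' into pixel bounds (hypothetical helper).

    Assumption: each value is read on its own. Values from 0.0 to 1.0 are
    fractions of the page; larger values are pixel coordinates.
    """
    values = [float(v) for v in spec.split(",")]
    if len(values) != 4:
        raise ValueError("expected left,top,right,bottom")

    # left and right scale with the page width, top and bottom with the height
    extents = (width, height, width, height)
    box = []
    for value, extent in zip(values, extents):
        if value < 0:
            raise ValueError("crop values must not be negative")
        if value <= 1.0:
            # fraction of the whole page
            box.append(round(value * extent))
        else:
            # pixel coordinate, clamped to the page (assumption)
            box.append(min(int(value), extent))

    left, top, right, bottom = box
    if left >= right or top >= bottom:
        raise ValueError("empty crop rectangle")
    return left, top, right, bottom
```

For a 1000x800 pixel page, `crop_to_pixels("0.0,0.0,0.5,0.5", 1000, 800)` gives `(0, 0, 500, 400)`, the upper left quadrant, and `crop_to_pixels("0,0,500,500", 1000, 800)` gives `(0, 0, 500, 500)`.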
> Bet you guys never thought ocrad would be used for that, eh? ;-)
Really no. :-)
> So, anyone have an idea that might speed up the process?
I think the proposed "crop" option would speed up the process a lot,
because ocrad could do the cropping first, then the rotation. All
without creating intermediate files.
> Yes, you read that right. I'll be processing as much as
> 150,000 pages per day on one server, and am designing this process so it
> could be clustered to handle more.
Some day you have to tell me who you are working for. ;-)
Regards,
Antonio.