
[Bug-ocrad] Feature Request for less noisy output


From: Uwe Dippel
Subject: [Bug-ocrad] Feature Request for less noisy output
Date: Fri, 25 May 2007 22:50:02 +0800

Here we deal with plenty of documents with lots of white space.
A common headache is patches (lines) of streaky noise in between valid
text lines.
I have now started to write a filter: I run ocrad with -x, extract the line
numbers from the exported file, and store the numbers of the lines with low
height. Then I split the OCR-ed text into its lines and purge those lines
from the text.
Cumbersome, I thought. And then I had the impression that this could be done
much more easily within ocrad, with an option somewhat like
ocrad -h <height>
that simply suppresses the output of lines with a height of <height> or
lower.
In my humble opinion, the file created with -x should still show everything.
But the text output would be much cleaner of dirt and dust if any 'line' with
a height at or below a certain threshold were simply dropped when using this
option.
I am well aware that this would also drop a straight horizontal line, but
that would not matter in our case. We fight much more with patches and dots
of dirt on the scanner surface, which usually clutter the inter-line white
space, adding dots, dashes and underscores to the text.
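
For illustration, the external filter described above could look roughly like
the sketch below. It is only a sketch under assumptions: it assumes the file
written by -x contains one record per text line of the form
"line <n> chars <c> height <h>", and that the k-th such record corresponds to
the k-th line of the text output. The function names low_line_indices and
purge_lines are made up for this example and are not part of ocrad.

```python
import re

# Assumed shape of a per-line record in the -x export file:
#   line <number> chars <count> height <pixels>
LINE_RE = re.compile(r"^line\s+(\d+)\s+chars\s+(\d+)\s+height\s+(\d+)")


def low_line_indices(orf_text, max_height):
    """Return 0-based positions of line records with height <= max_height.

    Positions are counted sequentially over all line records, because the
    numbers in the records themselves may restart (an assumption).
    """
    indices = set()
    position = 0
    for record in orf_text.splitlines():
        match = LINE_RE.match(record)
        if match is None:
            continue  # header or per-character record, not a line record
        if int(match.group(3)) <= max_height:
            indices.add(position)
        position += 1
    return indices


def purge_lines(ocr_text, drop):
    """Remove the text lines whose 0-based position is in `drop`."""
    kept = [line for i, line in enumerate(ocr_text.splitlines())
            if i not in drop]
    return "\n".join(kept)
```

One would read the export file and the text output into two strings, call
low_line_indices on the first with the chosen threshold, and pass the result
to purge_lines together with the second. If the text output contains extra
blank lines that have no record in the export file, the positions would have
to be adjusted accordingly.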

Uwe

