[Bug-ocrad] Technical documentation summary readme.txt, page skew, Hough


From: Chris K. Skinner
Subject: [Bug-ocrad] Technical documentation summary readme.txt, page skew, Hough transform.
Date: Thu, 23 Feb 2006 12:35:05 -0500

This will be long, so please be patient and see if you can read all of this ...

I'm interested in various aspects of computer science, including image processing, neural networks,
expert systems, and semantic analysis.  I've got books on such topics.

As you are probably aware, there are software patents in some countries.

If you had some kind of outline of the algorithms applied in each version of the software, it would greatly help someone coming in fresh off the street to gain a quicker understanding of the project. It would probably also demonstrate to the world at large that you have invented something new, so that it could not be patented / stolen / claimed by some greedy corporate dudes.

I have just downloaded your source and tried to compile it, but the build failed. To understand your work, I now have to read through each source file to work out what is being done. To see what was done in previous versions that either did not work as intended or was tried and abandoned, I would have to repeat this analysis on the sources of your older versions and compare them to the current version.

Do you have any design notes, bibliographic citations, web links to information that you've made use of, or release notes on which algorithms are being used / have been abandoned?

I've been using the following OCR software since about 1993:
WinFax, Calera WordScan Plus, Caere OmniPage.

From my experience with these, the number of OCR errors goes way up if the page skew (orientation, angle, rotation) is not exactly zero degrees. When the angle is off, the bounding boxes around each page element, each column, each line of text, and each character are wrongly positioned, which creates huge numbers of recognition errors. One would expect a high-resolution scan to improve recognition, because the image is rich in redundant clues as to what is present on the page.

But at high resolution the long horizontal lines of text become very long "sets of stripes of pixels." With such long stripes, the error from page skew is no longer just one or two pixels; it can be much larger. If the recognition algorithms do not account for this, and instead determine bounding box regions for recognition too early while presuming a page skew angle of zero, the results are very bad.
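
To put rough numbers on this, here is a small Python sketch of my own (not taken from ocrad or from any of the products above). It assumes a 6.5 inch wide line of text and a skew of one degree, both picked only for illustration, and computes how far the baseline drifts vertically from one end of the line to the other.

```python
import math

def skew_drift_pixels(line_width_inches, dpi, skew_degrees):
    """Vertical drift, in pixels, between the two ends of a skewed text line."""
    width_px = line_width_inches * dpi
    return width_px * math.tan(math.radians(skew_degrees))

if __name__ == "__main__":
    # Assumed values for illustration: 6.5 inch line, 1 degree of skew.
    for dpi in (100, 300, 600):
        drift = skew_drift_pixels(6.5, dpi, 1.0)
        print(f"{dpi} dpi: baseline drifts about {drift:.1f} pixels")
```

At 300 dpi that one degree already moves the baseline by about 34 pixels across the line, which is on the order of the height of small body text at that resolution.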

In the J. R. Parker book w/CD ROM "Algorithms For Image Processing And Computer Vision" that I have read, the author provides a couple of algorithm suggestions for combating the page skew angle issue. A Hough transform, when applied to the dots at the bottoms of the bounding boxes of glyphs, yields the page skew angle in degrees (with his source code, that is). Applying an image rotation that eliminates the skew should then give better recognition. (Unfortunately, he does not present the source code for determining the bounding boxes of glyphs, so it is not easy to demonstrate that this algorithm will work, especially on larger regions of text.)
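
To make the idea concrete, here is a minimal Python sketch of a Hough-style vote over the bottom points of glyph bounding boxes. It is my own simplified version, not Parker's code and not anything from ocrad: it only tries near-horizontal angles, and the function name, the parameters (max_angle, step, rho_bin) and the synthetic test page are all assumptions made for illustration. It also assumes the bounding boxes have already been found, which is exactly the part the book leaves out.

```python
import math
from collections import Counter

def estimate_skew_degrees(points, max_angle=10.0, step=0.1, rho_bin=1.0):
    """Estimate the baseline angle from (x, y) glyph-bottom points.

    For each candidate angle, every point votes for the offset (rho) of the
    line through it at that angle. At the true skew angle, glyphs on the same
    text line share one offset, so the votes pile up in a few bins.
    """
    best_angle, best_score = 0.0, -1
    steps = int(round(2 * max_angle / step))
    for i in range(steps + 1):
        angle = -max_angle + i * step
        t = math.radians(angle)
        c, s = math.cos(t), math.sin(t)
        # Perpendicular offset of the line through (x, y) at this angle.
        votes = Counter(round((y * c - x * s) / rho_bin) for x, y in points)
        # Sum of squared counts rewards a few tall peaks over many small ones.
        score = sum(v * v for v in votes.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle

def synthetic_baselines(skew_degrees, lines=5, glyphs=40):
    """Made-up page: evenly spaced glyph bottoms on skewed text lines."""
    slope = math.tan(math.radians(skew_degrees))
    return [(20.0 * g, 100.0 + 50.0 * k + 20.0 * g * slope)
            for k in range(lines) for g in range(glyphs)]

if __name__ == "__main__":
    pts = synthetic_baselines(3.0)
    print("estimated skew:", round(estimate_skew_degrees(pts), 1), "degrees")
```

The angle returned is the slope of the baselines in whatever (x, y) coordinates the points use, so the corrective rotation is by the negative of that angle. On real scans, descenders and noise would scatter the votes, and a coarser rho_bin would probably be needed.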

Another approach is to use angle-independent Complex-Number-Coefficient Neural Networks as feature recognizers. The Japanese promoter of these neural networks says that they are insensitive to affine transforms, and can thereby recognize a pattern that has been so transformed.
http://mathworld.wolfram.com/AffineTransformation.html
http://www.google.ca/search?num=20&hl=en&newwindow=1&safe=off&q=Affine
http://www.google.ca/search?num=20&hl=en&newwindow=1&safe=off&q=Affine+Complex-Number+Coefficient+Neural+Networks

This too is just a theory. I don't have a copy of any books on Complex-Number-Coefficient Neural Networks, or any source code from a competent mathematician who has converted the advanced mathematics into working C++ code examples. Often these theoreticians are not interested in the practical applications of their work, and are more interested in expressing their ideas as continuous functions written as N-dimensional differential equations (or something much less understandable to me anyway).

Thanks for any help that you could provide in understanding your project, so that I might possibly offer you suggestions for improvements.

Kindest regards, C.
