aspell-user

Re: [aspell-user] Aspell and OCR-generated text


From: Kevin Atkinson
Subject: Re: [aspell-user] Aspell and OCR-generated text
Date: Fri Jul 12 16:10:23 2002

On Thu, 11 Jul 2002 address@hidden wrote:

> ... It would be much better, though, if aspell's
> algorithms were oriented toward the kinds of mistakes OCR engines make
> rather than the kinds made by human typists. 

Aspell's algorithms are not really tuned for the kinds of mistakes typists
make.  Rather, they are tuned for the kinds of mistakes humans (especially
me) tend to make when trying to spell a word.  The typo analysis in Aspell
biases the results slightly, but it generally doesn't make a huge
difference.

> I can see how you might do this
> by working with the translation tables for the phonetic code, the keyboard
> files, etc. 

You will probably get better results by turning the soundslike analysis
off altogether.  Modifying the keyboard file will also help.  However,
the best results will probably come from modifying the weights in
TypoEditDistanceWeights, found in util/typo_editdist.hh.  Doing so will
require modifying the code a bit.  The code that fills in the weights can
be found in SuggestParms::fill_distance_lookup in lib/suggest.cc.  Most of
the code should be self-explanatory.

You really need to understand how edit distance works in order to know
what to modify.  The comments in the util/*editdist* files should give you
enough information to understand it.
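
To give a rough idea of what weighted edit distance means for OCR, here is
a minimal standalone sketch.  It is not Aspell's code: OcrWeights,
add_confusable and weighted_edit_distance are hypothetical names made up
for illustration, and the cost values (100 for an ordinary edit, 20 for a
visually confusable pair) are assumptions, not the values Aspell uses.  The
idea is simply that substituting characters an OCR engine commonly confuses
(such as 'l' and '1') should cost less than an arbitrary substitution.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical weight table for illustration only; this is NOT Aspell's
// TypoEditDistanceWeights, and the numbers are assumed, not measured.
struct OcrWeights {
    int insert_cost = 100;      // cost of inserting one character
    int delete_cost = 100;      // cost of deleting one character
    int substitute_cost = 100;  // default cost of replacing one character
    int confusable_cost = 20;   // cost for pairs registered as look-alikes
    std::map<std::pair<char, char>, int> table;  // per-pair overrides

    // Register a pair of characters that OCR tends to confuse (symmetric).
    void add_confusable(char a, char b) {
        table[std::make_pair(a, b)] = confusable_cost;
        table[std::make_pair(b, a)] = confusable_cost;
    }

    // Cost of replacing character a with character b.
    int substitution(char a, char b) const {
        if (a == b) return 0;
        auto it = table.find(std::make_pair(a, b));
        return it != table.end() ? it->second : substitute_cost;
    }
};

// Classic dynamic-programming edit distance, with per-operation weights.
// d[i][j] is the cheapest way to turn the first i chars of a into the
// first j chars of b.
int weighted_edit_distance(const std::string& a, const std::string& b,
                           const OcrWeights& w) {
    const std::size_t n = a.size();
    const std::size_t m = b.size();
    std::vector<std::vector<int>> d(n + 1, std::vector<int>(m + 1, 0));

    for (std::size_t i = 1; i <= n; ++i) d[i][0] = d[i - 1][0] + w.delete_cost;
    for (std::size_t j = 1; j <= m; ++j) d[0][j] = d[0][j - 1] + w.insert_cost;

    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            int del = d[i - 1][j] + w.delete_cost;
            int ins = d[i][j - 1] + w.insert_cost;
            int sub = d[i - 1][j - 1] + w.substitution(a[i - 1], b[j - 1]);
            d[i][j] = std::min(sub, std::min(del, ins));
        }
    }
    return d[n][m];
}
```

With 'l'/'1' registered as confusable, "he11o" ends up much closer to
"hello" than "hebbo" does, even though both differ in two characters.
Note that this sketch only handles one-to-one substitutions; it does not
model multi-character confusions such as "rn" read as "m".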

--- 
http://kevin.atkinson.dhs.org



