aspell-user

Re: [Aspell-user] Aspell quality measurement (was: Arabic)


From: Kevin Atkinson
Subject: Re: [Aspell-user] Aspell quality measurement (was: Arabic)
Date: Sat, 15 Apr 2006 15:49:00 -0600 (MDT)

On Sat, 15 Apr 2006, Lars Aronsson wrote:

> Mohammed Sameer wrote:
>> On Mon, Apr 10, 2006 at 01:12:39PM +0200, Lars Aronsson wrote:
>>> I know aspell works for Swedish, but I'm not convinced that it
>>> is any better than ispell for Swedish.  I don't have any test
>>> case to determine the quality of its function.
>>
>> I still don't know anything about the sounds-alike things; I still
>> have a long way to go :-)
>
> Has anybody (for any language) developed a test suite or quality
> measurement for the sounds-alike functionality?  How can we know
> if aspell is any better (or how much better) than ispell?
>
> Do we have any statistics (for different languages) on what the
> common spelling and typing errors (typos) are?


For English see http://aspell.net/test/.
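One common way to turn "is aspell better than ispell?" into a number is to take a list of known (misspelling, correct word) pairs and count how often the correct word shows up first, or within the top few suggestions. The sketch below is only an illustration of that idea and does not reproduce the scripts behind the test page above. `suggest` is a hypothetical callable; in practice it might wrap a spell checker run in pipe mode (`aspell -a`), but here it is just any function returning a ranked list.

```python
from typing import Callable, Iterable


def score_suggestions(
    pairs: Iterable[tuple[str, str]],
    suggest: Callable[[str], list[str]],
    cutoffs: tuple[int, ...] = (1, 5, 10),
) -> dict[int, float]:
    """Fraction of misspellings whose correct word is in the top-n suggestions.

    `suggest` is a hypothetical stand-in for whatever spell checker is
    being measured; it must return suggestions best-first.
    """
    hits = {n: 0 for n in cutoffs}
    total = 0
    for misspelling, correct in pairs:
        total += 1
        suggestions = suggest(misspelling)
        for n in cutoffs:
            if correct in suggestions[:n]:
                hits[n] += 1
    if total == 0:
        return {n: 0.0 for n in cutoffs}
    return {n: hits[n] / total for n in cutoffs}


if __name__ == "__main__":
    # Toy suggestion table standing in for a real checker.
    table = {
        "recieve": ["receive", "relieve"],
        "seperate": ["desperate", "separate"],
        "definately": ["defiantly"],
    }
    pairs = [("recieve", "receive"), ("seperate", "separate"),
             ("definately", "definitely")]
    print(score_suggestions(pairs, lambda w: table.get(w, []), cutoffs=(1, 5)))
```

Running the same pair list through two checkers and comparing the top-1 and top-5 rates gives a rough "how much better" figure, but it is only as meaningful as the pair list, which is why the kind of errors collected matters.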

> I have plenty of statistics on common OCR errors ("scannos") for
> the Scandinavian languages.  In "Project Runeberg" (runeberg.org)
> I maintain raw OCR text files under RCS version control as
> volunteers are proofreading them online.  "Distributed
> Proofreaders" (pgdp.net) do the same for many languages.  Making a
> wdiff (word difference) between the original and final text
> produces a list of the corrections made (and thus of the errors).
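The wdiff step described above can be approximated in Python with the standard `difflib` module, which is used here as a stand-in for the `wdiff` tool rather than a copy of its behaviour. The sketch assumes plain whitespace tokenization and keeps only one-to-one word replacements, so merged or split words are skipped.

```python
import difflib


def correction_pairs(original: str, corrected: str) -> list[tuple[str, str]]:
    """Return (before, after) word pairs for one-to-one word replacements."""
    old_words = original.split()
    new_words = corrected.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only 'replace' blocks of equal length map cleanly word-for-word;
        # insertions, deletions and merged/split words are ignored.
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            pairs.extend(zip(old_words[i1:i2], new_words[j1:j2]))
    return pairs


if __name__ == "__main__":
    raw_ocr = "tbe cat sat on tlie mat"
    proofread = "the cat sat on the mat"
    print(correction_pairs(raw_ocr, proofread))
```

Applied to successive RCS revisions of an OCR file, this yields (scanno, correction) pairs that can be counted per language.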

Common typos and scannos are not very useful statistics, since those are not the errors soundslike is meant to handle. If they were the only mistakes, soundslike would not be needed at all. What I need are true spelling errors, where the user really doesn't know how to spell the word.

> Could Wikipedia's version history be used for spelling error
> statistics?  Has anybody tried this?  Can OpenOffice and other
> word processing software be made to report which corrections are
> made?

For Aspell, the personal replacement list contains misspellings and their corrections in cases where the correct word wasn't the first suggestion on the list. This file is usually ~/.aspell.<lang>.prepl. See
  http://aspell.sourceforge.net/man-html/Notes-on-Storing-Replacement-Pairs.html
This information could be collected and used, but I have never really pursued it.
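Collecting these files from several users and pooling them could look roughly like the sketch below. The file layout is an assumption, not taken from the Aspell documentation: a first header line beginning with `personal_repl`, followed by one `<misspelling> <correction>` pair per line. The UTF-8 encoding is likewise assumed. Check a real file before relying on either.

```python
from collections import Counter
from pathlib import Path
from typing import Iterable

# ASSUMPTION: the file starts with a header line beginning "personal_repl"
# and every following line is "<misspelling> <correction>".
HEADER_PREFIX = "personal_repl"


def parse_prepl(text: str) -> list[tuple[str, str]]:
    """Parse replacement pairs from the text of a personal replacement list."""
    pairs = []
    for i, raw in enumerate(text.splitlines()):
        line = raw.strip()
        if not line or (i == 0 and line.startswith(HEADER_PREFIX)):
            continue
        # Split on the first whitespace only (assumed: the correction
        # may itself contain spaces, e.g. "alot" -> "a lot").
        parts = line.split(None, 1)
        if len(parts) == 2:
            pairs.append((parts[0], parts[1]))
    return pairs


def load_prepl(lang: str) -> list[tuple[str, str]]:
    """Read ~/.aspell.<lang>.prepl for the current user, if it exists."""
    path = Path(f"~/.aspell.{lang}.prepl").expanduser()
    if not path.exists():
        return []
    # ASSUMPTION: UTF-8; the real file may use the dictionary's encoding.
    return parse_prepl(path.read_text(encoding="utf-8"))


def tally(pair_lists: Iterable[list[tuple[str, str]]]) -> Counter:
    """Pool pairs contributed by several users into one frequency count."""
    counts: Counter = Counter()
    for pairs in pair_lists:
        counts.update(pairs)
    return counts


if __name__ == "__main__":
    print(tally([load_prepl("en")]).most_common(10))
```

Because these entries only exist when the first suggestion was wrong, they are biased toward exactly the cases a suggestion algorithm gets wrong, which is useful for improving it.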

Wikipedia's version history might be useful if you can figure out how to get statistics from it. Whatever strategy you use, it is important that it can pull out true spelling errors and not just typos.
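One possible heuristic for separating true spelling errors from typos, assumed here and not something proposed in this thread, is that random slips of the finger rarely repeat identically across many independent writers, while genuine misspellings such as "seperate" do. The sketch keeps only pairs seen in at least `min_sources` independent sources (for example, different users' replacement lists or different edit histories).

```python
from collections import Counter
from typing import Iterable


def recurring_misspellings(
    sources: Iterable[Iterable[tuple[str, str]]],
    min_sources: int = 3,
) -> list[tuple[str, str]]:
    """Keep (misspelling, correction) pairs seen in at least min_sources sources."""
    seen_in: Counter = Counter()
    for pairs in sources:
        # Count each pair once per source so one prolific writer can't dominate.
        for pair in set(pairs):
            seen_in[pair] += 1
    return sorted(pair for pair, n in seen_in.items() if n >= min_sources)


if __name__ == "__main__":
    sources = [
        [("seperate", "separate"), ("teh", "the")],
        [("seperate", "separate"), ("adn", "and")],
        [("seperate", "separate"), ("teh", "the")],
    ]
    print(recurring_misspellings(sources, min_sources=2))
```

This is a rough filter at best: frequent typos like "teh" also recur and pass it, so it would need to be combined with other evidence before the output could be trusted as a list of true spelling errors.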

OpenOffice doesn't use Aspell. It could certainly be used to gather statistics, but I would be happier if it used Aspell first.
