aspell-devel

Re: [aspell-devel] Big wordlist and affix lexicons


From: Kevin Atkinson
Subject: Re: [aspell-devel] Big wordlist and affix lexicons
Date: Mon, 27 Nov 2006 13:44:42 -0700 (MST)

On Mon, 27 Nov 2006, Børre Gaup wrote:

On Sat, 25 Nov 2006 13:52, Kevin Atkinson wrote:
On Sat, 25 Nov 2006, Børre Gaup wrote:
The problem is that hunspell is not as ubiquitous as aspell. As far as I
have seen, hunspell is not commonly used, but aspell is used both in Linux
and in Mac OS X (through Cocoaspell). Hunspell is _intended_ to replace
myspell in openoffice.org (according to its homepage).

What features in hunspell would you specifically like to have in aspell?

Possibly:

- Max. 65535 affix classes and twofold affix stripping

I had a brief look at the hunspell documentation for its dictionary and affix
files. As far as I understand, twofold affix stripping means that you can
"stack" two different affixes one after another; in other words, one affix can
point to another, just the way a word points to an affix in an aspell
dictionary.

An example from Sami: the verb muitalit (to tell).

In the present tense it has, for example, these forms:
muital  -an
                -at
                -a
                -edne
                -eahppi
                -eaba
                -ehpet

After each of these forms it is legal to add the clitics -ge, -ba, -bat,
-go, -son, -han and a few more.

So in current aspell we would have to put both the -an and the -an+clitic
forms in the affix file, but with twofold affix stripping we could simply make
the verb suffixes point to the clitics. Is that correct?

Yes
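
A minimal Python sketch of the difference, assuming a toy checker rather than aspell's or hunspell's real data structures. The stem, suffixes and clitics come from the muitalit example above (the clitic list stops at the six named ones); the names VERB_SUFFIXES, CLITICS, flat_suffixes, check_flat and check_twofold are hypothetical. With single-level affixes every suffix+clitic combination has to be listed as its own entry; with twofold stripping the checker removes an optional clitic first and then a verb suffix.

```python
# Hypothetical toy spell checker comparing single-level affixes with twofold
# suffix stripping. Not aspell's or hunspell's actual data format.

STEMS = {"muital"}  # stem of muitalit (to tell)
VERB_SUFFIXES = ["an", "at", "a", "edne", "eahppi", "eaba", "ehpet"]
CLITICS = ["ge", "ba", "bat", "go", "son", "han"]  # "a few more" not listed


def flat_suffixes():
    """Single-level affixes: each suffix+clitic combination is its own entry."""
    return VERB_SUFFIXES + [s + c for s in VERB_SUFFIXES for c in CLITICS]


def check_flat(word):
    """Accept word if it is a stem followed by one entry of the flat list."""
    return any(word == stem + suf for stem in STEMS for suf in flat_suffixes())


def check_twofold(word):
    """Strip an optional clitic, then a verb suffix, and look up the stem."""
    candidates = [word] + [word[: -len(c)] for c in CLITICS if word.endswith(c)]
    for rest in candidates:
        for suf in VERB_SUFFIXES:
            if rest.endswith(suf) and rest[: -len(suf)] in STEMS:
                return True
    return False


if __name__ == "__main__":
    print(len(flat_suffixes()), len(VERB_SUFFIXES) + len(CLITICS))  # 49 13
    print(check_twofold("muitalanbat"), check_twofold("muitalge"))  # True False
```

For seven suffixes and six clitics the flat list needs 7 + 7 × 6 = 49 entries, while the twofold version needs only the 13 underlying affixes; both accept exactly the same words, and a clitic attached directly to the bare stem (muitalge) is rejected by both.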

We also have verbs where the stem changes. Diehtit (to know) is an example
(same tense and forms as above):

dieđ            -án
                -át
dieht   -á
diht            -e
dieht   -ibeahtti
                -iba
                -ibehtet

Is there a way to tell aspell that the three stem forms are in fact the same
word, so that we can handle them as one form instead of three? Or would some
of the features mentioned below help with this phenomenon?

I honestly don't know.   Sorry.
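
Purely as an illustration of the question (not an existing aspell feature), here is a minimal Python sketch of what "one word, three stems" could look like: a hypothetical lexicon entry for diehtit that lists each stem variant together with the suffixes it combines with, so that every form maps back to a single lemma. The first stem is written dieđ, the form that takes -án and -át; LEXICON and analyze are made-up names.

```python
# Hypothetical lexicon entry: one lemma with several stem variants, each
# allowed only with the suffixes it takes in the diehtit example.
# Illustration only; this is not an aspell feature.

LEXICON = {
    "diehtit": {  # to know
        "dieđ": ["án", "át"],
        "dieht": ["á", "ibeahtti", "iba", "ibehtet"],
        "diht": ["e"],
    }
}


def analyze(word):
    """Return the lemma if some stem variant plus one of its suffixes spells word."""
    for lemma, variants in LEXICON.items():
        for stem, suffixes in variants.items():
            if word.startswith(stem) and word[len(stem):] in suffixes:
                return lemma
    return None


if __name__ == "__main__":
    for w in ["dieđán", "diehtá", "dihte", "dieđá"]:
        print(w, analyze(w))
```

A checker built this way would accept dieđán, diehtá and dihte as forms of diehtit but reject mismatched combinations such as dieđá, because each stem variant carries its own suffix list.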

- Handling conditional affixes, circumfixes, fogemorphemes, forbidden
   words, pseudoroots and homonyms.

- Support complex compoundings

I believe some of these will benefit you.

However, I only want to implement them if there is a clear benefit.
For example, based on what several people have told me, complex
compounding rules are not worth it.

Aspell is far more complex than Myspell, and each feature needs to be
implemented carefully so that it behaves sensibly with the suggestion
code.  It is also important that adding a feature won't degrade
performance, especially when the feature isn't used.

Perhaps some of these features could be plugins, where different languages
load different plugins, according to their needs?

These are not the kind of things that can easily be made pluggable. It's better that they get integrated into the core.