Re: [aspell-devel] Big wordlist and affix lexicons


From: Børre Gaup
Subject: Re: [aspell-devel] Big wordlist and affix lexicons
Date: Mon, 27 Nov 2006 14:52:23 +0100
User-agent: KMail/1.9.5

On Saturday, 25 November 2006 13:52, Kevin Atkinson wrote:
> On Sat, 25 Nov 2006, Børre Gaup wrote:
> > The problem is that hunspell is not as ubiquitous as aspell. As far as I
> > have seen hunspell is not commonly used, but aspell is used both in Linux
> > and in Mac OS X (through Cocoaspell). Hunspell is _intended_ to replace
> > myspell in openoffice.org (according to its homepage).
> >
> > What features in hunspell would you specifically like to have in aspell?
>
> Possibly:
>
> - Max. 65535 affix classes and twofold affix stripping
>

I had a brief look at hunspell documentation of their dictionaries and affix 
files. As far as I understand twofold affix stripping means that you 
can "stack" two different affixes after one another, or in other words, make 
one affix point to another, just the way one word points to an affix in an 
aspell dictionary.

An example from Sami, the verb muitalit (to tell):

In the present tense it has, for example, these forms:
muital  -an
        -at
        -a
        -edne
        -eahppi
        -eaba
        -ehpet

After each of these forms it is legal to add the
clitics -ge, -ba, -bat, -go, -son, -han, and a few more.

So in current aspell we would have to list both the -an form and every
-an+clitic form in the affix file, but with twofold affix stripping we could
just have the verb suffixes point to the clitics. Is that correct?
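
To make the difference concrete, here is a minimal Python sketch of the two
approaches. It is not aspell or hunspell affix syntax; the names are
hypothetical, the clitic list is truncated to the six named above, and it
assumes the clitics attach by plain concatenation.

```python
# Illustration only: hypothetical names, not real aspell/hunspell syntax.
STEM = "muital"
PERSON_SUFFIXES = ["an", "at", "a", "edne", "eahppi", "eaba", "ehpet"]
CLITICS = ["ge", "ba", "bat", "go", "son", "han"]  # "and a few more" omitted


def onefold_rules(suffixes, clitics):
    """Flat rule list: each suffix plus each suffix+clitic combination."""
    return list(suffixes) + [s + c for s in suffixes for c in clitics]


def twofold_expand(stem, suffixes, clitics):
    """Two-level expansion: every suffix may be followed by one clitic."""
    forms = []
    for s in suffixes:
        forms.append(stem + s)
        forms.extend(stem + s + c for c in clitics)
    return forms


flat = onefold_rules(PERSON_SUFFIXES, CLITICS)
forms = twofold_expand(STEM, PERSON_SUFFIXES, CLITICS)

print(len(flat))                             # 49 rules with one level
print(len(PERSON_SUFFIXES) + len(CLITICS))   # 13 rules with two levels
print(len(forms))                            # 49 word forms either way
```

Both approaches accept the same 49 forms, but the flat list grows as
suffixes times clitics, while the two-level description grows as suffixes
plus clitics.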

We also have verbs where the stem changes. Diehtit (to know) is an example 
(same tense and form as above):

dieđ    -án
        -át
dieht   -á
diht    -e
dieht   -ibeahtti
        -iba
        -ibehtet

Is there a way to tell aspell that the three stem forms are in fact the same
word, so that we can handle them as one form instead of three? Or would
some of the features mentioned below be of any help with this phenomenon?
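
One way to picture "one form instead of three" is a set of strip-and-add
rules applied to a single base form. The Python sketch below only
illustrates that idea with the forms listed above; the function name and
rule layout are hypothetical, and it makes no claim about what aspell's
affix code does or should do.

```python
# Illustration only: one base form plus (strip, add) rules that rewrite the
# end of the word, so the three stems need not be stored separately.
BASE = "diehtit"

# (characters to strip from the end, characters to add)
RULES = [
    ("htit", "đán"),
    ("htit", "đát"),
    ("it", "á"),
    ("ehtit", "hte"),
    ("t", "beahtti"),
    ("t", "ba"),
    ("t", "behtet"),
]


def apply_rule(base, strip, add):
    """Remove `strip` from the end of `base`, then append `add`."""
    if strip and not base.endswith(strip):
        raise ValueError(f"{base!r} does not end with {strip!r}")
    return base[: len(base) - len(strip)] + add


forms = [apply_rule(BASE, strip, add) for strip, add in RULES]
print(forms)
```

The rules reproduce the seven forms in the table from the single entry
"diehtit", at the cost of longer strip strings wherever the stem changes.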

> - Handling conditional affixes, circumfixes, fogemorphemes, forbidden
>    words, pseudoroots and homonyms.
>
> - Support complex compoundings
>
> I believe some of these will benefit you.
>
> However I only want to implement them if there is a clear benefit to it.
> For example, based on what several people have told me, complex compounding
> rules are not worth it.
>
> Aspell is far more complex than Myspell and each feature needs to be
> implemented carefully so that it will behave sensibly with the
> suggestion code.  Also it is important that the addition of the
> feature won't degrade performance, especially when the feature isn't used.

Perhaps some of these features could be plugins, where different languages 
load different plugins, according to their needs?

regards,
--
Børre Gaup



