
[Aspell-user] Anybody working on Turkish? (Ethan Bradford)


From: ge
Subject: [Aspell-user] Anybody working on Turkish? (Ethan Bradford)
Date: Sat, 15 Jul 2006 13:01:04 +0200

Do you know this?
https://zemberek.dev.java.net/

Regards: Eleonora


Unless somebody else is nearly there for Turkish, Gokalp and I will probably
be working on improving Aspell for Turkish (that is, we have been working on
it, and are just awaiting some administrivia to start working on it again).

We'd love to collaborate with anybody else interested in it, or to get
feedback on our approach.


Here's some background, and then our approach, if you are interested.

Turkish is an "agglutinative" language, like Finnish, Estonian, Hungarian,
Japanese, and Korean.  That means that suffixes convey a lot more
information than in Indo-European languages, and that any complete list of
"surface forms" of words has to be enormously longer.  Though the suffix
trees are big, they're quite regular, so Turkish fits reasonably well into
Aspell's structure (it fits better into Hunspell, but for various
reasons we can't go there).  There's a good implementation of Aspell for
Finnish which proves the concept.

We hope to take the existing Turkish Aspell word list, or maybe even a
longer word list, if we have time to generate it, and apply a stemmer to it
to come up with a list of the represented stem forms.  We'll connect those
up with tables of suffixes we've collected from the web.
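
To make the idea concrete, here is a rough sketch in Python of reducing a
word list to stems plus the suffix chains seen on each stem.  Everything in
it is an assumption for illustration: the tiny suffix table (plural -lar/-ler
and locative -da/-de only), the function names, and the naive
longest-match stripping.  It is not Aspell's affix format and not the
stemmer we would actually use.

```python
from collections import defaultdict

# Toy suffix table (assumption): plural and locative endings only.
SUFFIXES = ("lar", "ler", "da", "de")


def strip_suffixes(word, suffixes=SUFFIXES, min_stem=2):
    """Peel known suffixes off the end of a word.

    Returns (stem, chain), where chain lists the removed suffixes in the
    order they appear in the word.
    """
    chain = []
    stripped = True
    while stripped:
        stripped = False
        # Try longer suffixes first so "ler" wins over a shorter match.
        for suf in sorted(suffixes, key=len, reverse=True):
            if word.endswith(suf) and len(word) - len(suf) >= min_stem:
                word = word[:-len(suf)]
                chain.insert(0, suf)
                stripped = True
                break
    return word, tuple(chain)


def build_stem_table(words):
    """Map each stem to the set of suffix chains observed in the word list."""
    table = defaultdict(set)
    for w in words:
        stem, chain = strip_suffixes(w)
        table[stem].add(chain)
    return dict(table)


if __name__ == "__main__":
    words = ["ev", "evler", "evde", "evlerde", "kitaplar"]
    for stem, chains in sorted(build_stem_table(words).items()):
        print(stem, sorted(chains))
```

Blind stripping like this will over-stem any word that merely happens to end
in a suffix-like string, and it ignores vowel harmony, so a real pass would
need to check candidate stems against a known stem list.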

Does that sound like it will work?





