[Aspell-user] small bug soundslike and non-ascii
From: Pablo Saratxaga
Subject: [Aspell-user] small bug soundslike and non-ascii
Date: Fri, 21 Oct 2005 13:29:40 +0200
User-agent: Mutt/1.5.6i
Kaixo!
I discovered that soundslike handles ASCII only, and converts
any non-ASCII character to some ASCII value.
For most of the existing *_phonet.dat files it doesn't matter, but
in some cases it does.
French and Walloon are an example of that.
For example, "c" and "ç" are very different,
"ca" sounds "KA", but "ça" sounds "SA";
however, current phonet code handles "c" and "ç" just the same;
as a result, "ça" is viewed as sounding "KA" too...
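A minimal sketch of the collapse described above. Everything in it is an assumption made for illustration: the accent stripping through Unicode decomposition only stands in for whatever conversion the phonet code really applies, and `TOY_RULES` is a hypothetical three-entry rule set, not taken from any real *_phonet.dat file.

```python
import unicodedata

def fold_to_ascii(word):
    """Strip diacritics by decomposing and dropping combining marks.
    Assumed stand-in for the ASCII conversion, not Aspell's actual code."""
    decomposed = unicodedata.normalize("NFKD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Hypothetical toy rules: "ç" sounds S, "c" sounds K.
TOY_RULES = {"ç": "S", "c": "K", "a": "A"}

def toy_soundslike(word, fold_first):
    """Apply the toy rules, optionally after folding the word to ASCII."""
    word = unicodedata.normalize("NFC", word)
    if fold_first:
        word = fold_to_ascii(word)
    return "".join(TOY_RULES.get(ch, ch.upper()) for ch in word)

print(toy_soundslike("ça", fold_first=True))   # the "ç" rule never fires
print(toy_soundslike("ça", fold_first=False))  # the "ç" rule is applied
```

With folding first, "ça" and "ca" get the same code; when the rules see the original characters, the two words stay distinct.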
Another example is "e" vs "ê", "é", "è".
At the end of a word, "e" (without accent) is always mute,
e.g. "livre" => "LIVR",
but not if it is accented, e.g. "livré" => "LIVRE".
As a result, it is impossible to define useful soundslike
rules if they involve non-ASCII chars in the language.
(I think it also makes it impossible to define soundslike rules
for languages in which non-ASCII letters are even more prominent,
or even exclusively used, like Czech, Esperanto, Russian,...)
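The mute final "e" case can be sketched the same way. The function `toy_french_code` below is hypothetical and knows only the two rules mentioned above (final unaccented "e" is dropped, accented "e" sounds E); `strip_accents` is again an assumed stand-in for the ASCII conversion.

```python
import unicodedata

def strip_accents(word):
    """Assumed ASCII folding: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def toy_french_code(word):
    """Hypothetical rules: a final unaccented 'e' is mute,
    while accented 'é', 'è', 'ê' are pronounced and coded as E."""
    word = unicodedata.normalize("NFC", word.lower())
    if word.endswith("e"):      # "é" is a different code point, so it is kept
        word = word[:-1]
    return "".join("E" if ch in "éèê" else ch.upper() for ch in word)

print(toy_french_code("livre"))                  # final "e" dropped
print(toy_french_code("livré"))                  # final "é" kept
print(toy_french_code(strip_accents("livré")))   # folding first loses the "é"
```

Once the accent is removed before the rules run, "livré" can no longer be told apart from "livre".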
The idea of matching fully accented chars with "ASCII only" versions
is a good one, however, but the match could involve several chars
(e.g. "ö" -> "oe" in German, and not "ö" -> "o").
The possibility to define an "asciification" table could help
find better suggestions when spell checking an unaccented
ASCII-only text; that is particularly true for those languages
that, for lack of proper computer support, were written in
ASCII for a long time, like Esperanto and Romanian.
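One possible shape for such an "asciification" table, as a rough sketch only. The table names, the `asciify` and `build_index` helpers and the sample word list are all hypothetical; the German entries extend the "ö" -> "oe" example, and the Esperanto entries assume the common x-convention (as in "aux" for "aŭ"). Only lowercase letters are handled.

```python
# Hypothetical per-language tables: one character may map to several.
GERMAN = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}
ESPERANTO_X = {"ĉ": "cx", "ĝ": "gx", "ĥ": "hx", "ĵ": "jx", "ŝ": "sx", "ŭ": "ux"}

def asciify(word, table):
    """Replace each character listed in the table, leave the rest alone."""
    return "".join(table.get(ch, ch) for ch in word)

def build_index(words, table):
    """Map every asciified form back to the dictionary words producing it."""
    index = {}
    for word in words:
        index.setdefault(asciify(word, table), []).append(word)
    return index

# Illustrative word list, not a real dictionary.
german_index = build_index(["schön", "schon", "können"], GERMAN)

print(german_index.get("schoen", []))   # candidates for ASCII-only "schoen"
print(asciify("aŭ", ESPERANTO_X))
```

Looking up an ASCII-only input in such an index yields the accented dictionary word as a suggestion, while "schon" and "schön" remain separate entries, which a plain "ö" -> "o" folding would not allow.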
thanks
--
Ki ça vos våye bén,
Pablo Saratxaga
http://chanae.walon.org/pablo/ PGP Key available, key ID: 0xD9B85466
[you can write me in Walloon, Spanish, French, English, Catalan or Esperanto]
[min povas skribi en valona, esperanta, angla aux latinidaj lingvoj]