From: Kevin Atkinson
Subject: Re: [Aspell-user] Unicode
Date: Mon, 27 Nov 2006 20:38:31 -0700 (MST)

On Mon, 27 Nov 2006, Lars Aronsson wrote:
I'm running Ubuntu Linux 6.06, which comes with Aspell 0.60.4. I see two problems related to Unicode. This system uses UTF-8 by default, and I'm trying to leave ISO 8859-1 behind altogether.

1. I'm trying to create my own master dictionary. Is it impossible to have the word list in UTF-8? Section 7.1 of the web documentation seems to say so: http://aspell.sourceforge.net/man-html/The-Language-Data-File.html
You can set "data-encoding" to utf-8 in the language data file. But that also affects the default encoding used in files like the personal dictionary for all users of the dictionary.
2. The output from "aspell -l sv dump master" is in broken UTF-8. If the command is prefixed with LC_CTYPE=iso8859-1 and the output is piped through "recode l1..u8", all is fine. But without this, aspell's dump command converts to UTF-8 but truncates the words. For example, in the five-letter word "själv" the middle letter, a-umlaut, is coded in UTF-8 as two bytes (octal 0303 0244), but the output string is truncated to 5 bytes: "s", "j", "\0303", "\0244", "l", and the last "v" is missing.
This will be fixed in the next version. You can use the CVS branch "rel_0_60-branch" or search for the bug report, which should include the patch.
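
For readers who want to see the symptom from point 2 in isolation, here is a minimal Python sketch. It does not use Aspell or reproduce its code. The function names are hypothetical, and the idea that the output is cut at the character count is only an assumption that happens to match the reported bytes; the actual cause is whatever the bug report describes. The second function mirrors the "recode l1..u8" workaround.

```python
# Illustration only: mimics the reported symptom, not Aspell's internals.
WORD = "själv"  # 5 characters, but 6 bytes in UTF-8


def truncate_to_char_count(word: str) -> bytes:
    """Encode to UTF-8, then keep only as many bytes as the word has
    characters. This is an ASSUMED explanation for the truncated output."""
    encoded = word.encode("utf-8")
    return encoded[:len(word)]


def latin1_to_utf8(raw: bytes) -> bytes:
    """Convert ISO 8859-1 bytes to UTF-8, like piping through
    `recode l1..u8`."""
    return raw.decode("iso-8859-1").encode("utf-8")


if __name__ == "__main__":
    full = WORD.encode("utf-8")
    print(len(WORD), len(full))            # character count vs. byte count
    print(truncate_to_char_count(WORD))    # the final "v" is lost
    print(latin1_to_utf8(WORD.encode("iso-8859-1")) == full)
```

The a-umlaut encodes as the two bytes 0xC3 0xA4 (octal 0303 0244), so a five-byte cut ends after the "l", which matches the output described above.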
--
Lars Aronsson (address@hidden)
Aronsson Datateknik - http://aronsson.se