[Aspell-user] Unicode


From: Lars Aronsson
Subject: [Aspell-user] Unicode
Date: Mon, 27 Nov 2006 23:16:38 +0100 (CET)

I'm running Ubuntu Linux 6.06, which comes with Aspell 0.60.4.  I 
see two problems related to Unicode.  This system uses UTF-8 by 
default, and I'm trying to leave ISO 8859-1 behind altogether.

1. I'm trying to create my own master dictionary.  Is it 
impossible to have the word list in UTF-8?  Section 7.1 of the web 
documentation seems to say so:
http://aspell.sourceforge.net/man-html/The-Language-Data-File.html

2. The output from "aspell -l sv dump master" is broken UTF-8.
If the command is prefixed with LC_CTYPE=iso8859-1 and the output 
is piped through "recode l1..u8", all is fine.  Without this, 
aspell's dump command converts to UTF-8 but truncates the words.
For example, in the 5-letter word "själv" the middle letter 
a-umlaut is coded in UTF-8 as two bytes (octal 0303 0244), so the 
full word is 6 bytes long.  The output string, however, is 
truncated to 5 bytes: "s", "j", "\0303", "\0244", "l".  The last 
"v" is missing.

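To illustrate the symptom, here is a minimal Python sketch that 
reproduces the same 5-byte output.  It assumes the truncation comes 
from using the character count of the word as the byte length of 
its UTF-8 encoding; that is only a guess at the cause, not something 
confirmed from the Aspell source.  The function name 
truncate_to_char_count is hypothetical.

```python
def truncate_to_char_count(word: str) -> bytes:
    """Encode word as UTF-8, then keep only len(word) bytes.

    This mimics the suspected bug: the length is measured in
    characters but applied to a byte string.
    """
    encoded = word.encode("utf-8")
    return encoded[:len(word)]


word = "själv"
encoded = word.encode("utf-8")

# 5 characters, but 6 bytes, because a-umlaut takes two bytes in UTF-8.
print(len(word), len(encoded))

# Octal values of the two bytes that encode a-umlaut.
print([oct(b) for b in "ä".encode("utf-8")])

# The truncated byte string has lost the final "v".
truncated = truncate_to_char_count(word)
print(truncated)
print(truncated.decode("utf-8"))
```

Words made up only of ASCII letters pass through such a truncation 
unchanged, which would explain why only words with non-ASCII letters 
come out shortened.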

-- 
  Lars Aronsson (address@hidden)
  Aronsson Datateknik - http://aronsson.se



