Problem with untranslated 8bit msgids
From: Guido Flohr
Subject: Problem with untranslated 8bit msgids
Date: Sun, 21 Aug 2005 15:49:16 +0300
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050511
Hi,
I have the following program:
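#include <errno.h>
#include <libintl.h>
#include <locale.h>
#include <stdio.h>

int
main (int argc, char *argv[])
{
  /* Select the locale from the environment (LANG, LC_*).  */
  setlocale (LC_ALL, "");
  errno = ENOENT;
  /* No catalog is bound, so gettext() returns the msgid unchanged.  */
  perror (gettext ("Datei öffnen"));
  return 0;
}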
The program source, notably the string argument to gettext(), is encoded
in UTF-8, and I assume here that gettext() does not find a translation
for the string "Datei öffnen" (German for "open file").
As long as I run the program in a utf-8 locale on a utf-8 terminal there
are no problems.
In an ISO-8859-1 locale, however, the output is messed up, since the
msgid is returned unmodified, not converted to the correct character set
for my locale:
$ LANG=fr_FR; export LANG
$ locale charmap
ISO-8859-1
$ ./l10ntest
Datei öffnen: Aucun fichier ou répertoire de ce type
The German o with diaeresis is still encoded in UTF-8, whereas the French e
with acute accent is correctly converted to ISO-8859-1.
I produced this example with GNU libc 2.4.3, but I think standalone
gettext-runtime will show the same behavior: Untranslated strings are
passed through unmodified in the original character set from the source
code, whereas translated strings are converted to the character set of
the selected locale.
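
To make the missing step concrete, here is a minimal sketch of the
conversion that an untranslated msgid would need, using POSIX iconv().
The helper name convert_charset is made up for illustration; it is not
part of gettext or libc, and the buffer sizing is a simple assumption
that is generous for common charsets.

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert the NUL-terminated string IN from charset
   FROM to charset TO.  Returns a malloc'd string that the caller frees,
   or NULL if the conversion is not possible.  */
char *
convert_charset (const char *from, const char *to, const char *in)
{
  assert (from != NULL && to != NULL && in != NULL);

  iconv_t cd = iconv_open (to, from);
  if (cd == (iconv_t) -1)
    return NULL;

  size_t inleft = strlen (in);
  size_t outleft = 4 * inleft + 16;   /* assumed upper bound, plus slack */
  char *out = malloc (outleft + 1);
  if (out == NULL)
    {
      iconv_close (cd);
      return NULL;
    }

  char *inptr = (char *) in;
  char *outptr = out;
  size_t rc = iconv (cd, &inptr, &inleft, &outptr, &outleft);
  if (rc != (size_t) -1)
    /* Flush any shift state; matters only for stateful encodings.  */
    rc = iconv (cd, NULL, NULL, &outptr, &outleft);
  iconv_close (cd);

  if (rc == (size_t) -1)
    {
      free (out);
      return NULL;
    }
  *outptr = '\0';
  return out;
}
```

Calling convert_charset ("UTF-8", "ISO-8859-1", msgid) on the msgid from
the program above would turn the two-byte UTF-8 sequence for the o with
diaeresis into the single byte 0xF6.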
A possible fix depends on our ability to determine the msgid character
set. Evaluating po headers (possibly fed with character set
information from xgettext --from-code) is not an option; the example
shows that there may be no mo file at all that can be consulted. In
other cases, multiple mo files with possibly conflicting header
information could be consulted.
On the other hand, the above example is perfectly legal usage, and using
non-English non-ASCII msgids is no longer deprecated. I can only see
two possible solutions:
1) Only msgids encoded in UTF-8 are supported.
2) A new function bind_textdomain_input_codeset is introduced, allowing
the programmer to specify the character set of the msgids in the
program. If the function is not called, no default will be assumed, and
therefore no output conversion on msgids done.
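
As a rough illustration of option 2, the following sketch emulates the
proposed behavior in application code. Nothing here exists in gettext:
set_msgid_codeset stands in for the proposed
bind_textdomain_input_codeset (without the per-domain argument), and
gettext_converted is a made-up wrapper. It relies on gettext() returning
the msgid pointer itself when no translation is found, and it
deliberately ignores freeing the converted strings.

```c
#include <assert.h>
#include <iconv.h>
#include <langinfo.h>
#include <libintl.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical setting: charset of the msgids in the source code.
   NULL means "unknown", in which case nothing is converted.  */
static const char *msgid_codeset = NULL;

/* Hypothetical stand-in for the proposed bind_textdomain_input_codeset.  */
void
set_msgid_codeset (const char *codeset)
{
  msgid_codeset = codeset;
}

/* Like gettext(), but an untranslated msgid is converted from
   msgid_codeset to the charset of the current locale.  On any failure
   the msgid is returned unchanged.  A converted result is malloc'd and
   never freed in this sketch.  */
const char *
gettext_converted (const char *msgid)
{
  assert (msgid != NULL);
  const char *text = gettext (msgid);

  /* A translation was found, or no input codeset was declared.  */
  if (text != msgid || msgid_codeset == NULL)
    return text;

  iconv_t cd = iconv_open (nl_langinfo (CODESET), msgid_codeset);
  if (cd == (iconv_t) -1)
    return msgid;

  size_t inleft = strlen (msgid);
  size_t outleft = 4 * inleft;        /* assumed upper bound */
  char *out = malloc (outleft + 1);
  char *inptr = (char *) msgid;
  char *outptr = out;
  if (out == NULL
      || iconv (cd, &inptr, &inleft, &outptr, &outleft) == (size_t) -1)
    {
      free (out);
      iconv_close (cd);
      return msgid;
    }
  iconv_close (cd);
  *outptr = '\0';
  return out;
}
```

With set_msgid_codeset ("UTF-8") called once after setlocale(), the
example program could pass gettext_converted ("Datei öffnen") to
perror(). Without that call the wrapper behaves exactly like gettext(),
which matches the "no default assumed" rule of option 2.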
Option 1 has backwards compatibility issues, so I prefer option 2.
Regards,
Guido
--
Imperia AG, Development
Leyboldstr. 10 - D-50354 Hürth - http://www.imperia.net/