bug-gnu-utils

Problem with untranslated 8bit msgids


From: Guido Flohr
Subject: Problem with untranslated 8bit msgids
Date: Sun, 21 Aug 2005 15:49:16 +0300
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050511

Hi,

I have the following program:

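        #include <errno.h>
        #include <libintl.h>    /* gettext() */
        #include <locale.h>
        #include <stdio.h>      /* perror() */

        int
        main (int argc, char* argv[])
        {
                /* Select the locale from the environment.  */
                setlocale (LC_ALL, "");
                errno = ENOENT;
                /* No catalog is bound, so the msgid comes back as-is.  */
                perror (gettext ("Datei öffnen"));
                return 0;
        }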

The program source, notably the string argument to gettext(), is encoded in UTF-8, and I assume here that gettext() does not find a translation for the string "Datei öffnen" (German for "open file").

As long as I run the program in a UTF-8 locale on a UTF-8 terminal, there are no problems.

In an ISO-8859-1 locale, however, the output is messed up, since the msgid is returned unmodified, not converted to the correct character set for my locale:

        $ LANG=fr_FR; export LANG
        $ locale charmap
        ISO-8859-1
        $ ./l10ntest
        Datei öffnen: Aucun fichier ou répertoire de ce type

The German o with diaeresis is still encoded in UTF-8, whereas the French e with acute accent is correctly converted to ISO-8859-1.
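To make the mismatch visible at the byte level, here is a minimal sketch that dumps a string as hex. The helper name hex_bytes is made up for this illustration and is not part of any library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the bytes of S as space-separated,
   two-digit hex values into OUT (at most OUTSZ bytes, including the
   terminating NUL).  Stops early if OUT is too small.  Returns OUT.  */
char *
hex_bytes (const char *s, char *out, size_t outsz)
{
        size_t used = 0;

        assert (s != NULL && out != NULL);
        if (outsz == 0)
                return out;
        out[0] = '\0';
        /* Each step needs room for " xx" plus the NUL.  */
        for (; *s != '\0' && used + 4 <= outsz; s++)
                used += (size_t) snprintf (out + used, outsz - used,
                                           used ? " %02x" : "%02x",
                                           (unsigned char) *s);
        return out;
}
```

Called on the UTF-8 encoded "ö" this yields "c3 b6", while the ISO-8859-1 encoding of the same character is the single byte "f6". An ISO-8859-1 terminal therefore shows the two UTF-8 bytes as two separate characters ("Ã¶") instead of one.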

I produced this example with GNU libc 2.4.3, but I think standalone gettext-runtime will show the same behavior: untranslated strings are passed through unmodified, in the character set of the source code, whereas translated strings are converted to the character set of the selected locale.

A possible fix depends on our ability to determine the character set of the msgids. Evaluating PO headers (possibly fed with character set information from xgettext --from-code) is not an option: as the example shows, there may be no MO file at all that can be consulted. In other cases, multiple MO files with possibly conflicting header information could be consulted.

On the other hand, the above example is perfectly legal usage, and using non-English non-ASCII msgids is no longer deprecated. I can only see two possible solutions:

1) Only msgids encoded in UTF-8 are supported.

2) A new function bind_textdomain_input_codeset is introduced, allowing the programmer to specify the character set of the msgids in the program. If the function is not called, no default is assumed, and therefore no output conversion is done on msgids.

Option 1 has backwards compatibility issues, so I prefer option 2.
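Until something like option 2 exists in the library, its effect can be approximated in application code. The following is only a rough sketch under stated assumptions, not a proposal for the actual implementation: convert_codeset and msgid_fallback are hypothetical names, and the sketch relies on gettext() returning the very pointer it was given when no translation is found. The only library calls used are the standard iconv_open(), iconv() and iconv_close().

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <string.h>

/* Hypothetical helper: convert the NUL-terminated string IN from
   FROM_CODESET to TO_CODESET, storing the NUL-terminated result in
   OUT (OUTSZ bytes).  Returns 0 on success, -1 on failure.  */
int
convert_codeset (const char *from_codeset, const char *to_codeset,
                 const char *in, char *out, size_t outsz)
{
        iconv_t cd;
        char *inp = (char *) in;  /* iconv() does not modify the input */
        size_t inleft = strlen (in);
        char *outp = out;
        size_t outleft;
        size_t rc;
        int saved_errno;

        if (outsz == 0) {
                errno = E2BIG;
                return -1;
        }
        outleft = outsz - 1;      /* reserve room for the NUL */

        cd = iconv_open (to_codeset, from_codeset);
        if (cd == (iconv_t) -1)
                return -1;

        rc = iconv (cd, &inp, &inleft, &outp, &outleft);
        if (rc != (size_t) -1)    /* flush any pending shift state */
                rc = iconv (cd, NULL, NULL, &outp, &outleft);
        saved_errno = errno;
        iconv_close (cd);
        errno = saved_errno;

        if (rc == (size_t) -1)
                return -1;
        *outp = '\0';
        return 0;
}

/* Hypothetical wrapper emulating option 2.  TRANSLATED is what
   gettext (MSGID) returned.  If it is the msgid itself (assumed to
   mean "no translation found") and INPUT_CODESET is not NULL, return
   the msgid converted to LOCALE_CODESET in BUF.  Otherwise, or if the
   conversion fails, return TRANSLATED unchanged.  */
const char *
msgid_fallback (const char *msgid, const char *translated,
                const char *input_codeset, const char *locale_codeset,
                char *buf, size_t bufsz)
{
        assert (msgid != NULL && translated != NULL);
        if (translated != msgid || input_codeset == NULL)
                return translated;
        if (convert_codeset (input_codeset, locale_codeset,
                             msgid, buf, bufsz) != 0)
                return translated;
        return buf;
}
```

In the test program above, the call would become something like msgid_fallback (msgid, gettext (msgid), "UTF-8", nl_langinfo (CODESET), buf, sizeof buf), with nl_langinfo() from <langinfo.h> supplying the character set of the current locale. Passing NULL as the input codeset keeps today's behavior, which matches the "no default" rule of option 2.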

Regards,
Guido
--
Imperia AG, Development
Leyboldstr. 10 - D-50354 Hürth - http://www.imperia.net/

Attachment: signature.asc
Description: OpenPGP digital signature

