Problem with untranslated 8bit msgids

bug-gnu-utils


From:	Guido Flohr
Subject:	Problem with untranslated 8bit msgids
Date:	Sun, 21 Aug 2005 15:49:16 +0300
User-agent:	Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050511

Hi,

I have the following program:

        #include <locale.h>
        #include <errno.h>
        #include <libintl.h>    /* gettext() */
        #include <stdio.h>      /* perror() */

        int
        main (int argc, char* argv[])
        {
                setlocale (LC_ALL, "");
                errno = ENOENT;
                perror (gettext ("Datei öffnen"));
                return 0;
        }

The program source, notably the string argument to gettext(), is encoded in utf-8, and I assume here that gettext() does not find a translation for the string "Datei öffnen" (German for "open file").

As long as I run the program in a utf-8 locale on a utf-8 terminal there are no problems.

In an iso-8859-1 locale, however, the output is messed up, since the msgid is returned unmodified, not converted to the correct character set for my locale:


        $ LANG=fr_FR; export LANG
        $ locale charmap
        ISO-8859-1
        $ ./l10ntest
        Datei Ã¶ffnen: Aucun fichier ou répertoire de ce type

The German o with diaeresis is still encoded in utf-8, whereas the French e with acute accent is correctly converted to iso-8859-1.
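To make the garbled output concrete: "ö" is the two bytes 0xC3 0xB6 in utf-8, and an iso-8859-1 terminal shows each byte as its own character ("Ã" and "¶"). The following is only a sketch of that effect; latin1_view is a hypothetical helper name, not part of gettext or libc. It re-encodes each input byte as utf-8, as if the byte were an iso-8859-1 character, so the result can be displayed on a utf-8 terminal.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: treat every byte of `bytes` as one iso-8859-1
   character and write the utf-8 encoding of that character to `out`.
   Returns 0 on success, -1 if `out` may be too small. */
int
latin1_view (const char *bytes, char *out, size_t outsize)
{
        const unsigned char *p;

        assert (bytes != NULL && out != NULL);

        /* Worst case: every input byte becomes two output bytes. */
        if (outsize < 2 * strlen (bytes) + 1)
                return -1;

        for (p = (const unsigned char *) bytes; *p != '\0'; p++) {
                if (*p < 0x80) {
                        /* ASCII is identical in both encodings. */
                        *out++ = (char) *p;
                } else {
                        /* U+0080..U+00FF needs two bytes in utf-8. */
                        *out++ = (char) (0xC0 | (*p >> 6));
                        *out++ = (char) (0x80 | (*p & 0x3F));
                }
        }
        *out = '\0';
        return 0;
}
```

Called with the utf-8 bytes of "Datei öffnen", the helper yields the utf-8 encoding of the text that the iso-8859-1 terminal displayed in the session above.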

I produced this example with GNU libc 2.4.3, but I think standalone gettext-runtime will show the same behavior: Untranslated strings are passed through unmodified in the original character set from the source code, whereas translated strings are converted to the character set of the selected locale.

A possible fix depends on our ability to determine the msgid character set. Evaluating po headers (possibly fed with character set information from xgettext --from-code) is not an option; the example shows that there may be no mo file at all that can be consulted. In other cases, multiple mo files with possibly conflicting header information could be consulted.

On the other hand, the above example is perfectly legal usage, and using non-English non-ASCII msgids is no longer deprecated. I can only see two possible solutions:


1) Only msgids encoded in UTF-8 are supported.

2) A new function bind_textdomain_input_codeset is introduced, allowing the programmer to specify the character set of the msgids in the program. If the function is not called, no default will be assumed, and therefore no output conversion will be done on msgids.
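The conversion that option 2 implies can be sketched with the POSIX iconv interface. This is only an illustration under stated assumptions: bind_textdomain_input_codeset is the function proposed in this message and does not exist, and convert_msgid is a hypothetical helper showing what the library might do with an untranslated msgid once it knows both the input codeset and the locale codeset.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert an untranslated msgid from the codeset the
   programmer declared to the codeset of the current locale.
   Returns 0 on success, -1 on any failure (unknown codeset, character not
   representable, output buffer too small). */
int
convert_msgid (const char *msgid, const char *input_codeset,
               const char *output_codeset, char *out, size_t outsize)
{
        iconv_t cd;
        char *inptr;
        char *outptr = out;
        size_t inleft, outleft, rc;

        assert (msgid != NULL && out != NULL);
        if (outsize == 0)
                return -1;

        /* iconv() takes a non-const pointer but does not modify the input. */
        inptr = (char *) msgid;
        inleft = strlen (msgid);
        outleft = outsize - 1;          /* keep room for the final NUL */

        /* Note the argument order: target codeset first, source second. */
        cd = iconv_open (output_codeset, input_codeset);
        if (cd == (iconv_t) -1)
                return -1;

        rc = iconv (cd, &inptr, &inleft, &outptr, &outleft);
        if (rc != (size_t) -1)
                /* Flush any pending shift sequence (stateful codesets). */
                rc = iconv (cd, NULL, NULL, &outptr, &outleft);
        iconv_close (cd);
        if (rc == (size_t) -1)
                return -1;

        *outptr = '\0';
        return 0;
}
```

In the example program, the input codeset would be "UTF-8" and the output codeset would be the result of nl_langinfo (CODESET), which is ISO-8859-1 in the fr_FR session shown earlier. On some platforms iconv lives in a separate library and needs -liconv at link time.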


Option 1 has backwards compatibility issues, so I prefer option 2.

Regards,
Guido
--
Imperia AG, Development
Leyboldstr. 10 - D-50354 Hürth - http://www.imperia.net/
