Re: Problem with untranslated 8bit msgids

bug-gnu-utils

From:	Bruno Haible
Subject:	Re: Problem with untranslated 8bit msgids
Date:	Mon, 22 Aug 2005 17:19:29 +0200
User-agent:	KMail/1.5

Guido Flohr wrote:
> Untranslated strings are
> passed through unmodified in the original character set from the source
> code, whereas translated strings are converted to the character set of
> the selected locale.

This is correct. It is also documented in the manual:

   "Note that the MSGID argument to `gettext' is not subject to
   character set conversion.  Also, when `gettext' does not find a
   translation for MSGID, it returns MSGID unchanged - independently of
   the current output character set.  It is therefore recommended that all
   MSGIDs be US-ASCII strings."
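
A minimal sketch of how a caller can observe this behaviour: when no
translation is found, gettext() hands back the very pointer it was given,
so a pointer comparison is enough to detect the untranslated case. The
helper name is_untranslated is made up for this illustration and is not
part of libintl.

```c
#include <libintl.h>
#include <stdbool.h>

/* Hypothetical helper, not part of libintl: returns true when gettext()
   found no translation for MSGID in the current text domain, that is,
   when it returned its argument unchanged (same pointer, same bytes,
   no character set conversion applied).  */
static bool
is_untranslated (const char *msgid)
{
  return gettext (msgid) == msgid;
}
```

With no message catalog installed for the current domain, a call such as
is_untranslated ("some message") is expected to return true.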

> A possible fix depends on our ability to determine the msgid character
> set.  Evaluating po headers (eventually fed with character set
> information from xgettext --from-code) is not an option; the example
> shows that there may be no mo file at all that can be sourced.

Correct. You might get a translation from de.mo but access the metainfo of
de_AT.mo, leading to inconsistencies.

> On the other hand, the above example is perfectly legal usage, and using
> non-English non-ASCII msgids is no longer deprecated.

Who said so? The GNU gettext manual recommends ASCII-only msgids, regardless
of the language.

> 1) Only msgids encoded in UTF-8 are supported.
>
> 2) A new function bind_textdomain_input_codeset is introduced, allowing
> the programmer to specify the character set of the msgids in the
> program.  If the function is not called, no default will be assumed, and
> therefore no output conversion on msgids done.
>
> Option 1 has backwards compatibility issues, I prefer option 2.

You also have the following option, regardless of whether your msgids were
originally in ISO-8859-15 or in UTF-8:

  3a. Change your po/Makefile so that the PO files are converted to UTF-8
      just before being converted to a .mo file. For example, in
      po/Makefile.in.in change

      cd $(srcdir) && rm -f $${lang}.gmo && $(GMSGFMT) -c --statistics -o t-$${lang}.gmo $${lang}.po && mv t-$${lang}.gmo $${lang}.gmo

      to

      cd $(srcdir) && rm -f $${lang}.gmo && msgcat -t UTF-8 $${lang}.po | $(GMSGFMT) -c --statistics -o t-$${lang}.gmo - && mv t-$${lang}.gmo $${lang}.gmo

  3b. Use a wrapper function around gettext that does the conversion.

      If your source character set was UTF-8:

      char *my_gettext (const char *msgid)
      {
        char *translation = gettext (msgid);
        if (translation == msgid)
          translation = iconv_string (translation, "UTF-8",
                                      nl_langinfo (CODESET));
        return translation;
      }

      The iconv_string function is a convenience wrapper around iconv(),
      found in gnulib.

      If your source character set was ISO-8859-15:

      char *my_gettext (const char *msgid)
      {
        char *utf8_msgid = iconv_string (msgid, "ISO-8859-15", "UTF-8");
        char *translation = gettext (utf8_msgid);
        if (translation == utf8_msgid)
          translation = iconv_string (translation, "UTF-8",
                                      nl_langinfo (CODESET));
        return translation;
      }

      All this code ignores memory leak issues; take care of them yourself.
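
One possible way to deal with the ownership question, sketched here for the
UTF-8 case using plain POSIX iconv() instead of the gnulib wrapper: the
wrapper always returns freshly allocated memory, so the caller can free()
the result unconditionally. The names convert_string and my_gettext_dup are
hypothetical and exist only for this illustration. The output buffer size
is an assumption (4 bytes per input byte); if that is not enough for some
target encoding, iconv() fails and the helper returns NULL.

```c
#include <iconv.h>
#include <langinfo.h>
#include <libintl.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: converts the NUL-terminated string SRC from
   FROM_CODE to TO_CODE.  Returns a malloc()ed string that the caller
   must free(), or NULL on any error.  Only meant for target encodings
   in which a single NUL byte terminates a string.  */
static char *
convert_string (const char *src, const char *from_code, const char *to_code)
{
  iconv_t cd = iconv_open (to_code, from_code);
  if (cd == (iconv_t) -1)
    return NULL;

  size_t in_left = strlen (src);
  /* Assumption: 4 output bytes per input byte, plus the terminating NUL.  */
  size_t out_size = 4 * in_left + 1;
  char *out = malloc (out_size);
  if (out == NULL)
    {
      iconv_close (cd);
      return NULL;
    }

  /* iconv() takes char ** for historical reasons; it only reads the input.  */
  char *in_ptr = (char *) src;
  char *out_ptr = out;
  size_t out_left = out_size - 1;

  size_t rc = iconv (cd, &in_ptr, &in_left, &out_ptr, &out_left);
  if (rc != (size_t) -1)
    /* Flush any pending shift sequence of a stateful encoding.  */
    rc = iconv (cd, NULL, NULL, &out_ptr, &out_left);
  iconv_close (cd);

  if (rc == (size_t) -1)
    {
      free (out);
      return NULL;
    }
  *out_ptr = '\0';
  return out;
}

/* Hypothetical variant of my_gettext() for UTF-8 msgids.  The result is
   always malloc()ed (or NULL when out of memory); the caller frees it.  */
static char *
my_gettext_dup (const char *msgid)
{
  const char *translation = gettext (msgid);

  if (translation == msgid)
    {
      /* No translation found: convert the msgid to the locale encoding.  */
      char *converted = convert_string (msgid, "UTF-8", nl_langinfo (CODESET));
      if (converted != NULL)
        return converted;
      /* Conversion failed: fall through and return the msgid as it is.  */
    }

  /* Copy the string so that the caller can always free() the result.  */
  size_t size = strlen (translation) + 1;
  char *copy = malloc (size);
  if (copy != NULL)
    memcpy (copy, translation, size);
  return copy;
}
```

The price of this design is one allocation per lookup, even for strings
that gettext() did translate. A program that looks up the same messages
repeatedly may prefer to cache the converted msgids instead.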

Bruno
