Re: [Bug 249431] gettext should convert untranslated non-ASCII msgids in

bug-gnu-utils

From:	Bruno Haible
Subject:	Re: [Bug 249431] gettext should convert untranslated non-ASCII msgids into locale encoding
Date:	Wed, 28 Feb 2007 13:15:10 +0100
User-agent:	KMail/1.5.4

https://bugzilla.novell.com/show_bug.cgi?id=249431

------- Comment #4 from address@hidden -------
The function gettext() is now standardized by the LSB, see
  http://www.linux-foundation.org/spec/booksets/LSB-Core-generic/LSB-Core-generic/baselib-gettext.html
therefore there is not much room for changing its behaviour.

But anyway, what does it mean to support non-ASCII msgids?

  1) You must provide a way for handling the case that the user's locale
     encoding cannot represent the non-ASCII characters. Let's take the
     example from the BR you mentioned:
       https://bugzilla.novell.com/show_bug.cgi?id=248859
     (funny bug number, when it's about encodings, by the way :-))

       gettext ("St Dévote Day")

     In that case the user certainly wishes to see "St Devote Day". Here
     the transliteration built into glibc and libiconv could produce it,
     but in more general cases transliteration does not work any more.
     So IMO the programmer must jump in and provide the ASCII equivalent
     too:

       gettext_na ("St Devote Day", "St Dévote Day")

  2) The argument to gettext is used as a key into the hash table in the .mo
     file. It's a hash table so that it's fast.

     If you use "St Dévote Day" as gettext argument, you have
       - to convert this string to the .mo file's text encoding first, using
         iconv(), [you cannot reliably enforce that all .mo files are in
         UTF-8 encoding],
       - to specify globally the source encoding, for example, through a new
         hypothetical function call
               bind_textdomain_source_codeset ("ISO-8859-1");

     If you use "St Devote Day" as gettext argument, you have neither of
     these two problems.

  3) If gettext returns the msgid, i.e. if the message was not translated,
     you have to convert it to the locale encoding yourself. Again, you
     have to specify globally the source encoding, for example, through a
     hypothetical function call
               bind_textdomain_source_codeset ("ISO-8859-1");
     Additionally, you don't want a memory leak here, when the same call
     is made repeatedly. So you need to cache the result for later reuse.
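
The conversion step from points 2 and 3 can be sketched with plain POSIX
iconv(). This is only an illustration, not part of gettext: convert_string
is a hypothetical helper name, and the output buffer bound is a simplifying
assumption.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a gettext or gnulib function): convert the
   NUL-terminated string SRC from FROM_CODESET to TO_CODESET.
   Returns a malloc'ed string that the caller must free, or NULL if the
   conversion is not possible.  */
static char *
convert_string (const char *src, const char *from_codeset,
                const char *to_codeset)
{
  iconv_t cd = iconv_open (to_codeset, from_codeset);
  if (cd == (iconv_t) -1)
    return NULL;

  size_t inleft = strlen (src);
  /* Assumption: 4 output bytes per input byte, plus some slack, is enough
     for the encodings of interest here.  */
  size_t outsize = 4 * inleft + 16;
  char *result = malloc (outsize);
  if (result == NULL)
    {
      iconv_close (cd);
      return NULL;
    }

  char *inptr = (char *) src;
  char *outptr = result;
  size_t outleft = outsize - 1;   /* keep room for the terminating NUL */

  size_t res = iconv (cd, &inptr, &inleft, &outptr, &outleft);
  if (res != (size_t) -1)
    /* Flush any pending shift sequence of a stateful encoding.  */
    res = iconv (cd, NULL, NULL, &outptr, &outleft);
  iconv_close (cd);

  if (res == (size_t) -1)
    {
      free (result);
      return NULL;
    }
  *outptr = '\0';
  return result;
}
```

For example, convert_string ("St D\303\251vote Day", "UTF-8", "ISO-8859-1")
yields the 13-byte string "St D\351vote Day". Every call allocates a fresh
result, which is exactly why a cache is needed when the same msgid is
looked up repeatedly.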

You can get rid of the need for bind_textdomain_source_codeset if you
assume that the source code is in UTF-8. So what you end up with is a
user-defined function

/* Return the localization of a string whose original writing is not ASCII.
   MSGID_UTF8 is the real string, written in UTF-8 with octal or hexadecimal
   escape sequences.  MSGID_ASCII is a fallback written only with ASCII
   characters.  */

const char *
gettext_utf8 (const char *msgid_ascii, const char *msgid_utf8)
{
  /* See whether there is a translation.   */
  const char *translation = gettext (msgid_ascii);

  /* gettext returns its argument unchanged when there is no translation.  */
  if (translation == msgid_ascii)
    {
      /* Access a cache here.  A little homework.  */

      /* locale_charset, c_strcasecmp, xstr_iconv are defined in gnulib
         ("localcharset.h", "c-strcase.h", "xstriconv.h"); XNMALLOC comes
         from "xalloc.h".  */
      const char *locale_code = locale_charset ();
      if (c_strcasecmp (locale_code, "UTF-8") == 0)
        translation = msgid_utf8;
      else
        {
#if HAVE_ICONV
          const char *converted = xstr_iconv (msgid_utf8, "UTF-8", locale_code);
          if (converted == NULL)
            {
# if (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 2) || __GLIBC__ > 2 \
     || _LIBICONV_VERSION >= 0x0105
              size_t len = strlen (locale_code);
              char *locale_code_translit = XNMALLOC (len + 10 + 1, char);
              memcpy (locale_code_translit, locale_code, len);
              memcpy (locale_code_translit + len, "//TRANSLIT", 10 + 1);
              converted =
                xstr_iconv (msgid_utf8, "UTF-8", locale_code_translit);
              free (locale_code_translit);
# endif
              if (converted == NULL)
                {
                  /* Use msgid_ascii as a fallback.  */
                  converted = msgid_ascii;
                }
            }
          translation = converted;
#else
          translation = msgid_ascii;
#endif
        }
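      /* Store the translation in the cache.  Homework part 2.
         A result that came from xstr_iconv is malloc'ed, so the cache must
         take ownership of it; otherwise it leaks on every call.  */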
    }

  return translation;
}
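
One possible shape for the cache "homework" is sketched below. It is a
minimal illustration under stated assumptions: the names cache_lookup and
cache_store are hypothetical, the cache is a simple linked list rather than
a hash table, and it assumes the locale encoding does not change during
the lifetime of the program.

```c
#include <stdlib.h>
#include <string.h>

/* One cached conversion.  Both strings are owned by the cache.  */
struct cache_entry
{
  struct cache_entry *next;
  char *key;    /* copy of msgid_utf8 */
  char *value;  /* the string converted to the locale encoding */
};

static struct cache_entry *cache_head;

/* Return a malloc'ed copy of S, or NULL if out of memory.  */
static char *
copy_string (const char *s)
{
  size_t n = strlen (s) + 1;
  char *p = malloc (n);
  if (p != NULL)
    memcpy (p, s, n);
  return p;
}

/* Return the cached conversion for KEY, or NULL if there is none.  */
static const char *
cache_lookup (const char *key)
{
  for (const struct cache_entry *e = cache_head; e != NULL; e = e->next)
    if (strcmp (e->key, key) == 0)
      return e->value;
  return NULL;
}

/* Store a copy of VALUE under a copy of KEY.  Return the stored copy,
   which stays valid for the rest of the program, or NULL if out of
   memory.  */
static const char *
cache_store (const char *key, const char *value)
{
  struct cache_entry *e = malloc (sizeof *e);
  if (e == NULL)
    return NULL;
  e->key = copy_string (key);
  e->value = copy_string (value);
  if (e->key == NULL || e->value == NULL)
    {
      free (e->key);
      free (e->value);
      free (e);
      return NULL;
    }
  e->next = cache_head;
  cache_head = e;
  return e->value;
}
```

In gettext_utf8, cache_lookup (msgid_utf8) would go where the first
homework comment is, and cache_store (msgid_utf8, converted) where the
second one is. Since cache_store copies its arguments, the caller would
then free a result that came from xstr_iconv.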

Finally, you make this function available to xgettext by putting this into
po/Makevars:

XGETTEXT_OPTIONS = \
  --keyword=gettext_utf8:1 --flag=gettext_utf8:1:pass-c-format
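
With these options, xgettext extracts only the first (ASCII) argument as
the msgid. A call site could then look like the sketch below. The
gettext_utf8 defined here is only a stand-in so that the example compiles
on its own (it behaves as in a UTF-8 locale with no translation
available), and holiday_name is a hypothetical caller.

```c
#include <string.h>

/* Stand-in for the real gettext_utf8 shown above, for illustration only:
   it ignores the ASCII msgid and returns the UTF-8 string unchanged.  */
static const char *
gettext_utf8 (const char *msgid_ascii, const char *msgid_utf8)
{
  (void) msgid_ascii;
  return msgid_utf8;
}

/* Hypothetical caller.  The first argument is the key that ends up in the
   .po file.  The second argument spells e-acute as the UTF-8 bytes
   0xC3 0xA9 with octal escapes, so the source file itself stays pure
   ASCII.  Octal escapes are used because they stop after three digits,
   whereas a hexadecimal escape would also swallow any following
   hexadecimal digit.  */
static const char *
holiday_name (void)
{
  return gettext_utf8 ("St Devote Day", "St D\303\251vote Day");
}
```

The UTF-8 string is 14 bytes long, one more than the 13 characters it
displays, because the accented letter occupies two bytes.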
