Re: Problem with untranslated 8bit msgids
From: Bruno Haible
Subject: Re: Problem with untranslated 8bit msgids
Date: Mon, 22 Aug 2005 17:19:29 +0200
User-agent: KMail/1.5
Guido Flohr wrote:
> Untranslated strings are
> passed through unmodified in the original character set from the source
> code, whereas translated strings are converted to the character set of
> the selected locale.
This is correct. It is also documented in the manual:
"Note that the MSGID argument to `gettext' is not subject to
character set conversion. Also, when `gettext' does not find a
translation for MSGID, it returns MSGID unchanged - independently of
the current output character set. It is therefore recommended that all
MSGIDs be US-ASCII strings."
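The "returns MSGID unchanged" behaviour can be checked by comparing pointers, which is what the wrapper functions further down rely on. A minimal sketch, assuming a libintl whose gettext() hands back the very pointer it was given when no translation is found (the helper name is_untranslated is made up for illustration):

```c
#include <libintl.h>
#include <stdbool.h>

/* Hypothetical helper: returns true when gettext() found no translation
   for MSGID, i.e. when it returned the same pointer it was passed.
   In that case the string is still in the source character set. */
bool
is_untranslated (const char *msgid)
{
  return gettext (msgid) == msgid;
}
```

With no message catalog installed for the current text domain, is_untranslated ("some string") is expected to be true, and the returned bytes are exactly those in the source file.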
> A possible fix depends on our ability to determine the msgid character
> set. Evaluating po headers (eventually fed with character set
> information from xgettext --from-code) is not an option; the example
> shows, that there maybe is no mo file at all that can be sourced.
Correct. You might get a translation from de.mo but access the metainfo
of de_AT.mo, leading to inconsistencies.
> On the other hand, the above example is perfectly legal usage, and using
> non-English non-ASCII msgids is no longer deprecated.
Who said so? The GNU gettext manual recommends ASCII-only msgids, regardless
of the language.
> 1) Only msgids encoded in UTF-8 are supported.
>
> 2) A new function bind_textdomain_input_codeset is introduced, allowing
> the programmer to specify the character set of the msgids in the
> program. If the function is not called, no default will be assumed, and
> therefore no output conversion on msgids done.
>
> Option 1 has backwards compatibility issues, I prefer option 2.
You also have the following options, regardless of whether your msgids
were originally in ISO-8859-15 or in UTF-8:
3a. Change your po/Makefile so that the PO files are converted to UTF-8
just before being converted to a .mo file. For example, in
po/Makefile.in.in change

  cd $(srcdir) && rm -f $${lang}.gmo && $(GMSGFMT) -c --statistics -o t-$${lang}.gmo $${lang}.po && mv t-$${lang}.gmo $${lang}.gmo

to

  cd $(srcdir) && rm -f $${lang}.gmo && msgcat -t UTF-8 $${lang}.po | $(GMSGFMT) -c --statistics -o t-$${lang}.gmo - && mv t-$${lang}.gmo $${lang}.gmo

3b. Use a wrapper function around gettext that does the conversion.
If your source character set was UTF-8:

  char *my_gettext (const char *msgid)
  {
    char *translation = gettext (msgid);
    if (translation == msgid)
      translation = iconv_string (translation, "UTF-8", nl_langinfo (CODESET));
    return translation;
  }

The iconv_string function is a convenience wrapper around iconv(),
found in gnulib.
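For readers without gnulib at hand, here is a rough sketch of what such a wrapper around iconv() might look like. It is not the gnulib code; the name iconv_string_sketch is made up, the argument order merely mirrors the calls in this mail, and it assumes a glibc-style iconv() prototype and a target encoding in which a single NUL byte terminates the string (so not UTF-16 or UTF-32):

```c
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for iconv_string(): converts the NUL-terminated
   string STR from FROM_CODE to TO_CODE.  Returns a newly malloc'ed string
   that the caller must free(), or NULL on failure.  */
char *
iconv_string_sketch (const char *str, const char *from_code,
                     const char *to_code)
{
  iconv_t cd = iconv_open (to_code, from_code);
  if (cd == (iconv_t) -1)
    return NULL;

  char *in = (char *) str;        /* iconv() does not modify the input */
  size_t in_left = strlen (str);
  size_t size = 4 * in_left + 16; /* initial guess, doubled on E2BIG */
  size_t used = 0;
  int done = 0;
  char *result = malloc (size);

  while (result != NULL && !done)
    {
      char *out = result + used;
      size_t out_left = size - used - 1;  /* keep one byte for the NUL */
      size_t rc;

      if (in_left > 0)
        rc = iconv (cd, &in, &in_left, &out, &out_left);
      else
        {
          /* All input consumed: flush any pending shift sequence.  */
          rc = iconv (cd, NULL, NULL, &out, &out_left);
          done = (rc != (size_t) -1);
        }
      used = (size_t) (out - result);

      if (rc == (size_t) -1)
        {
          if (errno == E2BIG)
            {
              /* Output buffer too small: grow it and continue.  */
              size *= 2;
              char *bigger = realloc (result, size);
              if (bigger == NULL)
                free (result);
              result = bigger;
            }
          else
            {
              /* Invalid or incomplete input: give up.  */
              free (result);
              result = NULL;
            }
        }
    }

  iconv_close (cd);
  if (result != NULL)
    result[used] = '\0';
  return result;
}
```

The result is always freshly allocated, which matters for the ownership question raised at the end of this mail.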
If your source character set was ISO-8859-15:

  char *my_gettext (const char *msgid)
  {
    char *utf8_msgid = iconv_string (msgid, "ISO-8859-15", "UTF-8");
    char *translation = gettext (utf8_msgid);
    if (translation == utf8_msgid)
      translation = iconv_string (translation, "UTF-8", nl_langinfo (CODESET));
    return translation;
  }

All this code ignores memory leak issues; take care of those yourself.
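One possible way to handle the ownership question is to make the wrapper always return freshly allocated memory, so the caller can free() the result unconditionally. This is only a sketch under that assumption: the names my_gettext_owned, msgid_converter and copy_string are made up, and the actual character set conversion is passed in as a callback to keep the example self-contained.

```c
#include <libintl.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical converter type: returns a newly malloc'ed copy of STR in
   the locale's character set, or NULL on failure.  */
typedef char *(*msgid_converter) (const char *str);

/* Plain malloc'ed copy of STR; also usable as a no-op converter.  */
char *
copy_string (const char *str)
{
  size_t n = strlen (str) + 1;
  char *copy = malloc (n);
  if (copy != NULL)
    memcpy (copy, str, n);
  return copy;
}

/* Like gettext(), but the result is always newly allocated (or NULL),
   so the caller owns it and must free() it.  */
char *
my_gettext_owned (const char *msgid, msgid_converter convert)
{
  const char *translation = gettext (msgid);
  if (translation == msgid)
    return convert (msgid);          /* untranslated: convert the msgid */
  return copy_string (translation);  /* translated: copy gettext's result */
}
```

The price is one extra allocation per lookup; a cache keyed on the msgid pointer would avoid that, at the cost of more bookkeeping.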
Bruno