Re: Gettext Mac GUI application returning wrong characters
From: Bruno Haible
Subject: Re: Gettext Mac GUI application returning wrong characters
Date: Sat, 11 Jul 2020 12:18:31 +0200
User-agent: KMail/5.1.3 (Linux/4.4.0-179-generic; KDE/5.18.0; x86_64; ; )

Hi Gonzalo,
> I compiled a sample case that does.
Based on this, I could adapt my test case and reproduce the issue.
Find it attached. Use:
$ ./configure --prefix=PREFIX; make; make install
$ mkdir -p hello.app/Contents/MacOS; ln -s PREFIX/bin/hello hello.app/Contents/MacOS/hello
then double-click on hello.app in the Finder.
> I printed out the variable values and when run command line only
> LC_CTYPE=UTF-8 is set.
>
> When run from the GUI no values are set.
Yes, I reproduce this: in the Terminal, LC_CTYPE=UTF-8; when run from the
Finder, LC_CTYPE is unset.
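For anyone who wants to repeat this check, here is a minimal sketch of such an environment dump. The helper names describe_env and print_locale_env are hypothetical and are not taken from the attached test case. Note that an application started from the Finder has no terminal attached, so the output probably needs to go to a log file instead of stdout.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Returns "NAME=value", or "NAME is unset" when the variable is absent.
// (Hypothetical helper, for illustration only.)
std::string describe_env(const char *name)
{
    const char *value = std::getenv(name);
    if (value == nullptr)
        return std::string(name) + " is unset";
    return std::string(name) + "=" + value;
}

// Prints the environment variables that setlocale(LC_ALL, "") consults.
void print_locale_env()
{
    static const char *const names[] = { "LC_ALL", "LC_CTYPE", "LC_MESSAGES", "LANG" };
    for (const char *name : names)
        std::printf("%s\n", describe_env(name).c_str());
}
```

Calling print_locale_env() once at program start, in both the Terminal and the Finder case, is enough to compare the two environments.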
In my test case, I added code to print the values of MB_CUR_MAX and locale_charset()
1. after setlocale(LC_ALL,""),
2. after std::locale::global(std::locale("")).
The result is:
1. MB_CUR_MAX=4, locale_charset()=UTF-8
2. MB_CUR_MAX=1, locale_charset()=ASCII
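The measurement can be sketched roughly as follows. This is not the code from the attached tarball. To stay self-contained it uses nl_langinfo(CODESET) as a stand-in for locale_charset(), and the two may report different names for the same encoding. It also does not include <libintl.h>, so step 1 here calls the plain libc setlocale, not the override from libintl. The names LocaleState, current_locale_state, report and run_diagnostics are hypothetical.

```cpp
#include <clocale>
#include <cstdio>
#include <cstdlib>
#include <langinfo.h>
#include <locale>
#include <stdexcept>
#include <string>

// Snapshot of the two values of interest for the current C locale.
struct LocaleState {
    std::size_t mb_cur_max;
    std::string codeset;   // nl_langinfo(CODESET), standing in for locale_charset()
};

LocaleState current_locale_state()
{
    return LocaleState{ static_cast<std::size_t>(MB_CUR_MAX), nl_langinfo(CODESET) };
}

void report(const char *step)
{
    LocaleState s = current_locale_state();
    std::printf("%s: MB_CUR_MAX=%zu, codeset=%s\n", step, s.mb_cur_max, s.codeset.c_str());
}

void run_diagnostics()
{
    // Step 1: initialize the C locale from the environment.
    std::setlocale(LC_ALL, "");
    report("1. after setlocale(LC_ALL,\"\")");

    // Step 2: initialize the C++ global locale from the environment.
    // Some implementations throw if the environment names an unknown locale.
    try {
        std::locale::global(std::locale(""));
    } catch (const std::runtime_error &e) {
        std::printf("std::locale(\"\") failed: %s\n", e.what());
    }
    report("2. after std::locale::global(std::locale(\"\"))");
}
```

To see the difference described in this mail, step 1 would have to go through the setlocale override, which means including <libintl.h> from GNU gettext and linking against libintl.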
Here are the explanations:
* Although the text encoding on macOS generally is UTF-8, the locale
facility in libc by default - i.e. before the first setlocale() call, or
when setlocale(LC_ALL,"") is called and no LANG, LC_* environment
variable is set - sets MB_CUR_MAX = 1.
* When MB_CUR_MAX = 1, functions such as mbrtowc cannot support UTF-8
encoding. For this reason, libintl and gnulib's locale_charset() function
returns "ASCII" in this case. See the code at the end of [1].
* In this case, the gettext facility uses iconv() to convert the strings to
ASCII. So, for example, "ñ" becomes "n~" or "~n". This is coded in [2],
function get_output_charset and its caller.
* Two workarounds exist, to make UTF-8 encoded translations appear
nevertheless:
- The Terminal app sets the environment variable LC_CTYPE=UTF-8.
- <libintl.h> contains a setlocale override that, on macOS, assumes
LC_CTYPE=UTF-8 even if it is not set. [3] line 1482.
* In the test case, we invoke the overridden setlocale from <libintl.h>.
This explains the output
1. MB_CUR_MAX=4, locale_charset()=UTF-8
* In the test case, then, the statement
std::locale::global(std::locale(""))
invokes setlocale(LC_ALL,"") - the original setlocale from libc, not the
overridden one. So, it annihilates the effect of the previous step.
Since neither of the two workarounds is then active, you get the
transliterated output.
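The transliteration step can be tried in isolation with a small sketch like the one below. It assumes that the conversion is equivalent to an iconv conversion from UTF-8 to "ASCII//TRANSLIT"; the exact call made in dcigettext.c may differ. The function name utf8_to_ascii_translit is hypothetical. What a given character turns into depends on the iconv implementation, so no particular output is claimed here. On macOS the program may need to be linked with -liconv.

```cpp
#include <iconv.h>
#include <string>

// Converts UTF-8 text to ASCII, asking iconv to transliterate characters
// that ASCII cannot represent. Returns an empty string on failure.
// (Hypothetical helper, for illustration only.)
std::string utf8_to_ascii_translit(const std::string &utf8)
{
    iconv_t cd = iconv_open("ASCII//TRANSLIT", "UTF-8");
    if (cd == (iconv_t) -1)
        return std::string();

    std::string input = utf8;                        // iconv wants a mutable buffer
    std::string output(utf8.size() * 4 + 16, '\0');  // generous room for replacements
    char *in_ptr = &input[0];
    size_t in_left = input.size();
    char *out_ptr = &output[0];
    size_t out_left = output.size();

    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);
    if (rc == (size_t) -1)
        return std::string();

    // Keep only the bytes that iconv actually wrote.
    output.resize(output.size() - out_left);
    return output;
}
```

Passing a UTF-8 string that contains "ñ" and printing the result shows which replacement the local iconv chooses.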
The fix, now, is to add this code before std::locale::global(std::locale("")):
#if defined __APPLE__ && defined __MACH__
setenv ("LC_CTYPE", "UTF-8", 1);
#endif
and include <stdlib.h>.
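Put into context, the initialization might look like the following sketch. The wrapper function init_locale and its bool return value are hypothetical, and the try/catch is an addition for implementations where std::locale("") can throw. In the real program <libintl.h> would also be included, so that the first setlocale call goes through the override; it is left out here to keep the example self-contained.

```cpp
#include <clocale>
#include <locale>
#include <stdexcept>
#include <stdlib.h>   // setenv (POSIX)

// Initializes the C and C++ locales from the environment.
// Returns false if constructing std::locale("") failed.
// (Hypothetical wrapper, for illustration only.)
bool init_locale()
{
#if defined __APPLE__ && defined __MACH__
    // Make sure libc sees a UTF-8 LC_CTYPE even when started from the Finder.
    setenv("LC_CTYPE", "UTF-8", 1);
#endif
    std::setlocale(LC_ALL, "");
    try {
        // This calls the libc setlocale again; with LC_CTYPE now set in the
        // environment, it should no longer fall back to MB_CUR_MAX = 1.
        std::locale::global(std::locale(""));
    } catch (const std::runtime_error &) {
        return false;
    }
    return true;
}
```

The third argument 1 makes setenv overwrite an existing LC_CTYPE value, as in the snippet above; passing 0 instead would leave a value set by the user untouched.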
Bruno
[1] https://git.savannah.gnu.org/gitweb/?p=gettext.git;a=blob;f=gettext-runtime/intl/localcharset.c
[2] https://git.savannah.gnu.org/gitweb/?p=gettext.git;a=blob;f=gettext-runtime/intl/dcigettext.c
[3] https://git.savannah.gnu.org/gitweb/?p=gettext.git;a=blob;f=gettext-runtime/intl/setlocale.c
Attachment: hello-c++-0.tar.gz (application/compressed-tar)