Re: locale_charset() on MacOS X
From: Bruno Haible
Subject: Re: locale_charset() on MacOS X
Date: Fri, 27 Jan 2012 02:14:53 +0100
User-agent: KMail/4.7.4 (Linux/3.1.0-1.2-desktop; KDE/4.7.4; x86_64; ; )
Paul Eggert reported:
> <http://lists.gnu.org/archive/html/bug-bison/2012-01/msg00107.html>.
Akim Demaille wrote:
> I'm sending this message to you as the main author of
> the quotearg module. I am not sure which component should
> be considered guilty here, but the problem is:
>
> - independently of any LC_*, localcharset.c returns UTF-8
> on OS X.
>
> - If I instrument localcharset.c, I can see that the OS
> returns "US-ASCII" as locale_codeset.
>
> - localcharset's get_charset_aliases then maps US-ASCII
> to UTF-8 ...
>
> - so quotearg decides to use nice UTF-8 quotes (since
> quote.c asks for locale-dependent quotes).
>
> - so the test suite fails since it expects plain old "'".
>
> What module would be considered faulty here?
The test suite is faulty.
Rationale:
- The localcharset.c code is meant to return the character encoding
in the current locale. Pretty much like nl_langinfo(CODESET), except
that the latter is botched on many systems: on some it returns
non-standard encoding names such as "646", on some an empty string,
and on some (such as Cygwin or MacOS X) it returns "US-ASCII" when
in reality the character encoding is different.
localcharset.c can be seen as an override of nl_langinfo (CODESET),
except that it does not (yet) have the form of a gnulib-style override.
- POSIX [1] does not specify the character encoding of the "C" locale.
It could be US-ASCII or any extension of it, such as ISO-8859-1 or
UTF-8.
- On MacOS X the Terminal.app's encoding and the general text encoding
are UTF-8.
- On MacOS X nearly all users are working in the "C" locale. If a user
has told the OS that he's working in the French locale, the OS does
not set LC_* variables to indicate this, nor does the user usually
do so (why should he? he has already specified it once). Therefore
the normal situation on MacOS X is this:
$ env | grep LC_
$ locale
LANG=
LC_COLLATE="C"
LC_CTYPE="C"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL=
- gettext() takes care to transliterate messages to the locale encoding.
If locale_charset() is "UTF-8", 'rm --help' will show for a French
user
Usage: rm [OPTION]... FICHIER...
Supprime (défait le lien) les FILE(s).
...
and for a Chinese user
用法:rm [选项]... 文件...
Remove (unlink) the FILE(s).
If locale_charset() is "US-ASCII", 'rm --help' will show instead:
Usage: rm [OPTION]... FICHIER...
Supprime (d'efait le lien) les FILE(s).
and for a Chinese user no translation at all:
Usage: rm [OPTION]... FILE...
Remove (unlink) the FILE(s).
- quotearg's use of gettext() and locale_charset() to determine whether
to use ‘...’ instead of '...' is entirely appropriate, because
1. In situations where gettext() is known to make use of non-ASCII
characters in its resulting strings, it is also OK for quotearg
to make use of such characters.
2. quotearg is not used in places where POSIX demands a certain
result in the "C" locale.
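For illustration, the decision described in the last item can be sketched in C roughly as follows. This is a minimal sketch under stated assumptions, not quotearg's actual code: `struct quote_pair` and `choose_quotes` are hypothetical names, the `charset` argument stands in for whatever locale_charset() returns, the name is assumed to be already canonical (exactly "UTF-8"), and the gettext() lookup is left out entirely.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical type: the opening and closing quotation marks to use.  */
struct quote_pair
{
  const char *open;
  const char *close;
};

/* Hypothetical helper: pick quotation marks for a given locale charset.
   CHARSET stands in for the value returned by locale_charset ().  */
struct quote_pair
choose_quotes (const char *charset)
{
  struct quote_pair q = { "'", "'" };   /* plain ASCII apostrophes */

  assert (charset != NULL);
  if (strcmp (charset, "UTF-8") == 0)
    {
      q.open = "\xe2\x80\x98";    /* U+2018 LEFT SINGLE QUOTATION MARK */
      q.close = "\xe2\x80\x99";   /* U+2019 RIGHT SINGLE QUOTATION MARK */
    }
  return q;
}
```

With this sketch, a caller that passes "UTF-8" gets the curly quotes and any other charset name gets '...', which is the behaviour the test suite ran into on MacOS X.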
In <http://lists.gnu.org/archive/html/bug-bison/2012-01/msg00091.html>
Akim also wrote:
> I had never realized that the tests are not specifying LC_ALL=C
> and they should. But even when I do, I still have nice quotes.
Indeed there is a slight difference in behaviour between gettext()
and locale_charset(): Setting the environment variable LC_ALL=C
disables all translations in gettext() - this is needed so that some
coreutils programs can be POSIX compliant -, whereas locale_charset()
doesn't have this special code.
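The kind of check meant here can be sketched as below. This is only an illustration of the POSIX precedence of LC_ALL over LC_MESSAGES over LANG, not gettext()'s real implementation: `effective_locale` and `translations_disabled` are hypothetical names, and the fallback to "C" when nothing is set is an assumption (POSIX leaves that default implementation-defined).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the first of the three values that is set
   and non-empty, in the precedence order LC_ALL, LC_<category>, LANG.
   Falling back to "C" when none is set is an assumption.  */
const char *
effective_locale (const char *lc_all, const char *lc_category,
                  const char *lang)
{
  if (lc_all != NULL && lc_all[0] != '\0')
    return lc_all;
  if (lc_category != NULL && lc_category[0] != '\0')
    return lc_category;
  if (lang != NULL && lang[0] != '\0')
    return lang;
  return "C";
}

/* Hypothetical helper: nonzero if the environment selects the "C" (or
   "POSIX") locale for messages, the case in which translations are
   switched off.  */
int
translations_disabled (void)
{
  const char *name = effective_locale (getenv ("LC_ALL"),
                                       getenv ("LC_MESSAGES"),
                                       getenv ("LANG"));
  assert (name != NULL);
  return strcmp (name, "C") == 0 || strcmp (name, "POSIX") == 0;
}
```

A charset query that never consults the environment in this way will keep reporting the same encoding under LC_ALL=C, which matches what Akim observed.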
There are several systems with locale encoding UTF-8 in all user
locales: Plan 9, BeOS, Haiku, MacOS X, Cygwin 1.7, and there will be more,
because it's a natural choice nowadays. In such environments, it makes
less and less sense to assign the US-ASCII encoding to the "C" locale.
US-ASCII encoding was a good choice for the "C" locale between 1996 and 2001,
as a transition between the ISO-8859-1 world and the UTF-8 world. It isn't
any more.
Let's fix the testsuites.
Paul Eggert wrote:
> Does the following gnulib patch fix things for Bison on OS X?
> I'll CC: this to address@hidden, to give Bruno Haible
> a heads-up about the localcharset problem.
>
> localcharset: port to Mac OS X's C locale
> * lib/localcharset.c (get_charset_aliases) [DARWIN7]:
> Map "US-ASCII" to "ASCII". Problem reported by Akim Demaille in
> diff --git a/lib/localcharset.c b/lib/localcharset.c
> index d86002c..68ccf60 100644
> --- a/lib/localcharset.c
> +++ b/lib/localcharset.c
> @@ -262,6 +262,7 @@ get_charset_aliases (void)
> "ISO8859-9" "\0" "ISO-8859-9" "\0"
> "ISO8859-13" "\0" "ISO-8859-13" "\0"
> "ISO8859-15" "\0" "ISO-8859-15" "\0"
> + "US-ASCII" "\0" "ASCII" "\0"
> "KOI8-R" "\0" "KOI8-R" "\0"
> "KOI8-U" "\0" "KOI8-U" "\0"
> "CP866" "\0" "CP866" "\0"
Nah. "Let's break gettext() based internationalization of all GNU programs
for most MacOS X users" won't get my approval.
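For readers unfamiliar with the table touched by the patch: it is a sequence of NUL-terminated (alias, canonical name) pairs, ended by an empty string. A minimal sketch of a lookup over such a table might look like the following. `demo_aliases` and `resolve_alias` are hypothetical names chosen for this illustration, the table holds only a few entries copied from the patch context, and the real lookup in localcharset.c has additional cases that are omitted here.

```c
#include <assert.h>
#include <string.h>

/* Illustrative table in the same layout as the patch context:
   alias, NUL, canonical name, NUL, ...  The implicit terminating NUL of
   the string literal supplies the empty string that ends the table.  */
const char demo_aliases[] =
  "ISO8859-15" "\0" "ISO-8859-15" "\0"
  "KOI8-R" "\0" "KOI8-R" "\0"
  "CP866" "\0" "CP866" "\0";

/* Hypothetical helper: map CODESET through TABLE.  Returns the canonical
   name if an alias matches, otherwise CODESET unchanged.  */
const char *
resolve_alias (const char *table, const char *codeset)
{
  const char *p;

  assert (table != NULL && codeset != NULL);
  for (p = table; *p != '\0'; )
    {
      const char *canonical = p + strlen (p) + 1;
      if (strcmp (codeset, p) == 0)
        return canonical;
      /* Skip over the canonical name to the next alias.  */
      p = canonical + strlen (canonical) + 1;
    }
  return codeset;   /* no alias found: keep the name as reported */
}
```

In this sketch, an added pair "US-ASCII" "\0" "ASCII" "\0" would make the lookup return "ASCII" whenever the OS reports "US-ASCII", which is the effect the proposed patch aims for.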
Bruno
[1] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html
section 7.2
[2] http://pubs.opengroup.org/onlinepubs/9699919799/utilities/df.html