bug-coreutils
Re: is it a bug?


From: Eric Blake
Subject: Re: is it a bug?
Date: Tue, 02 Mar 2010 06:11:03 -0700
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.23) Gecko/20090812 Thunderbird/2.0.0.23 Mnenhy/0.7.6.666

According to Voelker, Bernhard on 3/2/2010 1:34 AM:
> I understand that the sort order depends on the locale, i.e. LC_ALL,
> but this doesn't explain the differences I get on Solaris 5.10, SLES 10.1,
> and Cygwin (given that sort didn't change about this point in the past).

The difference is that all three use different locale installations.

> 
> # === Solaris SunOS 5.10, sort 6.10 ===
> $ printf "ru.unix /h\nru.unix.ftn /h\nru.unix.prog /h" | LC_ALL=C sort
> ru.unix /h
> ru.unix.ftn /h
> ru.unix.prog /h
> $ printf "ru.unix /h\nru.unix.ftn /h\nru.unix.prog /h" | LC_ALL=POSIX sort
> ru.unix /h
> ru.unix.ftn /h
> ru.unix.prog /h

C and POSIX are strictly identical, on all machines.  If they ever behave
differently from one another, on the same machine, or when comparing two
machines, then you have found a bug and should report it to that vendor.
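
A quick way to check this on a given machine is sketched below. It reuses the three sample lines from the report; the variable names are only illustrative, and the script assumes nothing beyond a POSIX shell and sort.

```shell
# Sort the same input under LC_ALL=C and LC_ALL=POSIX and compare.
input='ru.unix /h
ru.unix.ftn /h
ru.unix.prog /h'

c_out=$(printf '%s\n' "$input" | LC_ALL=C sort)
posix_out=$(printf '%s\n' "$input" | LC_ALL=POSIX sort)

if [ "$c_out" = "$posix_out" ]; then
    echo "C and POSIX agree"
else
    echo "C and POSIX differ: worth reporting to the vendor"
fi
```

In the C locale the comparison is by byte value, so the space (0x20) in "ru.unix /h" sorts before the period (0x2e) in "ru.unix.ftn /h".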

> $ printf "ru.unix /h\nru.unix.ftn /h\nru.unix.prog /h" | LC_ALL=en_US sort
> ru.unix /h
> ru.unix.ftn /h
> ru.unix.prog /h

That just means that Solaris' rules for en_US don't ignore punctuation.
You can use locale(1) to learn more about the collation rules that will be
selected when you enable that locale.
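
For instance, a small wrapper along these lines shows which value each category takes for a given setting. The function name is made up for illustration, and what the locale utility prints beyond the category names is system-specific.

```shell
# Print the locale categories in effect when LC_ALL is set to "$1".
# (show_locale_settings is a hypothetical helper, not a standard command.)
show_locale_settings() {
    LC_ALL=$1 locale
}

# Inspect the C locale; pass en_US (or whatever you use) instead.
show_locale_settings C

# List the locales installed on this machine.
locale -a
```

Running it with en_US on each of the three machines, and comparing against the output of locale -a, should show whether the name even refers to an installed locale.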

> # === SLES 10.1, kernel 2.6.16.60-0.23-smp, sort 5.93 ===
> $ printf "ru.unix /h\nru.unix.ftn /h\nru.unix.prog /h" | LC_ALL=en_US sort
> ru.unix.ftn /h
> ru.unix /h
> ru.unix.prog /h

Yep, glibc's locale installation ignores punctuation for en_US.  And
glibc's locale installation is probably the most complete one out there.

> $ sort --version
> sort (GNU coreutils) 5.93

Time to consider upgrading - the latest stable version is 8.4, and there
have been some bugs fixed in sort in the meantime.

> # === Cygwin on XPSP3, CYGWIN_NT-5.1 1.7.1(0.218/5/3), sort 7.0 ===
> $ printf "ru.unix /h\nru.unix.ftn /h\nru.unix.prog /h" | LC_ALL=en_US sort
> ru.unix /h
> ru.unix.ftn /h
> ru.unix.prog /h

Yep, cygwin 1.7.1 silently treats LC_COLLATE as the C locale no matter
what you set (basically, no one had implemented the internals to convert
the windows notion of collation over to the POSIX api); it will improve
for cygwin 1.7.2.  But cygwin is still different from glibc; it only
supports locales known to windows, rather than the glibc approach of
letting you install your own locales to a specific directory.

> It seems that sort doesn't depend on LC_ALL on Solaris and Cygwin,
> but it does on Linux. Besides LC_ALL, what does the sort order depend
> on? Build settings?

LC_ALL takes precedence.  But if LC_ALL is unset, then it is up to
LC_COLLATE; and if that is unset, then LANG; and if that is unset, then
it is system-specific.
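
That lookup order can be sketched as a shell function. The name effective_collate is hypothetical, and the sketch treats an empty value the same as an unset one, which is how POSIX describes these variables.

```shell
# Report which environment variable decides the collation order.
# (effective_collate is an illustrative helper, not part of coreutils.)
effective_collate() {
    if [ -n "${LC_ALL:-}" ]; then
        printf 'LC_ALL=%s\n' "$LC_ALL"
    elif [ -n "${LC_COLLATE:-}" ]; then
        printf 'LC_COLLATE=%s\n' "$LC_COLLATE"
    elif [ -n "${LANG:-}" ]; then
        printf 'LANG=%s\n' "$LANG"
    else
        # Nothing set: the implementation picks its own default.
        printf 'system default\n'
    fi
}

effective_collate
```

Calling it just before sort in a script makes it obvious which setting is actually driving the order you see.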

-- 
Don't work too hard, make some time for fun as well!

Eric Blake             address@hidden
