bug-coreutils

bug#18893: Bug with Gnu sort program in coreutils 8.4


From: Eric Blake
Subject: bug#18893: Bug with Gnu sort program in coreutils 8.4
Date: Wed, 29 Oct 2014 16:36:23 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0

tag 18893 notabug
thanks

On 10/29/2014 03:58 PM, Michael Yang wrote:

> There might be a bug in the “sort” program in GNU coreutils 8.4, present at
> least in CentOS 6 x86_64.  It’s not immediately obvious to me whether or
> not this bug has been reported before.

Thanks for the report.  However, it has been frequently reported, to the
point that it has a FAQ entry:

https://www.gnu.org/software/coreutils/faq/coreutils-faq.html#Sort-does-not-sort-in-normal-order_0021

> sort (GNU coreutils) 8.4 yields:
>
> CC = aCC
> CC = cc
> CCFLAGS =
> CC = gcc

You can use the --debug flag to see what is going on (well, you can when
using new enough sort; 8.4 is rather old these days, and while there
HAVE been sort bug fixes in the meantime, they are for rather obscure
corner cases and not for your issue).

$ printf 'CC = aCC\nCC = cc\nCCFLAGS =\nCC = gcc\n' | sort --debug
sort: using ‘en_US.UTF-8’ sorting rules
CC = aCC
________
CC = cc
_______
CCFLAGS =
_________
CC = gcc
________

I'm guessing that on your CentOS box, your locale is set to en_US.UTF-8,
or some similar locale which collates case-insensitively and ignores
punctuation and whitespace.  In such a collation sequence, 'CCFLAGS ='
vs. 'CC = gcc' is effectively 'ccflags' vs. 'ccgcc'; 'f' sorts before
'g', so the final output order is correct.
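
To make that comparison concrete, here is a rough sketch of the keys
such a locale effectively compares first.  It is only an approximation
of the behavior described above, not glibc's actual collation
algorithm, and the helper name first_pass_keys is made up for this
illustration: it drops blanks and punctuation, folds to lower case, and
then sorts the result bytewise.

```shell
# Hypothetical helper: approximate the first-pass collation key by
# deleting blanks and punctuation and folding upper case to lower case.
first_pass_keys() {
    LC_ALL=C tr -d '[:blank:][:punct:]' | LC_ALL=C tr '[:upper:]' '[:lower:]'
}

# Byte-sort the approximate keys; 'ccflags' lands before 'ccgcc'.
printf 'CC = aCC\nCC = cc\nCCFLAGS =\nCC = gcc\n' | first_pass_keys | LC_ALL=C sort
```

The keys come out as ccacc, cccc, ccflags, ccgcc, which is the same
relative order as the en_US.UTF-8 output above.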

> … the 3rd line is out-of-order.  In comparison, sort (GNU coreutils) 8.14
> in cygwin yields:

The version of sort makes no difference; rather, it is entirely up to
the locale (and by the way, Cygwin now ships with 8.23, so you may want
to upgrade).  On your Cygwin box, I'm guessing that you are using the C
locale.  And even if you are using the en_US locale there, you must
remember that the Cygwin locale definitions come from Windows, not
glibc, and therefore may differ in what the two locale writers thought
would make sense (that is, while the glibc en_US locale ignores
punctuation, maybe the Windows en_US locale does not).  At any rate, on
your CentOS box, you can force the C locale to get the same behavior as
Cygwin seemed to give by default:

$ printf 'CC = aCC\nCC = cc\nCCFLAGS =\nCC = gcc\n' | LC_ALL=C sort --debug
sort: using simple byte comparison
CC = aCC
________
CC = cc
_______
CC = gcc
________
CCFLAGS =
_________
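
If it helps, the following sketch shows one way to check which locale
governs collation and to confine the C locale to a single sort
invocation instead of exporting it for the whole session.  The function
names effective_collate and sort_bytes are invented for this example;
the precedence they encode (LC_ALL, then LC_COLLATE, then LANG, with C
as the assumed fallback) is the standard POSIX one.

```shell
# Hypothetical helper: report the locale that governs collation,
# following the POSIX precedence LC_ALL > LC_COLLATE > LANG.
# An empty variable is treated the same as an unset one.
effective_collate() {
    printf '%s\n' "${LC_ALL:-${LC_COLLATE:-${LANG:-C}}}"
}

# Hypothetical wrapper: run sort with bytewise comparison for this one
# command only; the caller's environment is left unchanged.
sort_bytes() {
    LC_ALL=C sort "$@"
}

effective_collate
printf 'CC = aCC\nCC = cc\nCCFLAGS =\nCC = gcc\n' | sort_bytes
```

Setting LC_ALL on the command line, rather than with export, keeps
other programs in the same shell using your usual locale.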

Therefore, I'm closing this as not a bug, but feel free to respond if
you have further comments or questions.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
