bug-coreutils

bug#11006: But in sort or WAD?


From: Pádraig Brady
Subject: bug#11006: But in sort or WAD?
Date: Tue, 13 Mar 2012 12:43:03 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:6.0) Gecko/20110816 Thunderbird/6.0

On 03/13/2012 12:29 PM, Eric Blake wrote:
> tag 11006 notabug
> thanks
> 
> On 03/13/2012 06:20 AM, Philipp Thomas wrote:
>> I got this bug report for coreutils 8.14:
>>
>> ----------------------------------------------------
>>
>> export LANG=en_US.UTF-8
>> { echo 16301 3.574885; echo 163 0.171036; } | sort
>>
>> Produces
>>
>> 16301 3.574885
>> 163 0.171036
>>
>>
>> which is incorrect.  The lines should be in the other order
>>
>> With "LANG=C" it works correctly.
>>
>> ----------------------------------------------------
>>
>> Is this really a bug or is this because of differing collating rules?
> 
> This is correct behavior, and not a bug in sort.  The use of LANG=C to
> switch the behavior is indeed intended, as the en_US.UTF-8 really does
> collate with punctuation and whitespace elided, where '163013' is before
> '163017'.  I suggest you point the original poster to the FAQ.
> https://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021

Eric is correct, but note that it is the en_US locale's collation rules that
cause this, not anything specific to UTF-8:

$ { echo 16301 3.574885; echo 163 0.171036; } | LANG=en_US sort --debug
sort: using `en_US' sorting rules
16301 3.574885
______________
163 0.171036
____________
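
For context, the locale named in that first --debug line is resolved from
the environment in the usual POSIX order: LC_ALL, then LC_COLLATE, then LANG.
A minimal sketch of that lookup follows. Note that effective_collate is a
hypothetical helper name, not part of coreutils, and falling back to "C"
when nothing is set is an assumption about the default:

```shell
# effective_collate: hypothetical helper (not part of coreutils) that
# prints the locale name that would govern collation.
# POSIX precedence: LC_ALL, then LC_COLLATE, then LANG.
# Assumption: when none of them is set, the default behaves like "C".
effective_collate() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_COLLATE:-}" ]; then
        printf '%s\n' "$LC_COLLATE"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf '%s\n' C
    fi
}

# Show which setting applies in the current shell.
effective_collate
```

So LANG=en_US only takes effect when LC_ALL and LC_COLLATE are unset or empty.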

We considered updating the --debug option to make the ignored characters
apparent, but judged that too invasive for the benefit provided.

The following confirms that ' ' and '.' are ignored in the comparison:
changing 0.171036 to 0.121036 reverses the order.

$ { echo 16301 3.574885; echo 163 0.121036; } | LANG=en_US sort --debug
sort: using `en_US' sorting rules
163 0.121036
____________
16301 3.574885
______________
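
As a rough illustration of why the order flips, the effect can be
approximated by deleting ' ' and '.' from a copy of each line and comparing
what remains byte by byte. This is only a sketch: sort_ignoring_blank_dot is
a hypothetical helper, and real locale collation is more involved than
simple character removal.

```shell
# sort_ignoring_blank_dot: hypothetical helper, a crude approximation
# of the behaviour described above (NOT how the locale is implemented).
# 1. awk prefixes each line with a key that has ' ' and '.' removed.
# 2. sort orders by that key alone, byte-wise (LC_ALL=C).
# 3. cut drops the key again, leaving the original lines.
sort_ignoring_blank_dot() {
    tab=$(printf '\t')
    awk '{ key = $0; gsub(/[ .]/, "", key); print key "\t" $0 }' |
        LC_ALL=C sort -s -t "$tab" -k1,1 |
        cut -f2-
}

{ echo 16301 3.574885; echo 163 0.171036; } | sort_ignoring_blank_dot
{ echo 16301 3.574885; echo 163 0.121036; } | sort_ignoring_blank_dot
```

The keys compared are 163013574885 and 1630171036 in the first case, and
163013574885 and 1630121036 in the second, which reproduces both orderings
shown above.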


Also note from the underlines above that the whole line is compared.
To compare field 1 first, with the whole line as the tie-breaker:

$ { echo 16301 3.574885; echo 163 0.171036; } | LANG=en_US sort -k1,1 --debug
sort: using `en_US' sorting rules
163 0.171036
___
____________
16301 3.574885
_____
______________


Or to compare only field 1 in isolation (-s disables the last-resort
whole-line comparison):

$ { echo 16301 3.574885; echo 163 0.171036; } | LANG=en_US sort -k1,1 -s --debug
sort: using `en_US' sorting rules
sort: leading blanks are significant in key 1; consider also specifying `b'
163 0.171036
___
16301 3.574885
_____
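
The difference between the last two commands only shows up when field 1
ties. A minimal sketch with made-up input, run under LC_ALL=C so the result
does not depend on which locales are installed:

```shell
# Two lines whose first field is identical (illustrative input).
tied_input() {
    echo '163 b'
    echo '163 a'
}

# Field 1 first, then the whole line as last-resort tie-breaker:
# the tie is resolved by comparing the full lines, so "163 a" comes first.
tied_input | LC_ALL=C sort -k1,1

# Field 1 in isolation: -s disables the last-resort comparison,
# so the tied lines keep their input order ("163 b" first).
tied_input | LC_ALL=C sort -s -k1,1
```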

Or you can implicitly restrict the comparison to the leading number with a
numeric sort:

$ { echo 16301 3.574885; echo 163 0.171036; } | LANG=en_US sort -n --debug
sort: using `en_US' sorting rules
163 0.171036
___
16301 3.574885
_____
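
Both the C locale and -n give the expected order for the reported input, but
the two are not interchangeable in general. A short sketch with illustrative
input (the sample function is made up for this example):

```shell
# Illustrative input: two lines whose first fields are 9 and 10.
sample() {
    echo '9 x'
    echo '10 y'
}

# Byte-wise comparison in the C locale: '1' sorts before '9',
# so the "10" line comes first.
sample | LC_ALL=C sort

# Numeric comparison of field 1: 9 < 10, so the "9" line comes first.
sample | LC_ALL=C sort -k1,1n
```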

cheers,
Pádraig.




