bug-coreutils

bug#18073: defect with sort multiple arguments


From: Eric Blake
Subject: bug#18073: defect with sort multiple arguments
Date: Mon, 21 Jul 2014 15:13:05 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0

tag 18073 notabug
thanks

On 07/21/2014 01:57 PM, n buckner wrote:
> I was seeing some odd behaviour with sort -n -u.  I ran sort -n -u dataset
> and expected the same output as sort -n dataset| uniq but instead got
> something different.  sortbug is a script file showing the usage described
> above, dataset is the dataset.
> here is the version I am running.
> 
> sort (GNU coreutils) 8.21

Thanks for the report.  However, the problem is not in sort, but in your
usage of the command line parameters to sort.  Let's use the --debug
flag to see what is REALLY going on:

$ sort -n -u dataset --debug
sort: using ‘en_US.UTF-8’ sorting rules
2012-09-07 (Srikrishna Bodanapu
____
2013-06-15 (Chetana Nair
____
2014-02-24 (Subba Juturi
____

Aha - sort's -u treats two lines as duplicates whenever they compare
equal on the sort keys you specified, disregarding any portion of the
line outside those keys.  But the sort key you specified, -n, ends as
soon as it hits a non-numeric character, so only the leading year is
compared.  If you WANT the entire line to count, then you need to do
something like:

sort -k1,1n -k1 -u dataset

which says to sort _first_ by numeric value (which ends at the first
non-digit character of each line), and _second_ by the entire line; and
then to keep only one copy of each distinct line.  Adding the second key
over the entire line gives the same output you were seeing with uniq:

$ diff -u <(sort -k1,1n -k1 dataset -u) <(sort -n dataset | uniq)
$
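
If it helps to see the effect in isolation, here is a minimal sketch on a small made-up dataset (the four lines below are hypothetical, not taken from your file; the variable names are mine as well).  I pin LC_ALL=C only so the whole-line comparison is plain byte order; that is an assumption of the sketch, not something your setup requires.

```shell
# Hypothetical sample data (NOT the reporter's dataset).
tmp=$(mktemp)
printf '%s\n' \
  '2013-06-15 (alpha' \
  '2012-09-07 (beta' \
  '2013-01-02 (gamma' \
  '2012-09-07 (beta' > "$tmp"

# -n alone: the key is the leading number (the year), so -u keeps
# only one line per distinct year.
by_year=$(LC_ALL=C sort -n -u "$tmp")

# A second key over the whole line makes -u compare entire lines.
by_line=$(LC_ALL=C sort -k1,1n -k1 -u "$tmp")

# The pipeline the report expected sort -n -u to match.
via_uniq=$(LC_ALL=C sort -n "$tmp" | uniq)

echo "sort -n -u:"; printf '%s\n' "$by_year"
echo "sort -k1,1n -k1 -u:"; printf '%s\n' "$by_line"
echo "sort -n | uniq:"; printf '%s\n' "$via_uniq"

rm -f "$tmp"
```

With this input, the first command prints two lines (one per year), while the other two print the same three distinct lines.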

Oh, and if you wanted to sort by all three fields of the date, instead
of just the year, you probably want:

sort -t - -k1,1n -k2,2n -k3,3n -k1 -u dataset

although for the particular dataset you posted, it makes no difference.
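
Where the per-field keys do matter is when the month or day is not zero-padded.  A rough sketch, again on made-up lines (hypothetical data and variable names, with LC_ALL=C assumed so the whole-line key compares bytes):

```shell
# Hypothetical data with non-zero-padded months (NOT the reporter's dataset).
data='2014-10-1 (a
2014-2-24 (b
2013-6-15 (c'

# Year numerically, then the whole line as text: within 2014,
# "10-1" sorts before "2-24" because '1' < '2'.
year_then_line=$(printf '%s\n' "$data" | LC_ALL=C sort -k1,1n -k1 -u)

# Year, month and day each compared numerically (fields split on '-'),
# then the whole line as a tie-breaker: month 2 now precedes month 10.
by_fields=$(printf '%s\n' "$data" |
  LC_ALL=C sort -t - -k1,1n -k2,2n -k3,3n -k1 -u)

echo "year, then whole line:"; printf '%s\n' "$year_then_line"
echo "year, month, day:";      printf '%s\n' "$by_fields"
```

With zero-padded dates like the ones you posted, both orderings coincide, which is why the extra keys change nothing there.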

I'm closing this as not a bug, but please feel free to reply if you have
further questions.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


