bug-coreutils

bug#9995: problem about sort -u -k


From: Eric Blake
Subject: bug#9995: problem about sort -u -k
Date: Tue, 08 Nov 2011 12:45:11 -0700
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.23) Gecko/20110928 Fedora/3.1.15-1.fc14 Lightning/1.0b3pre Mnenhy/0.8.4 Thunderbird/3.1.15

On 11/08/2011 11:54 AM, Eric Blake wrote:
>> 22:41:39#tp#~> /usr/local/bin/sort -u -k1,3 a
>> 1 a q
>> 1 a w
>> 3 a w
>> 22:41:48#tp#~> /usr/local/bin/sort -u -k3 a
>> 1 a q
>> 1 a w

> Since you didn't tell us what output you were hoping to get, I can't
> tell you the proper command line that would match your expected output.
> Feel free to reply, even while this bug is closed, if you need more help
> in getting the output you want.

I'll give a preemptive attempt at guessing what you meant, as well:

If you wanted to sort on just the third and subsequent fields, but then strip duplicate lines only if the entire line is duplicate, then you have to use two processes:

sort [-s] -k3 a | uniq

If you don't mind a two-key sort, where the primary key is the third and subsequent fields, but where the secondary key is the entire line so as to force sort -u to consider the entire line when determining uniqueness, then one process will do:

sort -u -k3 -k1 a

To see the difference, and remembering that sort -u implies sort -s, consider these contents for a:

$ cat a
1 a q
2 a q
1 a q
1 a w
3 a w
$ sort -u -k3 -k1 a
1 a q
2 a q
1 a w
3 a w
$ sort -s -k3 a | uniq
1 a q
2 a q
1 a q
1 a w
3 a w
$ sort -k3 a | uniq
1 a q
2 a q
1 a w
3 a w

That is, if a stable sort on just -k3 leaves identical lines non-adjacent (the two "1 a q" lines in my example, separated by "2 a q"), then the separate uniq process won't filter them, because uniq only compares adjacent lines. Using sort -u with -k1 to force the entire line as a secondary sort key makes identical lines adjacent, so they can no longer stay separated by a distinct line and the duplicate is dropped. Likewise, omitting both -s and -u lets sort apply its last-resort whole-line comparison (in effect an implied -k1), at which point uniq sees the same line order that sort -u -k3 -k1 produces.
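
If you want to rerun the comparison without creating a file named a in your working directory, something like the following sketch should do. The temporary file, the variable names, and the LC_ALL=C setting are my own additions for repeatability and are not part of the original report; other locales may order lines differently.

```shell
# Sketch: recreate the five-line example in a temporary file.
# LC_ALL=C and the temp file are assumptions added for repeatability.
export LC_ALL=C
a=$(mktemp)
printf '%s\n' '1 a q' '2 a q' '1 a q' '1 a w' '3 a w' > "$a"

# One process: -k3 is the primary key, -k1 (the whole line) the secondary key.
one_process=$(sort -u -k3 -k1 "$a")

# Two processes, stable sort: the two "1 a q" lines stay separated
# by "2 a q", so uniq has no adjacent duplicates to drop.
stable_uniq=$(sort -s -k3 "$a" | uniq)

# Two processes, default sort: the last-resort whole-line comparison
# makes the duplicates adjacent, so uniq can drop one of them.
default_uniq=$(sort -k3 "$a" | uniq)

# Print the three results, each followed by a blank line.
printf '%s\n\n' "$one_process" "$stable_uniq" "$default_uniq"
rm -f "$a"
```

The first and third results should be identical, while the second keeps all five lines.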

>> i read http://www.gnu.org/s/coreutils/manual/html_node/sort-invocation.html,
>> but got nothing about this.

Actually, it does - under the option -u, I see:

The commands sort -u and sort | uniq are equivalent, but this equivalence does not extend to arbitrary sort options. For example, sort -n -u inspects only the value of the initial numeric string when checking for uniqueness, whereas sort -n | uniq inspects the entire line. See uniq invocation.
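
A small sketch of the manual's sort -n example, in case it helps to see it run. The three input lines are made up for illustration, and LC_ALL=C is again an assumption on my part to keep the ordering repeatable.

```shell
# Sketch with made-up input; LC_ALL=C is an assumption for repeatable ordering.
export LC_ALL=C
input=$(printf '%s\n' '1 apple' '1 banana' '2 cherry')

# sort -n -u compares only the leading number, so the two lines
# starting with 1 count as duplicates and only one of them survives.
numeric_unique=$(printf '%s\n' "$input" | sort -n -u)

# sort -n | uniq compares whole lines, so all three lines survive.
numeric_then_uniq=$(printf '%s\n' "$input" | sort -n | uniq)

# Show both results, separated by a marker line.
printf '%s\n--\n%s\n' "$numeric_unique" "$numeric_then_uniq"
```

The first result should have two lines and the second three, which is the same kind of mismatch as sort -u -k3 versus sort -k3 | uniq in the original report.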

--
Eric Blake   address@hidden    +1-801-349-2682
Libvirt virtualization library http://libvirt.org




