bug#22236: Not exactly a bug...
From: Assaf Gordon
Subject: bug#22236: Not exactly a bug...
Date: Fri, 25 Dec 2015 18:36:58 -0500
tag 22236 notabug
close 22236
thanks
Hello Todd,
> On Dec 25, 2015, at 13:37, Todd Shandelman <address@hidden> wrote:
[...]
> So it looks like that for chars, 'uniq' has options to compare only the first
> N chars, or *all but* the first N chars.
>
> Whereas for fields, 'uniq' has only the option to skip the first N fields,
> but has no corresponding option to compare *only* the first N fields.
>
> Why this lack of symmetry?
This lack of symmetry originates from the POSIX standard:
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/uniq.html
which codified the features that existed at that time.
GNU Coreutils' uniq program has added a few more features, and there is a
working plan to add the ability to use specific fields (
http://lists.gnu.org/archive/html/coreutils/2013-02/msg00082.html ,
http://lists.gnu.org/archive/html/coreutils/2013-09/msg00047.html ), but this
has not yet been integrated into the main program. Perhaps it will be in a future version.
> And what do I do when I need that missing functionality, to compare only an
> initial subset of fields in each line?
To print lines that are unique in specific fields, you can use 'sort'.
For example, given the following sample input file:
$ cat input.txt
1 A 10 x 100
5 B 14 z 104
2 A 11 x 101
3 B 12 y 102
4 B 13 z 103
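Print one line for each distinct combination of values in columns 2 and 4
(-u keeps one line per key; -s keeps the sort stable, so the first such line
in the input is the one printed):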
$ sort -k2,2 -k4,4 -s -u input.txt
1 A 10 x 100
3 B 12 y 102
5 B 14 z 104
This can be extended to include as many fields as you need.
If the fields are consecutive, you can specify them as a single range:
$ cat input2.txt
A x 1 97
B x 1 96
A x 1 99
A x 1 98
$ sort -k1,3 -u input2.txt
A x 1 97
B x 1 96
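Note that 'sort' reorders the lines. If the original input order matters, one
possible alternative is a small awk filter. The following is only a rough
sketch, not a coreutils feature: the function name first_fields_unique is made
up for illustration, and fields are assumed to be separated by whitespace.
Unlike 'uniq', it drops duplicates anywhere in the input, not only adjacent
ones.

```shell
# Hypothetical helper (not part of coreutils): print the first line seen for
# each distinct value of the first N whitespace-separated fields, keeping the
# original input order.
# Usage: first_fields_unique N [FILE...]
first_fields_unique() {
    n=$1
    shift
    awk -v n="$n" '{
        # Build a key from the first n fields (or fewer on short lines).
        key = ""
        for (i = 1; i <= n && i <= NF; i++)
            key = key $i SUBSEP
        # Print the line only the first time this key appears.
        if (!(key in seen)) {
            seen[key] = 1
            print
        }
    }' "$@"
}
```

With the input2.txt file shown above, 'first_fields_unique 3 input2.txt' should
print the same two lines as the 'sort -k1,3 -u' command, in input order.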
regards,
- assaf