bug-coreutils

Re: uniq: missing option -W / --check-fields=N


From: Jim Meyering
Subject: Re: uniq: missing option -W / --check-fields=N
Date: Tue, 27 Jun 2006 14:51:21 +0200

Pádraig Brady <address@hidden> wrote:

> Jim Meyering wrote:
>>
>> Hi Matt,
>>
>> I'm glad you're willing to work on this.
>> It's an often-requested feature.
>> Unfortunately, the Debian -W patch was not acceptable.
>> It did not allow the same flexibility that sort does in
>> selecting keys.  To provide that, GNU uniq will eventually
>> accept at least the following options, just as sort does:
>>
>>   -k, --key=POS1[,POS2]     start a key at POS1, end it at POS2 (origin 1)
>>   -t, --field-separator=SEP  use SEP instead of non-blank to blank transition
>>   -z, --zero-terminated     end lines with 0 byte, not newline
>>
>> and even most, if not all, of these (for flexibility/interoperability
>> with sort, as well as to ease code sharing between uniq and sort):
>>
>>   -b, --ignore-leading-blanks  ignore leading blanks
>>   -d, --dictionary-order      consider only blanks and alphanumeric characters
>>   -i, --ignore-nonprinting    consider only printable characters
>
> agreed
>
>>   -f, --ignore-case           fold lower case to upper case characters
>
> It has this already. See below.
>
>>   -g, --general-numeric-sort  compare according to general numerical value
>>   -M, --month-sort            compare (unknown) < `JAN' < ... < `DEC'
>>   -n, --numeric-sort          compare according to string numerical value
>>   -r, --reverse               reverse the result of comparisons
>
> These 4 deal with specific order which I don't think uniq should worry about?

You're right about --reverse.  Thanks.

However, the others change sort's idea of which values are equal,
so they are relevant.  For -g, 0.0 == 0 == 00, etc.
For -M, FEB == feb == Feb, etc.
For -n, 00 == 0.
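
To make those equivalences concrete, here is a rough Python sketch, not
coreutils code. The helper names (key_g, key_M, key_n, uniq_by) are made up
for illustration, and the parsing only approximates what sort does: locale
handling, thousands separators and NaN are ignored.

```python
import re
from decimal import Decimal, InvalidOperation
from itertools import groupby

MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")
_NUM_PREFIX = re.compile(r"\s*(-?\d*\.?\d*)")


def key_g(s):
    """Approximate -g: the value as a float; non-numbers all map to None."""
    try:
        return float(s)
    except ValueError:
        return None


def key_M(s):
    """Approximate -M: month number 1..12, or 0 for anything unknown."""
    abbrev = s.lstrip()[:3].upper()
    return MONTHS.index(abbrev) + 1 if abbrev in MONTHS else 0


def key_n(s):
    """Approximate -n: leading decimal number (no exponent), else 0."""
    text = _NUM_PREFIX.match(s).group(1)
    try:
        return Decimal(text)
    except InvalidOperation:
        return Decimal(0)


def uniq_by(lines, key):
    """Keep the first line of each run of adjacent lines with equal keys."""
    return [next(run) for _, run in groupby(lines, key=key)]


print(uniq_by(["0.0", "0", "00", "1"], key_g))      # -> ['0.0', '1']
print(uniq_by(["FEB", "feb", "Feb", "mar"], key_M))  # -> ['FEB', 'mar']
print(uniq_by(["00", "0", "1"], key_n))              # -> ['00', '1']
```

A plain byte-wise comparison would treat every one of those input lines as
distinct, which is why these options matter to uniq even though ordering
itself does not.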

The idea is to be able to use uniq with the same keyspec options
as you used when sorting the data.
That means the command-line options listed above, as well as the
key-spec modifier letters (b, d, g, M, etc.) used in, e.g., -k 1b,1 -k 2n.
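
As a sketch of what "same keyspec" could mean in practice, the following
Python treats two adjacent lines as duplicates when all of their keys match.
It is only an illustration under simplifying assumptions: parse_key,
extract and uniq_keys are hypothetical names, only field numbers (no
character offsets) are accepted, fields are split on runs of whitespace
(so the b modifier has nothing left to do), and only the f and n modifiers
are honoured.

```python
import re
from itertools import groupby

_KEYSPEC = re.compile(r"(\d+)([a-zA-Z]*)(?:,(\d+)([a-zA-Z]*))?")
_NUM = re.compile(r"-?\d*\.?\d*")


def parse_key(spec):
    """Parse 'F[opts][,F[opts]]' into (start_field, end_field, opts)."""
    m = _KEYSPEC.fullmatch(spec)
    if not m:
        raise ValueError(f"unsupported key spec: {spec!r}")
    start = int(m.group(1))
    end = int(m.group(3)) if m.group(3) else None  # None: through end of line
    opts = m.group(2) + (m.group(4) or "")
    return start, end, opts


def extract(line, key):
    """Return the comparison value of one key for one line."""
    start, end, opts = key
    fields = line.split()  # simplification: leading blanks already dropped
    text = " ".join(fields[start - 1:end])
    if "f" in opts:
        text = text.upper()
    if "n" in opts:
        # Crude numeric value: leading decimal number, else 0.
        try:
            return float(_NUM.match(text).group())
        except ValueError:
            return 0.0
    return text


def uniq_keys(lines, specs):
    """Keep the first line of each run of adjacent lines whose keys all match."""
    keys = [parse_key(s) for s in specs]

    def keyfunc(line):
        return tuple(extract(line, k) for k in keys)

    return [next(run) for _, run in groupby(lines, key=keyfunc)]


lines = ["apple 1 x", "  apple 01 y", "apple 2 z", "pear 2 z"]
print(uniq_keys(lines, ["1b,1", "2n"]))
# -> ['apple 1 x', 'apple 2 z', 'pear 2 z']
```

The first two lines differ byte-wise but agree on both keys, so only the
first survives, which is the result one would expect after sorting with
the same -k options.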

> uniq can be efficient and assume LANG=C always as
> it need only care if adjacent items match or not.
> Assuming LANG=C may be an issue for --ignore-case though?
> However I notice v5.2.1 at least only seems to handle ASCII:
>
> $ LANG=ga_IE.utf8 uniq -i < Pádraig
> Pádraig
> PÁdraig

Yes, that's still a problem.
Would you like to work on it?
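
For reference, the output quoted above is what one would get from folding
only the ASCII letters in each line. The Python sketch below contrasts that
with Unicode-aware folding. It illustrates the symptom only and makes no
claim about how uniq 5.2.1 is implemented; ascii_fold and uniq_ignore_case
are hypothetical names.

```python
from itertools import groupby


def ascii_fold(s):
    """Fold only the bytes A-Z to a-z, leaving multibyte characters alone."""
    return s.encode("utf-8").lower().decode("utf-8")


def uniq_ignore_case(lines, fold):
    """Keep the first line of each run of adjacent lines equal after folding."""
    return [next(run) for _, run in groupby(lines, key=fold)]


# Escapes are used so the precomposed characters are unambiguous:
# U+00E1 is a-acute, U+00C1 is A-acute.
names = ["P\u00e1draig", "P\u00c1draig"]

print(len(uniq_ignore_case(names, ascii_fold)))    # -> 2
print(len(uniq_ignore_case(names, str.casefold)))  # -> 1
```

A multibyte-aware uniq -i would need something closer to the second
behaviour, driven by the current locale rather than assuming LANG=C.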