coreutils

Re: Question about uniq's treatment of spaces-only lines


From: Pádraig Brady
Subject: Re: Question about uniq's treatment of spaces-only lines
Date: Sat, 30 Jul 2022 13:25:34 +0100

On 29/07/2022 19:10, Sudarshan S Chawathe wrote:
In brief, uniq seems to treat lines containing only spaces differently
when given the -f 1 option (compared to when given -f 0 or no -f
option).  My question is: Is this behavior intentional (or is it a bug
in the implementation or docs)?  I find it difficult to reconcile with
my understanding of the docs.

In more detail, consider a file in.txt with the following contents
(which can be reconstructed based on the descriptive line if mangled by
mailers):

Next 5 lines have, resp., 1, 2, 1, 2, and 1, blanks:
Last line

Given this input, the output of 'uniq -u -f 1 in.txt' is different from
that of 'uniq -u in.txt' and 'uniq -u -f 0 in.txt'.  (With -f 1, the
blanks-only lines are all removed, but not so with the others.)
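
Since the blanks-only lines may not have survived transit, here is a minimal sketch that rebuilds the input from the descriptive line and counts the lines each invocation prints. The helper name make_input is made up for illustration, and the sketch assumes a uniq that follows the POSIX field rule, as GNU coreutils does.

```shell
# Hypothetical helper: print the seven-line input described above
# (a header line, five blanks-only lines, and a last line).
make_input() {
  printf '%s\n' \
    'Next 5 lines have, resp., 1, 2, 1, 2, and 1, blanks:' \
    ' ' '  ' ' ' '  ' ' ' \
    'Last line'
}

# No fields skipped: adjacent lines all differ, so all 7 lines are printed.
make_input | uniq -u | wc -l

# One field skipped: the five blanks-only lines compare equal and are
# dropped by -u, leaving only the first and last lines (2 lines).
make_input | uniq -u -f 1 | wc -l
```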

I tested the above originally on the uniq from coreutils 8.32 but later
also on 9.1.42 (built from the git sources I just pulled a short while
ago) and both versions exhibit the same behavior.

Regards,

More succinctly:

  $ printf '%s\n' first blah ' ' '  ' 'l ast' | uniq -f1
  first
  l ast

I.e. skipping one field will compare all but the 'l ast' line as equal.
This is operating as per the POSIX standard which states:

"Ignore the first fields fields on each input line when doing comparisons,
where fields is a positive decimal integer. A field is the maximal string
matched by the basic regular expression:

[[:blank:]]*[^[:blank:]]*

If the fields option-argument specifies more fields than appear on an input
line, a null string shall be used for comparison."
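
To see what is left to compare once one field is skipped, the same BRE can be applied with sed, anchored at the start of the line. This is only a rough sketch of the rule quoted above, not how uniq is implemented; skip_one_field is a made-up helper name, and the brackets are added just to make the empty remainders visible.

```shell
# Hypothetical helper: remove one leading field, where a field is the
# maximal match of the BRE [[:blank:]]*[^[:blank:]]* at the line start.
skip_one_field() {
  sed 's/^[[:blank:]]*[^[:blank:]]*//'
}

# Show the remainder of each line in brackets.
printf '%s\n' first blah ' ' '  ' 'l ast' | skip_one_field | sed 's/.*/[&]/'
# []
# []
# []
# []
# [ ast]
```

The first four lines all reduce to the empty string, so they form one group and only 'first' is printed for it; 'l ast' reduces to ' ast' and is printed separately.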

thanks,
Pádraig
