bug-coreutils

Re: uniq -c output


From: Bob Proulx
Subject: Re: uniq -c output
Date: Fri, 16 Apr 2004 23:39:01 -0600
User-agent: Mutt/1.3.28i

Christopher Ness wrote:
> I did not expect `uniq -c` to output multiple values for non-uniq input 
> based on the location of the input in the file.

Thank you for your report.  But what you are seeing is not a bug.  It
is the expected behavior of uniq.

> address@hidden nesscg]$ /bin/cat /tmp/mail-abuse/todays-list.txt | 
> /bin/awk '{ print $6;}' | /usr/bin/uniq -c
>       8 vlan270-029-228.maconline.McMaster.CA[130.113.29.228]:
>       4 vlan180-034-124.maconline.McMaster.CA[130.113.34.124]:
>       7 vlan270-029-228.maconline.McMaster.CA[130.113.29.228]:
> 
> I think the above is wrong.  The first and last should be rolled into one 
> entity and printed as 15 hits on that string.  At least that's what I 
> thought `uniq` was going to do but apparently it is aware of the location 
> of the strings.
> 
> I have had to call `sort` to work around this.  Is this the intended 
> output?  If so, please put a note about this in the man page for `uniq`.

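Yes, that is the expected behavior.  uniq only compares adjacent lines,
so if you want identical lines counted together no matter where they
appear, you have to sort the data first.  Sorting is a separate step
that you add when you need it.  The Unix philosophy is one of small
modular programs that work together to create more complex programs.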
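As a minimal sketch of that approach, the snippet below puts sort in
front of uniq -c.  The values hostA and hostB are hypothetical
stand-ins for the host field in the report, not real data.

```sh
#!/bin/sh
# Hypothetical sample: hostA appears twice, then hostB, then hostA again,
# so the two groups of hostA are not adjacent.
sample='hostA
hostA
hostB
hostA'

# sort brings identical lines together, so uniq -c counts each value
# once, no matter where it originally appeared in the input.
counts=$(printf '%s\n' "$sample" | sort | uniq -c)
printf '%s\n' "$counts"
```

With GNU uniq this prints a count of 3 for hostA and 1 for hostB.  The
width of the count column may differ between uniq implementations.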

[My editorial remarks are that some other systems do little themselves
and mostly just launch commands.  In those systems every command
becomes a complete environment.]

I am hoping your use of fully qualified paths was only for the
example.  Hard-coded paths are Evil.
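For illustration, here is one way the pipeline from the report could
look without absolute paths, letting the shell find awk, sort and uniq
through PATH.  It also drops the cat, since awk can read the file named
on its command line.  The log file and its layout (host name in field
6) are hypothetical stand-ins for todays-list.txt, whose real format is
not shown in the report.

```sh
#!/bin/sh
# Hypothetical log file with the host name in field 6 (format assumed).
log=$(mktemp) || exit 1
trap 'rm -f "$log"' EXIT
printf '%s\n' \
  'Apr 16 10:00:01 mx smtpd[101]: hostA: rejected' \
  'Apr 16 10:00:02 mx smtpd[102]: hostB: rejected' \
  'Apr 16 10:00:03 mx smtpd[103]: hostA: rejected' > "$log"

# awk reads the file directly; awk, sort and uniq are looked up via PATH
# instead of hard-coded /bin and /usr/bin locations.
counts=$(awk '{ print $6 }' "$log" | sort | uniq -c)
printf '%s\n' "$counts"
```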

The oldest documentation I have available is the V7 docs, which say
this:

    Uniq reads the input file comparing adjacent lines.  In the normal
    case, the second and succeeding copies of repeated lines are
    removed; the remainder is written on the output file.  Note that
    repeated lines must be adjacent in order to be found; see sort(1).
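The adjacency rule is easy to see on a small hypothetical input.  This
sketch feeds uniq -c unsorted lines, which is the same situation as in
the report.

```sh
#!/bin/sh
# Hypothetical input, deliberately left unsorted.
sample='hostA
hostA
hostB
hostA'

# uniq compares each line only with the line just before it, so the
# two separated runs of hostA are counted separately.
runs=$(printf '%s\n' "$sample" | uniq -c)
printf '%s\n' "$runs"
```

This prints three lines: 2 for the first run of hostA, 1 for hostB,
and 1 for the second run of hostA, mirroring the 8/4/7 split in the
report.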

In the GNU documentation it says this:

    By default, `uniq' prints the unique lines in a sorted file, i.e.,
    discards all but one of identical successive lines.  Optionally,
    it can instead show only lines that appear exactly once, or lines
    that appear more than once.

    The input must be sorted.  If your input is not sorted, perhaps
    you want to use `sort -u'.
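A quick sketch of the sort -u suggestion, on hypothetical input.  For
plain whole-line comparisons like this one, sort -u and sort | uniq
produce the same lines.  Note that sort -u cannot count, so when you
want counts, sort | uniq -c is still the pipeline to use.

```sh
#!/bin/sh
# Hypothetical unsorted input with repeated values.
sample='hostB
hostA
hostB
hostA'

# sort -u sorts and keeps one copy of each distinct line in one step...
a=$(printf '%s\n' "$sample" | sort -u)
# ...which here matches sorting first and then removing adjacent repeats.
b=$(printf '%s\n' "$sample" | sort | uniq)

printf '%s\n' "$a"
if [ "$a" = "$b" ]; then
  echo "sort -u and sort | uniq agree"
fi
```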

You can access the online documentation using the 'info' command.  The
man page is created automatically from the 'uniq --help' output.  It
is really intended only as a quick option reference and includes a
pointer to the full manual.

    info uniq

Bob



