bug-coreutils

Re: uniq Bug


From: Eric Blake
Subject: Re: uniq Bug
Date: Wed, 28 Jun 2006 00:34:30 +0000

> Not sure if this has already been discovered, but I found a problem with 
> uniq. If I sat down and looked at the code, I could probably see how to 
> fix it. It seems to always occur with very large unsorted streams (files).
> 
> Below are the commands I ran to exploit the bug (which I originally 
> thought was my error). Sorting the stream before removing duplicate 
> lines is inconsistent with just removing duplicate lines:

Thanks for the report.  However, uniq only works on sorted streams.  By
definition, uniq compares only consecutive lines to see if they are identical.
If the file is not sorted, non-adjacent duplicates are not removed, so the
same line can appear more than once in the output.  Changing this would slow
uniq down (it would need either more memory or more time to keep a record of
every previously seen line), not to mention violating POSIX.
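
As a minimal illustration (my own toy input, not the reporter's data),
compare:

    # No adjacent duplicates, so uniq removes nothing:
    $ printf 'a\nb\na\n' | uniq
    a
    b
    a

    # Sorting first makes the duplicate lines adjacent, so uniq drops one:
    $ printf 'a\nb\na\n' | sort | uniq
    a
    b

If you really need to remove duplicates without sorting, a common idiom is
awk '!seen[$0]++', which keeps a table of every line seen so far - that is
exactly the extra memory cost mentioned above.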

> Note that srv_inodes.txt, as generated, contains about 70 thousand inode 
> numbers. I've attached this file.

That was a little presumptuous of you - this is a public mailing list,
and you just blasted 150k of data that means very little to a large
number of recipients.  Usually it is better to reduce your test case
to something that fits in the body of your message.

-- 
Eric Blake
