coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: uniq for unsorted input


From: Bruno Haible
Subject: Re: uniq for unsorted input
Date: Wed, 27 Mar 2024 00:25:50 +0100

Pádraig Brady wrote:
> A documentation patch would be very useful.
> If you search for "Decorate Sort Undecorate" in coreutils.texi
> you can see an existing example of this DSU pattern.
> I would also mention DSU if adding another example.

OK, I'll provide doc input for this (post 9.5).

> Re adding new --keep-unsorted options I'm a bit reluctant
> as the scalability is not improved by bringing within the command.

The advantage of having this functionality in a coreutils
C program (as opposed to some shell script) is that it can
be optimized for both ends of the scalability spectrum:
  - An all-in-memory implementation, like the existing 'uniq',
    for inputs that easily fit in RAM (similar criteria as 'sort'
    uses, based on physmem()).
  - A scalable implementation, that does 3 or 4 fork+exec but
    is O(N log N), for bigger inputs.

Since I don't see 'uniq' options that could not be supported with
--keep-unsorted[-first] or --keep-unsorted-last, it would make
sense to extend the 'uniq' program, rather than to create another
program.

Bruno






reply via email to

[Prev in Thread] Current Thread [Next in Thread]