bug-coreutils

Re: [PATCH] Makes sort create random order


From: Frederik Eaton
Subject: Re: [PATCH] Makes sort create random order
Date: Sat, 29 Jan 2005 01:22:29 -0800
User-agent: Mutt/1.5.6+20040907i

> > I think few people would care about this corner case.
> 
> Maybe, maybe not; it's a bit hard to tell without knowing why
> people need the option to "sort at random".

I've given many examples. Can you give an example of a situation
where people would (a) put differently-formatted numbers in a column
of a file (how would they become differently formatted?) and then sort
randomly based on their values, while (b) insisting that ties stay together?

> Let's put it a different way.  Suppose we have a program that simply
> generates as output a random permutation of its input lines.  Would
> that suffice?

It would, in many ways, and such programs do exist.

> If so, perhaps we should simply create a new "permute" program rather
> than folding its functionality into "sort"; that would fold better
> into the software tools philosophy that "sort" is part of.

I disagree. First of all, the functionality belongs in sort. Should we
have a separate program to sort numerically? To sort based on months?
No. These features are in 'sort' for a reason. When you want to
rearrange lines of a file, you turn to one command - sort.
Furthermore, you might want to sort on one key and randomize on
another (e.g., play songs in albums in order, but select the albums
randomly). Having the functionality in separate programs makes this
kind of task difficult to do. Secondly, IIRC, none of the existing
programs handle large files as well as 'sort' does. On the other hand,
some of them are much faster than 'sort', but this doesn't matter when
they fail on your 5G dataset...
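
To make the albums example concrete, here is a minimal sketch in Python rather than in sort itself. The function name `shuffle_albums` and the (album, track number, title) tuple layout are made up for illustration and are not part of any patch. Each distinct album gets one random key, and the track number is the secondary key, so albums come out in random order while the songs within an album stay in order.

```python
import random

def shuffle_albums(tracks, rng=None):
    """Return tracks with albums in random order and songs in track order.

    tracks: a list of (album, track_number, title) tuples (hypothetical layout).
    """
    rng = rng or random.Random()
    # One random key per distinct album, so all tracks of an album share it.
    album_key = {}
    for album, _, _ in tracks:
        if album not in album_key:
            album_key[album] = rng.random()
    # Primary key: the album's random key (album name breaks unlikely ties).
    # Secondary key: the track number, compared as an ordinary integer.
    return sorted(tracks, key=lambda t: (album_key[t[0]], t[0], t[1]))

if __name__ == "__main__":
    demo = [
        ("Album A", 2, "a2"), ("Album B", 1, "b1"),
        ("Album A", 1, "a1"), ("Album B", 2, "b2"),
    ]
    for row in shuffle_albums(demo):
        print(row)
```

With a separate permute program, the per-album random key would have to be glued onto each line by hand before sorting, which is the kind of difficulty meant above.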

As mentioned in previous threads, this feature is frequently asked
for. Patches have been submitted and discussed. I just discovered that
a friend of mine uses a version of sort at his job with this feature
hacked in for financial analysis. It's in the TODO list. So I think we
should go ahead and decide to implement it in some form or another.

> If not, then I would like to understand the needs better before
> writing or reviewing code.
>
> >> > As for the nature of the investigations, well, anything for which
> >> > one needs a random permutation, I suppose. Also, random sampling
> >> > with sort -R | head, though somewhat inefficient, but convenient,
> >> 
> >> But these uses should not attempt to sort ties together.  They should
> >> attempt to sort them separately.
> >
> > Hmm, I don't see any of these uses as involving duplicate elements.
> 
> I can.  One might need a random sampling of a collection of elements,
> some of which are identical to each other.
> 
> > If they did, it would become impossible to determine exactly which
> > elements were sampled, or exactly what your permutation was.
> 
> That's OK in many applications.  (You have 30 black balls and 20 white
> balls in an urn, and want to select 7 balls without replacement....)

OK, after some thought I agree with you. Do you think it would be too
confusing to have both alternatives available, i.e. "sort based on a
hash of the key" and "sort based on a random virtual key"? Also, I now
wouldn't mind implementing both, if that's what it takes.
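
For comparison, a rough Python sketch of the two alternatives as I understand them; it is not taken from any submitted patch. The function names, the use of SHA-256, and the random salt are all assumptions made for illustration. The first orders lines by a salted hash of the line, so identical lines hash alike and stay adjacent. The second attaches an independent random key to each line, so identical lines are scattered like any others.

```python
import hashlib
import os
import random

def sort_by_keyed_hash(lines, salt=None):
    """Order lines by hash(salt + line); equal lines end up adjacent.

    The salt (an assumption here) makes the order differ from run to run.
    """
    if salt is None:
        salt = os.urandom(16)

    def digest(line):
        return hashlib.sha256(salt + line.encode("utf-8")).digest()

    return sorted(lines, key=digest)

def sort_by_random_virtual_key(lines, rng=None):
    """Order lines by an independent random key each; ties are not grouped."""
    rng = rng or random.Random()
    # The index keeps the sort from ever comparing two lines directly.
    keyed = [(rng.random(), i, line) for i, line in enumerate(lines)]
    keyed.sort()
    return [line for _, _, line in keyed]

if __name__ == "__main__":
    urn = ["black"] * 30 + ["white"] * 20
    print(sort_by_keyed_hash(urn)[:7])          # seven balls of one colour
    print(sort_by_random_virtual_key(urn)[:7])  # a sample without replacement
```

In the urn case, taking the first 7 lines of the hashed order always yields 7 balls of the same colour, whereas the random virtual key gives an ordinary sample without replacement. That is why the two would need to be separate options.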

Frederik



