bug-coreutils

Re: new coreutil? shuffle - randomize file contents


From: Davis Houlton
Subject: Re: new coreutil? shuffle - randomize file contents
Date: Mon, 30 May 2005 09:25:45 +0000
User-agent: KMail/1.7.2

Hi Frederik! I guess we're both a little confused :) My question is why would 
I sort AND shuffle in the same command? Are we talking sort the whole data 
set and shuffle a subset? I guess I'm having a hard time thinking why I would 
randomize via key--not saying that there aren't reasons, I'm just not sure 
what they are! 

My premise is that shuffle is organized pretty differently than sort--the code 
I have (in addition to the code I imagine we'll need for large files) looks 
radically different than sort, if only because shuffling is vastly simpler. 
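
To make the "vastly simpler" point concrete, here is a minimal sketch of an in-memory line shuffle using the standard Fisher-Yates algorithm. It is only an illustration, not the code referred to above: the name shuffle_lines and the rng parameter are hypothetical, it assumes the whole input fits in memory, and it says nothing about how large files would be handled.

```python
import random

def shuffle_lines(lines, rng=None):
    """Return a new list holding the lines in uniformly random order.

    Hypothetical helper for illustration; assumes all lines fit in memory.
    """
    if rng is None:
        rng = random.Random()
    out = list(lines)  # copy, so the caller's list is left untouched
    # Fisher-Yates: walk from the last slot down and swap each slot
    # with a randomly chosen slot at or below it.
    for i in range(len(out) - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive on both ends
        out[i], out[j] = out[j], out[i]
    return out

if __name__ == "__main__":
    import sys
    # Read lines from standard input and write them back in random order.
    sys.stdout.writelines(shuffle_lines(sys.stdin.readlines()))
```

There is no comparison function, no key handling, and no merge step here, only one linear pass, which is the contrast with sort being drawn above.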

While we could graft a shuffle into sort--I must admit to having taken only a 
cursory glance at the sort source--I think we can gain greater efficiencies 
by keeping the logic paths separate.  My assumption is thus that the shuffling 
code will be its own entity, whether it is in sort or shuffle.  

Looking at it a different way, let's take a look at the usage of sort and 
shuffle as a card metaphor.  The way I sort a deck of cards--and my rather 
simple method is far from optimum--is to first spread the cards face up out 
on a table, look for some high cards of each suit, start a pile of the four 
suits, and then as I pull additional cards, place them in the proper order in 
each suit pile. When I'm done sometime later, I'm left with the four stacks 
of cards, each suit in the proper order.  

When I shuffle the resulting deck, however, I use a different process. 
Granted, I could spread all the cards on the table, mix them up "domino" 
style, and then place them randomly into one, or even four stacks.  That 
would be acceptable.  But what I do (following the grand tradition of card 
shark wannabes everywhere) is split the deck in half.  I take the two halves 
and attempt to randomly merge them together like we've all seen those Las 
Vegas dealers do on TV, and voila--I have now (in theory) randomized the 
deck. It's quicker and just as effective as the table spread method.
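
The split-and-merge shuffle described above can be sketched in code as well. The version below assumes one common way of modelling a riffle: cut the deck at a random point near the middle, then drop cards from either half with probability proportional to how many cards that half still holds. The function name riffle and that modelling choice are assumptions made for illustration; this is a sketch of the card metaphor, not a proposal for how the utility should shuffle.

```python
import random

def riffle(deck, rng=None):
    """Apply one riffle shuffle to deck and return the result as a new list.

    Hypothetical helper for illustration only.
    """
    if rng is None:
        rng = random.Random()
    deck = list(deck)
    # Cut the deck: one fair coin flip per card, so the cut lands near the middle.
    cut = sum(1 for _ in deck if rng.random() < 0.5)
    left, right = deck[:cut], deck[cut:]
    out = []
    i = j = 0
    while i < len(left) or j < len(right):
        remaining_left = len(left) - i
        remaining_right = len(right) - j
        # Drop the next card from a half with probability proportional
        # to the number of cards that half still holds.
        if rng.random() * (remaining_left + remaining_right) < remaining_left:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    return out

if __name__ == "__main__":
    deck = list(range(52))
    for _ in range(7):  # repeat the riffle several times
        deck = riffle(deck)
    print(deck)
```

A single pass keeps each half in its original relative order, which is why the riffle has to be repeated, whereas a Fisher-Yates pass produces a uniformly random order in one go.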

If we are willing to ignore the imperfections of the analogy--that Vegas 
dealers shuffle their cards 7 times, that I have a tendency to mangle cards 
with improper shuffling technique, etc.--my thinking is that it makes sense to 
have sort and shuffle remain separate on an intuitive level.  And I admit, it 
is true, it is not hard to train a user in sort and shuffle commands.  Had 
sort --random already existed, there would be no need to propose any 
separation. But if we accept as a given that the code will follow two 
different logic paths, I personally don't see maintenance gains from 
combining the two.  

I must admit, with the American holidays and family I'm pretty pressed for 
time. I took a quick scan of the archive and it seemed like the conclusion 
was that it is a good idea to keep shuffle functionality separate? 

At any rate, I will add the delimiter, -o, -z options sometime in the future 
and then check the code into Savannah for (I hope) scathing review.  While my 
personal feeling is that it could be a good addition to coreutils, at least 
this way it can be available to those who have a use for a simple, quick 
shuffle.


On Monday 30 May 2005 05:27, Frederik Eaton wrote:
> I'm not following exactly - in part I think it is premature to discuss
> implementation details now. And as for the idea to put "shuffle"
> functionality in a separate command, this and other issues were
> discussed at length in the previous thread which starts here:
>
> http://lists.gnu.org/archive/html/bug-coreutils/2005-01/msg00145.html
>
> Basically, sometimes you want to be able to sort on one key and
> shuffle on another, which means that your hypothetical 'shuffle' would
> have to have a superset of the 'sort' functionality. Not an ideal
> situation. Not only would the implementations be very similar, but,
> more importantly, the APIs would have a lot of overlap as well.
>
> Also, just because some users might look for "shuffle" functionality
> first in a "shuffle" command doesn't mean that we should put it there.
> You only have to learn that the functionality is provided by "sort"
> once, and it doesn't make sense to sacrifice too much usability or
> maintainability to try to captivate the small minority of users who
> are first-time users (as commercial software vendors tend to do).
> (Now, it also doesn't make sense to have to work out a line-long awk
> script every time you need to shuffle something)
>
> Frederik
>



