bug-coreutils

Re: new coreutil? shuffle - randomize file contents


From: Davis Houlton
Subject: Re: new coreutil? shuffle - randomize file contents
Date: Fri, 3 Jun 2005 07:16:33 +0000
User-agent: KMail/1.7.2

On Thursday 02 June 2005 10:31, Jim Meyering wrote:
> It sure sounds like shuffle and sort should share a lot of code,
> one way or another, so why not have them share the line- and key-
> handling code, too?  I won't rule out adding a new program, like
> shuffle, but I confess I'm less inclined now than when I started
> typing this message.

Now that I've heard some examples, I think a sort --random is an excellent
idea, and I hope it is added to coreutils at some point. sort has scope far
beyond shuffle's. Assuming sort --random's eventual entry into coreutils is a
given, I think we have two questions to decide:
 a) Implementation details aside, does a shuffle command merit entry?
 b) Examining implementation details, what is the best way to go?

My take is that shuffle is good for the lazy man: as currently written,
shuffle is destructive, replacing a file A with random(A). I intend (after I
add -z and -o, of course :] ) to add a --head (-h) option, so that if all we
want is the first line, that's all we have to process. My thought is that
these properties make shuffle ideal for simple, quick hitters via
"system"-type calls in the various scripting languages that are commonly used.
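
A minimal sketch of how a "--head 1" style pick can avoid shuffling everything:
reservoir sampling with a reservoir of one line reads the input once, keeps
only the current choice, and still leaves every line equally likely. This is
an assumed technique, not necessarily what shuffle does, and the function name
pick_random_line is hypothetical. It assumes a POSIX awk, whose srand() with
no argument seeds from the time of day, so two runs within the same second
may pick the same line.

```shell
# pick_random_line (hypothetical helper): print one line chosen uniformly at
# random from stdin. Reservoir sampling with k = 1: the Nth line replaces the
# current pick with probability 1/N, so only one line is ever held in memory.
pick_random_line() {
  awk 'BEGIN { srand() }                 # seed from the time of day
       rand() * NR < 1 { pick = $0 }     # true with probability 1/NR
       END { if (NR > 0) print pick }'   # print nothing for empty input
}
```

For example, `ls | pick_random_line` covers the same ground as USE CASE 2.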

To help illustrate, here are the common use cases I envision:

USE CASE 1: Randomizing file contents
  shuffle *
  ls * | xargs -I {} sort --random {} -o {}

USE CASE 2: Grabbing a file at random
  ls * | shuffle -h 1
  ls * | sort --random | head -n 1

USE CASE 3: Generating a randomly ordered list of files
  find . -name \*.mp3 | shuffle -o "playlist.m3u"
  find . -name \*.mp3 | sort --random -o "playlist.m3u"
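
One way a sort --random could behave, sketched with existing tools: tag each
line with a random key, sort numerically on that key, then cut the key off.
This is an assumed decorate-sort-undecorate approach, not the proposed sort
option itself; the function name random_sort is hypothetical, and it assumes
a POSIX awk (srand() with no argument seeds from the time of day) and a sort
that understands -n.

```shell
# random_sort (hypothetical helper): print stdin's lines in random order.
# 1. awk prefixes each line with a random key and a tab,
# 2. sort orders the lines by that numeric key (C locale so "." is the
#    decimal point),
# 3. cut drops the key, keeping everything after the first tab intact.
random_sort() {
  awk 'BEGIN { srand() } { printf "%.17f\t%s\n", rand(), $0 }' |
    LC_ALL=C sort -n |
    cut -f2-
}
```

For example, `find . -name \*.mp3 | random_sort > playlist.m3u` approximates
USE CASE 3.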

Efficiency-wise, I think shuffle will run quicker, but that may not matter
given that typical inputs are small. For question a) above, I think there is
room for both. In the same way that we have both grep -r and
find . | xargs grep, I'm assuming we can have both shuffle and sort --random
(from a user's perspective).
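
To make the efficiency point concrete, here is a minimal sketch of a
Fisher-Yates shuffle, assuming a POSIX awk; fy_shuffle is a hypothetical name,
not shuffle's actual code. It loads the lines once and then performs n-1
swaps, which is linear work, whereas sorting on random keys costs O(n log n)
comparisons. It keeps every line in memory, so it does not address the
large-N case.

```shell
# fy_shuffle (hypothetical helper): print stdin's lines in random order using
# a Fisher-Yates shuffle: walk i from the last line down to 2, swapping
# line[i] with a uniformly chosen line[j], 1 <= j <= i.
fy_shuffle() {
  awk 'BEGIN { srand() }                  # seed from the time of day
       { line[NR] = $0 }                  # load all lines into memory
       END {
         for (i = NR; i > 1; i--) {
           j = int(rand() * i) + 1        # uniform j in 1..i
           t = line[i]; line[i] = line[j]; line[j] = t
         }
         for (i = 1; i <= NR; i++) print line[i]
       }'
}
```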

If we assume that there is room for both sort and shuffle, the devil is in the
details. How much of sort would exist in shuffle, or vice versa? Should there
be a GNU coreutils include file that deals specifically with temp files, for
use by any utility?
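
As a rough illustration of the kind of temp-file handling such a shared helper
would wrap, here is a shell sketch that runs a filter over FILE and replaces
FILE only after the filter succeeds. The name rewrite_file is hypothetical,
and mktemp is assumed to be available (as it is on GNU systems). GNU sort's
-o already copes with the output file also being an input file, which is why
the sort line of USE CASE 1 is safe; a destructive shuffle needs the same care.

```shell
# rewrite_file FILE CMD [ARG...] (hypothetical helper): run CMD with FILE on
# stdin and replace FILE with CMD's output. Output goes to a temporary file
# in the same directory first, so FILE is never truncated before CMD has read
# it, and a failing CMD leaves FILE untouched.
rewrite_file() {
  f=$1; shift
  tmp=$(mktemp "$f.XXXXXX") || return 1
  if "$@" < "$f" > "$tmp"; then
    mv -- "$tmp" "$f"     # rename within one directory replaces FILE atomically
  else
    rm -f -- "$tmp"       # discard partial output on failure
    return 1
  fi
}
```

For example, `for f in *; do rewrite_file "$f" sort; done`. Note that mv puts
the mktemp-created file in FILE's place, so FILE ends up with mktemp's 0600
permissions rather than its original mode; a real implementation would need
to carry the mode across.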

Ahhh, questions, questions... I'm not sure how we should approach it. Is the
answer to b) unknown at present? Trying to get our arms around the issue
could lead to a great deal of analysis paralysis, though I'm always willing
to try. If we agree that a) is a given, maybe we should just add the N-scale
code to shuffle, with a parallel --random effort in sort? Then we can operate
in hindsight, refactoring and adjusting as necessary. That's one potential
plan, at any rate. Of course, if we agree that shuffle should not be
included, then no harm, no foul. But since it sounds like sort --random is
far from trivial, sometimes a bird in the hand... Thoughts?

Thanks,
  Davis



