bug-coreutils

Re: feature request: gzip/bzip support for sort


From: Dan Hipschman
Subject: Re: feature request: gzip/bzip support for sort
Date: Mon, 15 Jan 2007 12:33:04 -0800
User-agent: Mutt/1.5.9i

On Sat, Jan 13, 2007 at 10:07:59PM -0800, Paul Eggert wrote:
> 3.  I can see where the user might be able to specify a better
> algorithm, for a particular data set.  For that, how about if we have
> a --compress-program=PROGRAM option, which lets the user plug in any
> program that works as a pipeline?  E.g., --compress-program=gzip would
> use gzip.  The default would be to use "PROGRAM -d" to decompress; we
> could have another option if that doesn't suffice.
> 
> An advantage of (3) is that it should work well on two-processor
> hosts, since compression can be done in one CPU while sorting is done
> on another.  (Hmm, perhaps we should consider forking even if we use a
> built-in default compressor, for the same reason.)

I've started working on this, and have made good progress so far.  There
are a lot of subtleties, though, like making sure the forked child
doesn't receive SIGINT and unlink all our temp files before it execs
(I've solved that problem), and making sure the compress process
finishes compressing the temp file before the corresponding decompress
process starts processing it (I've got a plan for that).  Anyway, my
point is, I've gotten off to a good start, but it's going to take a lot
of testing to make sure I've done it right due to all these race
conditions.
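
To make the two precautions above concrete, here is a minimal sketch in
Python (not the C used by sort, and not the actual patch).  The names
spawn and roundtrip are hypothetical, and gzip is assumed to be on PATH.
SIGINT is blocked around fork() so the child cannot run the parent's
cleanup handler before it execs, and the compressor is reaped with
waitpid() before "PROGRAM -d" is started on the same temp file.

```python
import os
import signal
import tempfile


def spawn(argv, stdin_fd, stdout_fd):
    """Fork and exec argv with the given stdin/stdout (hypothetical helper).

    SIGINT stays blocked across fork() so the child cannot run the
    parent's cleanup handler in the window before exec.
    """
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGINT})
    try:
        pid = os.fork()
        if pid == 0:
            try:
                # Child: drop the inherited handler first, then unblock.
                signal.signal(signal.SIGINT, signal.SIG_DFL)
                signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
                os.dup2(stdin_fd, 0)
                os.dup2(stdout_fd, 1)
                os.execvp(argv[0], argv)
            finally:
                # Only reached if exec failed; never return into the caller.
                os._exit(127)
        return pid
    finally:
        # Parent: restore the original signal mask.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)


def roundtrip(data, program="gzip"):
    """Compress data into a temp file via PROGRAM, then read it back
    through "PROGRAM -d" (hypothetical helper)."""
    with tempfile.TemporaryFile() as tmp:
        # Compress: pipe -> PROGRAM -> temp file.
        r, w = os.pipe()
        pid = spawn([program], r, tmp.fileno())
        os.close(r)
        with os.fdopen(w, "wb") as sink:
            sink.write(data)
        # Reap the compressor before any reader touches the temp file.
        _, status = os.waitpid(pid, 0)
        if status != 0:
            raise RuntimeError("compress process failed")

        # Decompress: temp file -> PROGRAM -d -> pipe.
        os.lseek(tmp.fileno(), 0, os.SEEK_SET)
        r, w = os.pipe()
        pid = spawn([program, "-d"], tmp.fileno(), w)
        os.close(w)
        with os.fdopen(r, "rb") as source:
            out = source.read()
        _, status = os.waitpid(pid, 0)
        if status != 0:
            raise RuntimeError("decompress process failed")
        return out
```

Calling roundtrip(b"pear\napple\n") should hand back the same bytes.
Waiting on the compressor before starting the decompressor is the
simplest way to order the two; it gives up any overlap between them,
which a real implementation might want to keep.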

The actual compression is obviously a lot better with gzip / bzip2,
and it shouldn't be hard to extend the code so that sort can read and
write externally compressed files, which is what the OP wanted.  The
sort itself is not faster on my machine, though (not even close).  Of
course, I've only got one CPU, and a slow one at that :-)

Dan




