Re: feature request: gzip/bzip support for sort
From: Philip Rowlands
Subject: Re: feature request: gzip/bzip support for sort
Date: Thu, 18 Jan 2007 22:38:49 +0000 (GMT)
On Thu, 18 Jan 2007, Jim Meyering wrote:
> I've done some more timings, but with two more sizes of input.
> Here's the summary, comparing straight sort with sort --comp=gzip:
>    2.7GB:  6.6% speed-up
>   10.0GB: 17.8% speed-up
It would be interesting to see the individual stats returned by wait4(2)
from the child, to separate CPU seconds spent in sort itself, and in the
compression/decompression forks.
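As a rough sketch of that measurement (not code from sort itself), the Python below forks a child, execs a command with stdin and stdout redirected to files, and reaps it with os.wait4, which returns the resource usage of that one child. The helper name child_cpu_seconds and the gzip -1 demo are assumptions made for illustration.

```python
import os
import tempfile


def child_cpu_seconds(argv, stdin_path, stdout_path):
    """Run argv as a child; return (exit_code, user_seconds, system_seconds).

    Hypothetical helper: sort's real implementation is in C and may differ.
    """
    pid = os.fork()
    if pid == 0:
        # Child: redirect stdin/stdout, then replace ourselves with argv.
        try:
            fd_in = os.open(stdin_path, os.O_RDONLY)
            fd_out = os.open(stdout_path,
                             os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
            os.dup2(fd_in, 0)
            os.dup2(fd_out, 1)
            os.execvp(argv[0], argv)
        finally:
            # Reached only if the redirection or the exec failed.
            os._exit(127)
    # Parent: wait4 reaps this one child and reports its resource usage.
    _, status, usage = os.wait4(pid, 0)
    code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
    return code, usage.ru_utime, usage.ru_stime


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "in")
        dst = os.path.join(tmp, "out.gz")
        with open(src, "w") as f:
            f.writelines("%d\n" % i for i in range(1, 200001))
        code, user, system = child_cpu_seconds(["gzip", "-1", "-c"], src, dst)
        print("gzip -1: exit=%d user=%.2fs sys=%.2fs" % (code, user, system))
```

Calling the helper once per compressor or decompressor child, and summing the results, would give the compression share of the total separately from sort's own CPU time.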
I think allowing an environment variable to define the compressor is a
good idea, so long as there's a corresponding --nocompress override
available from the command line.
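One way to read that precedence, as a sketch only: --nocompress disables compression outright, an explicit --compress choice comes next, and the environment is consulted last. The variable name SORT_COMPRESSOR and the function below are hypothetical; nothing in this message fixes a name, and letting --nocompress also beat --compress is an assumption.

```python
import os


def choose_compressor(cli_compress=None, cli_nocompress=False, env=None):
    """Return the compressor program to use, or None for no compression.

    SORT_COMPRESSOR is a hypothetical variable name used for illustration.
    """
    if env is None:
        env = os.environ
    if cli_nocompress:
        # The command-line override always wins over the environment.
        return None
    if cli_compress:
        return cli_compress
    # Fall back to the environment; an empty value means "not set".
    return env.get("SORT_COMPRESSOR") or None


print(choose_compressor(env={"SORT_COMPRESSOR": "gzip"}))
print(choose_compressor(cli_nocompress=True, env={"SORT_COMPRESSOR": "gzip"}))
```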
>   $ seq 9999999 > k
>   $ cat k k k k k k k k k > j
>   $ cat j j j j > sort-in
>   $ wc -c sort-in
>   2839999968 sort-in
I had to use "seq -f %.0f" to get this filesize.
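The quoted file size can be checked by arithmetic: each d-digit number occupies d bytes plus a newline, and the file holds 9 x 4 = 36 copies of the seq output. The sketch below does that sum. Its last line shows what a %g-style format does to a seven-digit value, which is presumably why -f %.0f was needed with that seq build; that explanation is an inference, not something stated in the thread.

```python
def seq_bytes(max_digits):
    """Bytes written by `seq N` for N = 10**max_digits - 1, one number per line."""
    total = 0
    for d in range(1, max_digits + 1):
        count = 9 * 10 ** (d - 1)   # how many d-digit numbers there are
        total += count * (d + 1)    # d digits plus a newline each
    return total


k = seq_bytes(7)       # seq 9999999 > k
sort_in = k * 9 * 4    # nine copies of k into j, four copies of j into sort-in
print(k, sort_in)      # prints: 78888888 2839999968

# A %g-style format switches to exponent notation at seven digits,
# which is shorter than the plain decimal form and changes the file size.
print("%g" % 1000000, "%.0f" % 1000000)   # prints: 1e+06 1000000
```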
> With --compress=gzip:
>   $ /usr/bin/time ./sort -T. --compress=gzip < sort-in > out
>   814.07user 29.97system 14:50.16elapsed 94%CPU (0avgtext+0avgdata
>   0maxresident)k 0inputs+0outputs (4major+2821589minor)pagefaults 0swaps
There's a big difference in the time spent on gzip compression depending
on the -1/-9 option (default -6). For a seq-generated data set similar to
the one above, I get
gzip -1: User time (seconds): 48.63, output size is 6% of input
gzip -9: User time (seconds): 952.97, output size is 3% of input
Decompression time shows far less variation between the two tests (25s vs
21s). This suggests that the elapsed time to sort can be improved by
trading compression ratio for less CPU time. Obviously, disk latency is a
critical factor.
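The same trade-off can be seen on a small scale with the sketch below. It uses Python's zlib module in-process (the same DEFLATE compression that gzip uses) rather than the gzip binary, and a stand-in input of a few megabytes, so the absolute numbers will not match the figures above. The function name measure and the input size are assumptions.

```python
import time
import zlib


def measure(data, level):
    """Return (cpu_seconds, compressed_size) for one zlib compression level."""
    start = time.process_time()
    packed = zlib.compress(data, level)
    elapsed = time.process_time() - start
    assert zlib.decompress(packed) == data  # round-trip sanity check
    return elapsed, len(packed)


# Small stand-in for the seq-generated input (the real test used gigabytes).
data = "".join("%d\n" % i for i in range(1, 300001)).encode("ascii")

results = {level: measure(data, level) for level in (1, 6, 9)}
for level, (cpu, size) in results.items():
    print("level %d: %.3fs CPU, output is %.1f%% of input"
          % (level, cpu, 100.0 * size / len(data)))
```

Whether the cheaper level wins overall depends on how the extra temporary-file I/O compares with the CPU time saved, which this sketch does not measure.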
Cheers,
Phil