bug#23113: alternatives: parallel gzip processes trash hard disks


From: John Reiser
Subject: bug#23113: alternatives: parallel gzip processes trash hard disks
Date: Sat, 2 Apr 2016 21:43:50 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0

Here are some other approaches which may help:

1. Use gzopen() from zlib to compress the 10GB file as it is generated.
This uses only one CPU core and writes strictly sequentially
(no random writes), which may be enough in some cases.
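A minimal sketch of this approach (the output name output.gz, compression
level 6, and 64KiB buffer are arbitrary choices; stdin stands in here for
the in-process data generator):

        /* Compile with:  cc gen-gz.c -lz */
        #include <stdio.h>
        #include <zlib.h>

        int main(void)
        {
            gzFile out = gzopen("output.gz", "wb6");   /* write, level 6 */
            if (out == NULL) { perror("gzopen"); return 1; }

            char buf[64 * 1024];
            size_t n;
            while ((n = fread(buf, 1, sizeof buf, stdin)) > 0)
                if (gzwrite(out, buf, (unsigned)n) == 0) {  /* 0 means error */
                    fprintf(stderr, "gzwrite failed\n");
                    return 1;
                }

            gzclose(out);   /* flush and write the gzip trailer */
            return 0;
        }

The only writes this performs are sequential appends to output.gz.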

2. The output from gzip is written 32KiB at a time, so a large output file
grows in many small steps.  Buffering the output from gzip into larger
blocks may therefore help, too.  Try:
        gzip ...  |  dd obs=... of=...
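For example (the 1MiB output block size is only a guess; tune obs= for
your disk and file system):

        gzip < file | dd obs=1M of=file.gz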

3. Similarly, dd can buffer the input to gzip:
        dd if=... ibs=... obs=...  |  gzip ...
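Again assuming 1MiB blocks:

        dd if=file ibs=1M obs=1M | gzip > file.gz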

4. dd can also be used to create multiple streams of input
from a single file:
        (dd if=file ibs=... skip=0*N count=N obs=...  |  gzip ... ) &
        (dd if=file ibs=... skip=1*N count=N obs=...  |  gzip ... ) &
        (dd if=file ibs=... skip=2*N count=N obs=...  |  gzip ... ) &
        (dd if=file ibs=... skip=3*N count=N obs=...  |  gzip ... ) &
However, dd does not perform arithmetic, so each product j*N
must be written out as a literal value; a worked instance follows below.
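For instance, assuming a hypothetical 10GiB input split four ways with
ibs=1M and N=2560, the skip= products are 0, 2560, 5120, and 7680
(a POSIX shell can also compute them, e.g. skip=$((2*2560))):

        (dd if=file ibs=1M skip=0    count=2560 obs=1M | gzip > part0.gz) &
        (dd if=file ibs=1M skip=2560 count=2560 obs=1M | gzip > part1.gz) &
        (dd if=file ibs=1M skip=5120 count=2560 obs=1M | gzip > part2.gz) &
        (dd if=file ibs=1M skip=7680 count=2560 obs=1M | gzip > part3.gz) &
        wait

Because the gzip format permits concatenated members, catting the four
part files together in order yields a stream that gzip -d expands back
to the original file.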

The dd utility program is quite versatile!

