bug-gzip

Re: RFC: fixing the 32-bit size and time limits in gzip file format


From: Greg Roelofs
Subject: Re: RFC: fixing the 32-bit size and time limits in gzip file format
Date: Fri, 20 Aug 2010 12:18:28 -0700

> On Aug 19, 2010, at 2:54 PM, Paul Eggert wrote:

>> In that case I'm afraid that we need to give up on the goal of always
>> providing a correct uncompressed length.  At this point the gzip
>> format is so widely used that an incompatible change to it would cause
>> far more trouble than the relatively minor problem of gzip -l
>> reporting the wrong length.  Instead, it might be better to leave the
>> format alone, and to change gzip -l so that it decompresses the data
>> in order to report the uncompressed data length.

That would be a significant performance hit, as Mark also noted.  At least
in the file (as opposed to streaming) case, there are some alternatives:

 - Add a suboption (-ll? -l2?) to do decompression-listing.

 - If just -l and compressed size is more than 250 MB, warn that the
   uncompressed size might be off by a multiple of 2^32 and note the
   secondary option.

 - If the compressed size is more than 4 GB, perhaps force decompression-
   listing, or else make the warning more definite (maybe even an error).
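
As a rough illustration of the options above, here is a minimal Python
sketch (not gzip's actual code). It assumes a single-member .gz file, reads
the 32-bit size field from the trailer, and falls back to a full
decompression pass when asked to or when the compressed file is over 4 GB.
The function names, the returned note strings, and the reading of "250 MB"
as 250 * 10^6 bytes are all assumptions made for the example.

```python
import gzip
import os
import struct

WARN_THRESHOLD = 250 * 10**6   # "250 MB" from the list above (unit is an assumption)
FORCE_THRESHOLD = 1 << 32      # "4 GB": beyond this, always decompress

def stored_isize(path):
    """Return the trailer's size field: uncompressed length modulo 2**32."""
    with open(path, "rb") as f:
        f.seek(-4, os.SEEK_END)          # last 4 bytes, little-endian
        return struct.unpack("<I", f.read(4))[0]

def decompressed_size(path, chunk=1 << 16):
    """Count uncompressed bytes by decompressing (slow, but exact)."""
    total = 0
    with gzip.open(path, "rb") as f:
        while True:
            block = f.read(chunk)
            if not block:
                break
            total += len(block)
    return total

def list_size(path, force_decompress=False):
    """Hypothetical '-l' policy: return (size, note) for one file."""
    csize = os.path.getsize(path)
    if force_decompress or csize > FORCE_THRESHOLD:
        return decompressed_size(path), "exact"
    if csize > WARN_THRESHOLD:
        return stored_isize(path), "may be off by a multiple of 2**32"
    return stored_isize(path), "from trailer"
```

Calling list_size("foo.gz") would give the fast trailer-based answer, and
list_size("foo.gz", force_decompress=True) would play the role of the
proposed secondary option.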

Of course, I didn't even know there was a -l option, so feel free to
ignore me...

Mark wrote:

> So I'm thinking we should put forth a format amendment.
> However initially only decoding the format would be supported,
> not creating it.  We would let that simmer for, say, three years
> for the updated gzip, pigz, and zlib to free range.  Then let
> loose versions that create the format once the decoding has a
> wide distribution.

Of course, some third-party apps will undoubtedly start writing the new
format even before the updated standard is ratified...

> I used to think that eventually this would all go away since
> the gzip format would surely be supplanted by something else.
> However even with .bz2 and .xz formats with better compression,
> that simply hasn't happened.  So now I'm beginning to think
> that the .gz format will stubbornly persist at least until I die
> (at which point I won't care anymore).

Yup.  gzip is an excellent compromise between speed (both sides) and
compression ratio.  bzip2 is completely pointless these days (-1 same
as gzip -9 but much slower, -9 same as xz -1 or -2 but slower), and
xz offers a decent improvement in compression ratio at close to an
order-of-magnitude hit in compression speed and 2-3x hit in decompression
speed (relative to gzip, that is).

> So how would a format amendment be put forth?  As an addition
> to the existing RFC?  As a new RFC?  Who knows how to do that?

I believe updated RFCs appear as new numbers--the e-mail format (RFC-822?)
got updated at least once or twice, IIRC.  I'm pretty sure the procedure
is documented either in an RFC or somewhere on IANA's web site, but Glenn
is the one who dealt with most of that, as I recall.  I'll cc him (not
that I'm volunteering him for anything, of course!).

> By the way, I keep getting bounce messages from Peter's address,
> so I don't think he's getting these emails.

Yup, he's gone.  I've removed him from the cc list.

Greg


