Re: [Bug-wget] wget 1.14 possibly writing off-spec warc.gz files


From: Andy Jackson
Subject: Re: [Bug-wget] wget 1.14 possibly writing off-spec warc.gz files
Date: Sat, 30 Mar 2013 21:18:42 +0000 (UTC)
User-agent: Loom/3.14 (http://gmane.org/)

Tim Rühsen <tim.ruehsen <at> gmx.de> writes:

> Unzipping it and zipping it again results in a 2387 byte file.
> 
> So, for a first glimpse, it looks like Wget compresses very suboptimal.
> But I won't say it is a bug before I take a deeper look... (in the next days).

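That's probably working as intended. By convention, warc.gz files are
written as a series of concatenated gzip members, one per record, rather
than as a single gzipped stream, so that individual records can be
recovered via their byte offset. Compressing each record separately costs
some compression ratio, which would explain the smaller file you get after
unzipping and re-zipping. Concatenated members are allowed by the gzip
spec, but the feature is not widely known or used, which causes much
confusion. I rather wish the spec had defined some other file extension
for this case.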
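
To make the layout concrete, here is a minimal Python sketch of the
per-record convention. It is an illustration, not what Wget does
internally: the record contents, the read_record helper and the offsets
list are made up for the example, and real WARC records carry more
headers.

```python
import gzip
import zlib

# Hypothetical stand-ins for WARC records (assumption: real ones are larger).
records = [
    (f"WARC/1.0\r\nWARC-Type: response\r\n"
     f"WARC-Target-URI: http://example.org/page{i}\r\n\r\n"
     f"body {i}\r\n\r\n").encode("ascii")
    for i in range(3)
]

# Compress each record as its own gzip member and note where it starts.
members = [gzip.compress(r) for r in records]
offsets = []
pos = 0
for m in members:
    offsets.append(pos)
    pos += len(m)
warc_gz = b"".join(members)

# For comparison: the same bytes compressed as one gzip stream.
single_stream = gzip.compress(b"".join(records))


def read_record(blob, offset):
    """Decompress only the gzip member that starts at `offset`."""
    # wbits=31 (16 + 15) tells zlib to expect a gzip header and trailer.
    d = zlib.decompressobj(31)
    # Decompression stops at the end of this member; any bytes that
    # follow are left untouched in d.unused_data.
    return d.decompress(blob[offset:])


# Random access to the second record without touching the first.
second = read_record(warc_gz, offsets[1])

# A normal gzip reader still sees the whole file as one valid stream.
everything = gzip.decompress(warc_gz)

print(len(warc_gz), len(single_stream))
```

With small, similar records like these, the concatenated form should come
out larger than the single stream, since each member carries its own
header and trailer and cannot reuse matches from earlier records.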

Thanks,
Andy





