Re: [Bug-wget] wget 1.14 possibly writing off-spec warc.gz files
From: Andy Jackson
Subject: Re: [Bug-wget] wget 1.14 possibly writing off-spec warc.gz files
Date: Sat, 30 Mar 2013 21:18:42 +0000 (UTC)
User-agent: Loom/3.14 (http://gmane.org/)
Tim Rühsen <tim.ruehsen <at> gmx.de> writes:
> Unzipping it and zipping it again results in a 2387 byte file.
>
> So, at first glance, it looks like Wget compresses very suboptimally.
> But I won't call it a bug before I take a deeper look (in the next few days).
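That's probably working as intended. By convention, warc.gz files use
concatenated gzip members (one per WARC record) rather than a single gzipped
stream, so that individual records can be recovered by seeking to their byte
offset. Because each record is compressed independently, the compressor cannot
reuse context across records, so recompressing the whole file as one stream
will usually come out smaller. Concatenating members is allowed by the gzip
spec (RFC 1952), but it is not widely known or used, which causes much
confusion. I rather wish the WARC spec had defined some other file extension
for this case.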
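To illustrate the convention, here is a minimal Python sketch (not wget's
actual code) that writes each record as its own gzip member, remembers the
byte offset where each member starts, and later decompresses just one record
from its offset. The helper names write_records and read_record_at are
hypothetical, and the records are toy byte strings rather than real WARC
records.

```python
import gzip
import io
import zlib


def write_records(records):
    """Compress each record as a separate gzip member.

    Returns the concatenated bytes and the start offset of every member
    (hypothetical helper for illustration).
    """
    buf = io.BytesIO()
    offsets = []
    for rec in records:
        offsets.append(buf.tell())      # where this member begins
        buf.write(gzip.compress(rec))   # one complete gzip member per record
    return buf.getvalue(), offsets


def read_record_at(data, offset):
    """Decompress exactly one gzip member starting at `offset`.

    wbits=16 + MAX_WBITS makes zlib expect a gzip header; the decompressor
    stops at the end of the first member and leaves any following members
    in `unused_data`, which we ignore.
    """
    d = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
    return d.decompress(data[offset:]) + d.flush()


if __name__ == "__main__":
    recs = [b"record %d: " % i + b"x" * 50 for i in range(5)]
    data, offsets = write_records(recs)
    print("offsets:", offsets)
    print("third record:", read_record_at(data, offsets[2]))
    # Per-record members vs. one stream over the same content
    print("multi-member:", len(data), "single stream:",
          len(gzip.compress(b"".join(recs))))
```

Ordinary tools such as gzip.decompress (or zcat) treat the multi-member file
as one stream and simply return all records concatenated, which is why such
files look like normal .gz files. Random access depends on having an index of
offsets, and the price is the per-member header overhead plus the lost
cross-record context, which is the size difference Tim observed.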
Thanks,
Andy