[Bug-wget] wget 1.14 possibly writing off-spec warc.gz files
From: Andy Jackson
Subject: [Bug-wget] wget 1.14 possibly writing off-spec warc.gz files
Date: Fri, 29 Mar 2013 21:33:33 +0000 (UTC)
User-agent: Loom/3.14 (http://gmane.org/)
When using wget 1.14 to generate warc.gz files, e.g.

    wget -O tempname --warc-file="output" "http://example.com"

the resulting files do not play back cleanly with the Internet Archive's
warc.gz parsers, which throw errors like
"Invalid FExtra length/records".
It appears that wget may be writing slightly malformed GZIP skip-length
fields in the gzip header's extra (FEXTRA) area; see
https://github.com/ukwa/warc-discovery/issues/1
for details.
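Some background on what that error means at the byte level: per RFC 1952, when the FEXTRA bit (0x04) is set in a gzip member's FLG byte, the fixed 10-byte header is followed by a 2-byte little-endian XLEN and then XLEN bytes of subfields, each laid out as SI1, SI2, a 2-byte LEN, and LEN data bytes. A parser that walks those subfields will reject a header whose LEN values do not fit inside XLEN, which is one plausible route to an "Invalid FExtra length" error. The following is a minimal, hypothetical checker for that condition; `check_fextra` and `make_header` are illustrative names, and the "sl" subfield bytes in the demo are made up, not copied from a real wget 1.14 file.

```python
import struct

FEXTRA = 0x04  # FLG bit: an extra field follows the fixed 10-byte header


def check_fextra(member: bytes) -> list[str]:
    """List problems in the FEXTRA field of one gzip member header (RFC 1952).

    An empty list means there is no FEXTRA field or its subfields fit XLEN exactly.
    """
    if len(member) < 10 or member[:2] != b"\x1f\x8b":
        return ["not a gzip member (bad magic bytes or too short)"]
    if not member[3] & FEXTRA:
        return []  # FLG says there is no extra field to check
    if len(member) < 12:
        return ["header truncated before XLEN"]
    (xlen,) = struct.unpack_from("<H", member, 10)  # little-endian, per RFC 1952
    extra = member[12:12 + xlen]
    if len(extra) < xlen:
        return [f"XLEN is {xlen} but only {len(extra)} bytes follow"]
    problems = []
    pos = 0
    while pos < xlen:
        if xlen - pos < 4:  # each subfield needs SI1, SI2 and a 2-byte LEN
            problems.append(f"{xlen - pos} stray byte(s) at end of extra field")
            break
        si1, si2, sublen = struct.unpack_from("<BBH", extra, pos)
        sid = bytes([si1, si2]).decode("latin-1")
        if pos + 4 + sublen > xlen:
            problems.append(f"subfield {sid!r} LEN={sublen} overruns XLEN={xlen}")
            break
        pos += 4 + sublen
    return problems


def make_header(xlen: int, extra: bytes) -> bytes:
    """Hypothetical test helper: fixed header with FEXTRA set, then XLEN and extra."""
    fixed = b"\x1f\x8b\x08" + bytes([FEXTRA]) + b"\x00" * 4 + b"\x00\xff"
    return fixed + struct.pack("<H", xlen) + extra


# Illustrative only: an "sl" subfield with 4 data bytes, once consistent with
# XLEN and once with a LEN that claims more bytes than XLEN allows.
good = make_header(8, b"sl" + struct.pack("<H", 4) + b"\x00" * 4)
bad = make_header(8, b"sl" + struct.pack("<H", 8) + b"\x00" * 4)
print(check_fextra(good))
print(check_fextra(bad))
```

Since warc.gz files are typically written with one gzip member per record, running a check like this on the first bytes of a suspect file would show whether its subfield lengths disagree with XLEN.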
It's likely that we'll need to make the warc.gz parsers a bit more
robust, but I thought I'd mention it here in case this is
actually a bug in wget.
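On the parser side, one way to be "a bit more robust" is to follow RFC 1952 literally: a reader only needs XLEN to skip the extra field and never has to interpret the subfields. Python's standard `gzip` module reads headers this way. The sketch below assumes that approach; `header_length`, `read_member` and `build_member` are hypothetical helpers, not the Internet Archive's parser API, and the demo deliberately writes an "sl" subfield whose LEN (99) overruns XLEN (8).

```python
import struct
import zlib

# FLG bits from RFC 1952, section 2.3.1
FHCRC, FEXTRA, FNAME, FCOMMENT = 0x02, 0x04, 0x08, 0x10


def header_length(member: bytes) -> int:
    """Length of one gzip member header; FEXTRA is skipped using XLEN alone."""
    if member[:2] != b"\x1f\x8b" or member[2] != 8:
        raise ValueError("not a deflate-compressed gzip member")
    flags = member[3]
    pos = 10  # ID1 ID2 CM FLG MTIME(4) XFL OS
    if flags & FEXTRA:
        (xlen,) = struct.unpack_from("<H", member, pos)
        pos += 2 + xlen  # trust XLEN; never interpret the subfields
    if flags & FNAME:
        pos = member.index(b"\x00", pos) + 1  # zero-terminated file name
    if flags & FCOMMENT:
        pos = member.index(b"\x00", pos) + 1  # zero-terminated comment
    if flags & FHCRC:
        pos += 2  # CRC16 of the header (not verified here)
    return pos


def read_member(data: bytes) -> tuple[bytes, bytes]:
    """Decompress the first gzip member in data; return (payload, remaining bytes)."""
    d = zlib.decompressobj(-zlib.MAX_WBITS)  # raw deflate, no zlib/gzip wrapper
    payload = d.decompress(data[header_length(data):]) + d.flush()
    if not d.eof or len(d.unused_data) < 8:
        raise ValueError("truncated gzip member")
    # 8-byte trailer: CRC32 and ISIZE (uncompressed size mod 2**32), little-endian
    crc, isize = struct.unpack_from("<II", d.unused_data, 0)
    if crc != zlib.crc32(payload) or isize != len(payload) & 0xFFFFFFFF:
        raise ValueError("gzip trailer mismatch")
    return payload, d.unused_data[8:]


def build_member(payload: bytes, extra: bytes) -> bytes:
    """Hypothetical test helper: one gzip member with FEXTRA set to the given bytes."""
    co = zlib.compressobj(wbits=-zlib.MAX_WBITS)
    body = co.compress(payload) + co.flush()
    header = (b"\x1f\x8b\x08" + bytes([FEXTRA]) + b"\x00" * 4 + b"\x00\xff"
              + struct.pack("<H", len(extra)) + extra)
    trailer = struct.pack("<II", zlib.crc32(payload), len(payload) & 0xFFFFFFFF)
    return header + body + trailer


# Subfield "sl" claims LEN=99 but XLEN is only 8: a strict subfield walker
# would reject this member, while read_member skips the field by XLEN.
bad_extra = b"sl" + struct.pack("<H", 99) + b"\x00" * 4
data = build_member(b"WARC/1.0\r\n", bad_extra) + build_member(b"second", b"")
first, rest = read_member(data)
second, rest = read_member(rest)
print(first, second)
```

The trade-off is that a lenient reader like this silently ignores whatever skip-length information the subfields were meant to carry, so it fixes playback but not the underlying header bug, if there is one.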
Thanks for your time.
Andy Jackson