
[Bug-wget] wget 1.14 possibly writing off-spec warc.gz files


From: Andy Jackson
Subject: [Bug-wget] wget 1.14 possibly writing off-spec warc.gz files
Date: Fri, 29 Mar 2013 21:33:33 +0000 (UTC)
User-agent: Loom/3.14 (http://gmane.org/)

When using wget 1.14 to generate warc.gz files, e.g.

wget -O tempname --warc-file="output" "http://example.com"

the resulting files do not play back well with the Internet Archive's 
warc.gz parsers, which throw errors like 

"Invalid FExtra length/records". 

It appears wget may be creating slightly malformed GZIP skip-length 
fields - see 

https://github.com/ukwa/warc-discovery/issues/1 

for details.
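
For anyone who wants to look at the header bytes directly, here is a 
minimal Python sketch that walks the optional "extra" field of the first 
gzip member as laid out in RFC 1952 (a two-byte XLEN, then subfields of 
SI1, SI2, a two-byte LEN and LEN bytes of data). The function name 
check_gzip_extra is made up for illustration, and it is only an 
assumption that a mismatch of this kind is what the Internet Archive 
parsers are objecting to.

```python
import struct

FEXTRA = 0x04  # FLG bit 2 in RFC 1952: an "extra" field follows the fixed header


def check_gzip_extra(header: bytes):
    """Inspect the extra field of the first gzip member in `header`.

    Returns (xlen, subfields, consistent), where subfields is a list of
    (two-byte id, data) pairs and consistent is True only if the subfields
    exactly fill the XLEN bytes that the header declares.
    Hypothetical helper for illustration, not part of wget or any parser.
    """
    if len(header) < 10 or header[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip member")
    if not header[3] & FEXTRA:
        return 0, [], True  # no extra field present, nothing to check
    if len(header) < 12:
        raise ValueError("truncated header")

    (xlen,) = struct.unpack_from("<H", header, 10)  # little-endian XLEN
    extra = header[12:12 + xlen]
    if len(extra) != xlen:
        return xlen, [], False  # fewer bytes available than XLEN declares

    subfields = []
    pos = 0
    while pos < xlen:
        if pos + 4 > xlen:
            return xlen, subfields, False  # no room for SI1, SI2 and LEN
        sub_id = extra[pos:pos + 2]
        (sub_len,) = struct.unpack_from("<H", extra, pos + 2)
        if pos + 4 + sub_len > xlen:
            return xlen, subfields, False  # subfield overruns XLEN
        subfields.append((sub_id, extra[pos + 4:pos + 4 + sub_len]))
        pos += 4 + sub_len
    return xlen, subfields, True
```

Assuming the command above wrote its output to output.warc.gz, something 
like check_gzip_extra(open("output.warc.gz", "rb").read(64)) should be 
enough to show whether the declared lengths add up.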

It's likely that we'll need to make the warc.gz parsers a bit more 
robust, but I thought I'd mention it here in case this is 
actually a bug in wget.

Thanks for your time.

Andy Jackson



