[Bug-wget] patch for writing cdx records


From: Christof Horschitz
Subject: [Bug-wget] patch for writing cdx records
Date: Wed, 22 Mar 2017 14:01:50 +0100

Hi,

Attached is a patch that proposes a change to warc.c. The change uses
url_escape to escape reserved characters in the redirect_location. Up to
and including the current version (1.19), wget (with the warc and warc-cdx
flags) writes the redirect_location unescaped. If that value contains
whitespace (e.g. unescaped error messages or OAuth scope information), the
CDX record is nearly impossible to parse, because wget uses whitespace as
the field separator.
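
To illustrate the parsing problem from the consumer side, here is a minimal
Python sketch. The three-field record layout, the example URLs, and the
cdx_line helper are assumptions made up for illustration; they are not wget's
actual CDX layout, and urllib.parse.quote only stands in for url_escape,
whose exact set of escaped characters may differ.

```python
from urllib.parse import quote


def cdx_line(url, status, redirect):
    """Join fields with a single space (hypothetical, simplified CDX record)."""
    return " ".join([url, status, redirect])


# Hypothetical redirect target carrying an error message and OAuth scopes.
redirect = "https://example.com/login?error=access denied&scope=read write"

# Unescaped: the spaces inside the redirect look like field separators.
raw = cdx_line("https://example.com/", "302", redirect)

# Escaped: spaces become %20; "%" is kept safe so existing escapes
# are not encoded a second time.
escaped = cdx_line("https://example.com/", "302", quote(redirect, safe=":/?&=%"))

raw_field_count = len(raw.split(" "))
escaped_field_count = len(escaped.split(" "))

print(raw_field_count, escaped_field_count)
```

With the unescaped value a reader that splits on whitespace sees more fields
than were written, and it cannot tell which pieces belong to the redirect.
With the escaped value the field count matches what was written.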

The sample CDX writer published by the Internet Archive
(https://github.com/internetarchive/CDX-Writer) also URL-encodes the
redirect_location.

Best Regards
Christof Horschitz

Attachment: warc.c.patch
Description: Text Data