[Bug-wget] patch for writing cdx records
From: Christof Horschitz
Subject: [Bug-wget] patch for writing cdx records
Date: Wed, 22 Mar 2017 14:01:50 +0100
Hi,
Attached is a patch that proposes a change to warc.c.
The change uses url_escape to escape reserved characters in the
redirect_location. Up to the current version (1.19), wget run with the
--warc-file and --warc-cdx options writes the redirect_location unescaped.
If it contains whitespace (e.g. unescaped error messages or OAuth scope
information), the resulting CDX record is nearly impossible to parse,
because wget uses spaces as field separators.
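To illustrate the problem, here is a minimal C sketch. cdx_escape is a hypothetical stand-in, not wget's url_escape (whose exact set of escaped characters is defined in wget's own sources). It percent-encodes space, control characters, DEL, non-ASCII bytes and '%' itself. count_fields is also hypothetical and mimics a naive reader that splits a CDX line on spaces.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for wget's url_escape(): percent-encodes every
   byte that would break a space-separated CDX field (space, control
   characters, DEL, non-ASCII bytes) plus '%' itself, so the encoding
   stays unambiguous. Returns a freshly allocated string (caller frees),
   or NULL if allocation fails. */
static char *cdx_escape(const char *s)
{
  static const char hex[] = "0123456789ABCDEF";
  size_t len = strlen(s);
  /* Worst case: every input byte becomes "%XX". */
  char *out = malloc(3 * len + 1);
  char *p = out;

  if (out == NULL)
    return NULL;
  for (const unsigned char *c = (const unsigned char *) s; *c; c++)
    {
      if (*c <= 0x20 || *c >= 0x7F || *c == '%')
        {
          *p++ = '%';
          *p++ = hex[*c >> 4];
          *p++ = hex[*c & 0x0F];
        }
      else
        *p++ = (char) *c;
    }
  *p = '\0';
  return out;
}

/* Counts space-separated fields, the way a naive CDX reader would. */
static int count_fields(const char *line)
{
  int n = 0;
  int in_field = 0;

  for (; *line; line++)
    {
      if (*line == ' ')
        in_field = 0;
      else if (!in_field)
        {
          in_field = 1;
          n++;
        }
    }
  return n;
}
```

For example, an unescaped location such as "https://example.com/cb?error=access denied" counts as two fields, while cdx_escape turns it into "https://example.com/cb?error=access%20denied", which counts as one.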
The sample CDX writer published by the Internet Archive
(https://github.com/internetarchive/CDX-Writer) also URL-encodes the
redirect_location.
Best Regards
Christof Horschitz
warc.c.patch
Description: Text Data