From: Tim Rühsen
Subject: Re: [Bug-wget] Patch: Always surround the "WARC-Target-URI" value with angle brackets
Date: Sat, 04 Mar 2017 12:55:39 +0100
User-agent: KMail/5.2.3 (Linux/4.9.0-2-amd64; KDE/5.28.0; x86_64; ; )

Thanks, Benjamin,

your patch is applied (trivial, no FSF copyright assignment required).

Regards, Tim

On Friday, 3 March 2017 09:00:57 CET Benjamin Esham wrote:
> Hello,
> 
> When producing WARC files, Wget records the requested URI in the
> "WARC-Target-URI" field. I noticed that Wget encloses the value of this URI
> within <angle brackets> in blocks with "WARC-Type: request", but not those
> with types of "response", "resource", "revisit", or "metadata". Enclosing
> URIs within angle brackets is required by the spec [1]. I'm attaching a
> patch that adds the angle brackets for all block types.
> 
> (Doing this for "request" blocks was the subject of bug 47281 [2], which was
> fixed almost exactly a year ago. My patch simply extends the use of the
> warc_write_header_uri function to the other appropriate places.)
> 
> Here is a truncated example of the output from Wget 1.19.1:
> 
>     WARC/1.0
>     WARC-Type: response
>     WARC-Record-ID: <urn:uuid:95D7B77A-C019-4E91-9BBB-7526B68864F2>
>     WARC-Warcinfo-ID: <urn:uuid:29F863DF-B273-498B-B91C-B50B2FD1BFCD>
>     WARC-Concurrent-To: <urn:uuid:EDCAF84C-D7A6-43CE-AE78-AEE16D3B7F4B>
>     WARC-Target-URI: https://www.gnu.org/software/wget/
> 
> And from the patched version:
> 
>     WARC/1.0
>     WARC-Type: response
>     WARC-Record-ID: <urn:uuid:54F2170C-C3FA-4B05-A8B1-116466D92401>
>     WARC-Warcinfo-ID: <urn:uuid:29BCF957-0D4D-4933-9CA3-F7FF2218D144>
>     WARC-Concurrent-To: <urn:uuid:61FCAFA4-5DF9-4CC0-A6C6-BC233601EF1E>
>     WARC-Target-URI: <https://www.gnu.org/software/wget/>
> 
> Best regards,
> 
> Benjamin
> 
> 
> [1] http://bibnum.bnf.fr/WARC/WARC_ISO_28500_version1_latestdraft.pdf
> 
> [2] http://savannah.gnu.org/bugs/?47281
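
For readers without the Wget source at hand, the formatting rule described in the quoted message can be sketched as below. This is only an illustration under stated assumptions, not the applied patch and not Wget's warc_write_header_uri: the helper name format_warc_uri_header, its buffer-based signature, and its return convention are all hypothetical. It writes a single header line with the URI value enclosed in angle brackets and terminated by CRLF, as WARC header lines are.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (an assumption for illustration, not Wget's code):
   formats one WARC header line of the form "Name: <uri>\r\n" into buf.
   Returns 0 on success, -1 if an argument is NULL or buf is too small. */
int
format_warc_uri_header (char *buf, size_t size,
                        const char *name, const char *uri)
{
  int n;

  if (!buf || !name || !uri || size == 0)
    return -1;

  /* The angle brackets are added here for every caller, so the result
     does not depend on the record type being written. */
  n = snprintf (buf, size, "%s: <%s>\r\n", name, uri);

  /* snprintf reports the length it wanted to write; treat truncation
     as an error rather than emitting a partial header line. */
  if (n < 0 || (size_t) n >= size)
    return -1;

  return 0;
}
```

Called with the name "WARC-Target-URI" and the value "https://www.gnu.org/software/wget/", this sketch fills the buffer with "WARC-Target-URI: <https://www.gnu.org/software/wget/>" followed by CRLF, which matches the form shown in the patched output above. Routing every record type through one such function is what keeps "response", "resource", "revisit", and "metadata" records consistent with "request" records.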
