[Bug-wget] Standards fix for metadata records in WARC files

From: Gijs van Tulder
Subject: [Bug-wget] Standards fix for metadata records in WARC files
Date: Fri, 12 Apr 2013 23:49:32 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130329 Thunderbird/17.0.5

This patch repairs two minor problems in the WARC metadata records.

1. Each record should have a unique WARC-Record-ID, but Wget currently reuses the ID of the record holding the manifest for the record holding the arguments. The patch generates a new ID for the arguments record and refers to the manifest in a WARC-Concurrent-To header.

2. According to the WARC implementation guidelines [1], the manifest should be written to a "metadata" record, but Wget stores it as a "resource" record. The patch corrects this.
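
To illustrate the intended result (this is not the attached patch, which modifies Wget's C code), here is a minimal Python sketch of two WARC/1.0 "metadata" records. The helper names and the payload contents are hypothetical, and real records written by Wget carry additional headers. It only shows that each record gets its own WARC-Record-ID and that the arguments record points back to the manifest through WARC-Concurrent-To.

```python
import uuid
from datetime import datetime, timezone


def new_record_id():
    """Return a fresh WARC-Record-ID in the <urn:uuid:...> form."""
    return "<urn:uuid:{}>".format(uuid.uuid4())


def build_metadata_record(payload, concurrent_to=None):
    """Serialize one WARC/1.0 "metadata" record.

    Hypothetical helper for illustration only.
    Returns a (record_id, record_bytes) tuple.
    """
    record_id = new_record_id()  # never reused between records
    headers = [
        ("WARC-Type", "metadata"),
        ("WARC-Record-ID", record_id),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
    ]
    if concurrent_to is not None:
        # Link this record to a related record written in the same run.
        headers.append(("WARC-Concurrent-To", concurrent_to))
    headers.append(("Content-Type", "text/plain"))
    headers.append(("Content-Length", str(len(payload))))

    head = "WARC/1.0\r\n"
    head += "".join("{}: {}\r\n".format(k, v) for k, v in headers)
    head += "\r\n"
    # A record ends with two CRLF pairs after the content block.
    return record_id, head.encode("utf-8") + payload + b"\r\n\r\n"


# Placeholder payloads; the real manifest lists the IDs of the written records.
manifest_id, manifest_record = build_metadata_record(
    b"(manifest: list of record IDs)\n")
args_id, args_record = build_metadata_record(
    b"(command-line arguments)\n", concurrent_to=manifest_id)
```

With this layout a reader of the archive can tell the two records apart by ID and still follow the WARC-Concurrent-To header from the arguments back to the manifest.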



[1] Section 2.4.4 of http://www.netpreserve.org/resources/warc-implementation-guidelines-v1

Attachment: warc-metadata-standards-fix.patch
Description: Text Data
