Re: [Bug-wget] WARC output
From: Giuseppe Scrivano
Subject: Re: [Bug-wget] WARC output
Date: Wed, 10 Aug 2011 10:57:24 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.0.50 (gnu/linux)
Gijs van Tulder <address@hidden> writes:
> It would be cool if Wget could become one of these tools. Already the
> Swiss army knife for mirroring websites, the one thing that Wget is
> missing is a good way to store these mirrors. The current output of
> --mirror is not sufficient for archival purposes:
Sure we do!
> With some help from others, I've added WARC functions to Wget. With
> the --warc-file option you can specify that the mirror should also be
> written to a WARC archive. Wget will then keep everything, including
Can you please track all contributors? Any contribution to GNU wget
requires copyright assignments to the FSF.
> Do you think this is something that could be included in the main Wget
> version? If that's the case, what should be the next step?
Sure, I will take a look at the code in the next few days. In the
meantime, can you check if you are following the GNU Coding Standards
for the new code[1]?
> The implementation makes use of the open source WARC Tools library
> (Apache License 2.0):
> http://code.google.com/p/warc-tools/
How much code is really needed from that library? I wonder if we can
avoid this dependency altogether.
Cheers,
Giuseppe
[1] http://www.gnu.org/prep/standards/