
Re: [Bug-wget] WARC output


From: Gijs van Tulder
Subject: Re: [Bug-wget] WARC output
Date: Sat, 08 Oct 2011 21:51:31 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:7.0) Gecko/20110923 Thunderbird/7.0

Hi Giuseppe and Ángel,

Thanks for looking at the patch. Yes, it's quite big. (I should mention that it was never my intention to have this complete patch added to the wget repository; it is a first patch to show the differences.)

Ángel González writes:
> I don't think all those files are even remotely needed.
> I am seeing for instance, python files for creating warc interacting
> with curl.

True. The patch I have sent you contains the complete warc tools library, with lots of things that aren't really needed for this task.

I have looked at the C files and headers that are needed for wget. I think there are approximately 110 files, with a total size of 1.3 MB, that are actually used by the wget extension.

> Also, the patch seems to duplicate code (compare lines 337731-337810
> with 337944-338013 in the patch file). Surely that could be
> refactored?

That is also true. There is a reason for it: I tried to add the WARC bits with as few changes to the current wget code as possible. However, the structure of http.c and its gethttp function made it necessary to have bits of very similar (but not exactly duplicate) code.

It's certainly possible to refactor, but I think that to do so you'd also have to refactor large parts of gethttp and related functions.

Gijs


