Re: [Bug-wget] wget2 Feature Suggestion - Triston Line


From: Tim Rühsen
Subject: Re: [Bug-wget] wget2 Feature Suggestion - Triston Line
Date: Sat, 14 Jul 2018 15:09:51 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.0

Hi Triston,

On 13.07.2018 18:52, Triston Line wrote:
> Hi Tim,
> 
> Excellent answer thank you very much for this info, "-N" or
> "--timestamping" sounds like a much better way to go, however if I'm
> converting links, using wget (1) I think I've read somewhere and noticed
> that two separate commands running in series wouldn't be able to
> continue due to the links from the previous session/command-instance?
> More clearly, I've read that the primary reason continuing from a fault
> is impossible is due to the fact that converting links to mirror isn't
> something that can be continued and the links are only valid for that
> session. Sounds silly to me because you're just formatting <a href> tags
> from my understanding but there's probably a bit more to it. 

Well, the links/URLs in the converted file are adapted to your local
directory structure (they become relative). Depending on which wget
directory options are in use, you cannot reconstruct the original URLs.

What we would need is some metadata for each file downloaded, e.g. the
original URL, the referrer URL, ...

We have had such data for a while (see the --xattr option), *if* your
filesystem supports it. So we *could* use this metadata where available.

That would be a new feature to be implemented.
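
To illustrate what reading that metadata back could look like, here is a
minimal sketch in Python. It is not part of wget or wget2. The attribute
names (user.xdg.origin.url, user.xdg.referrer.url) are an assumption based
on the freedesktop.org convention, and the helper name
read_download_metadata is hypothetical. os.getxattr only exists on Linux,
so the sketch degrades to None elsewhere.

```python
import os

# Assumed attribute names (freedesktop.org convention); verify against
# what your wget build actually writes, e.g. with `getfattr -d FILE`.
XATTR_KEYS = {
    "origin_url": "user.xdg.origin.url",
    "referrer_url": "user.xdg.referrer.url",
}


def read_download_metadata(path, getxattr=None):
    """Return the stored URLs for a downloaded file, or None per field.

    `getxattr` can be injected for testing; by default os.getxattr is
    used when the platform provides it.
    """
    if getxattr is None:
        getxattr = getattr(os, "getxattr", None)  # Linux only

    result = {}
    for field, key in XATTR_KEYS.items():
        value = None
        if getxattr is not None:
            try:
                value = getxattr(path, key).decode("utf-8", "replace")
            except OSError:
                # Attribute missing, or filesystem without xattr support.
                value = None
        result[field] = value
    return result
```

With the original URL available per file, a later run could in principle
map a converted local link back to its remote address.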

> I have used max-threads in the past and I've tried a suggestion for
> xargs on one of the stack exchange forums, so I do toy with those
> settings while testing out my friend's servers at UBC. Government on the
> other hand I might get in a bit of trouble if I'm loading them during
> working hours (Gosh knows I don't wanna come in at some ungodly hour
> (e.g. 3 am) with the network-services team to toy around with their
> stuff at different sites or perform intranet backups around different
> sites from my local). 
> 
> " The server then only sends payload/data if it has a newer version of
> that document, else it responds with 304 Not Modified." This is 400
> Bytes to respond with the last modification date of a file?

No, we send the GET request with the local file's timestamp (in an
If-Modified-Since header). If the server has a newer version, it sends it
together with a 200 OK, else it sends 304 Not Modified with an empty body.
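
As a rough illustration of that exchange, the following Python sketch
builds the conditional request header from a local file's modification
time and maps the response status to an action. The function names
conditional_get_headers and action_for_status are hypothetical. This is a
simplified model of the idea, not wget's actual code.

```python
import os
from email.utils import formatdate


def conditional_get_headers(local_path):
    """Build headers for a conditional GET based on the local file.

    If there is no local copy, no condition is sent and the server
    answers with the full document.
    """
    if not os.path.exists(local_path):
        return {}
    mtime = os.path.getmtime(local_path)
    # HTTP dates are always expressed in GMT.
    return {"If-Modified-Since": formatdate(mtime, usegmt=True)}


def action_for_status(status):
    """Map the server's status code to what the client does next."""
    if status == 304:
        return "keep"      # empty body, local copy is current
    if status == 200:
        return "replace"   # body carries the newer version
    return "error"
```

For an up-to-date file, the whole exchange is just the request and
response headers, which is why it only costs a few hundred bytes.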

Just give it a try. If you see that everything is re-downloaded, stop and
try again with '-N --no-if-modified-since'. This makes wget send a HEAD
request first and then, depending on the timestamp info, a GET request
afterwards (or nothing if the local file is up-to-date).

But even the HEAD method can fail if the server sends wrong timestamps.
I have seen servers that always send the current date instead of the
file's date (e.g. for dynamic / on-the-fly generated web pages).
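
The HEAD-based decision can be sketched as below, again in Python and
again only as a simplified model: it compares the Last-Modified header
with the local modification time and ignores other criteria such as file
size. The name needs_download is hypothetical.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def needs_download(local_mtime, last_modified):
    """Decide whether a GET should follow the HEAD response.

    local_mtime   -- local file's mtime in seconds since the epoch,
                     or None if there is no local copy
    last_modified -- value of the Last-Modified header, or None
    """
    if local_mtime is None or not last_modified:
        return True  # nothing to compare against, fetch to be safe
    try:
        remote = parsedate_to_datetime(last_modified)
    except (TypeError, ValueError):
        return True  # unparsable date, fetch to be safe
    if remote.tzinfo is None:
        remote = remote.replace(tzinfo=timezone.utc)
    return remote.timestamp() > local_mtime
```

A server that always reports the current date makes the remote timestamp
newer than any local file, so this check returns True on every run and
the page is fetched again each time.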

Regards, Tim
