
[Bug-wget] extra HEAD request for dealing with redirects (302)

From: Laurent C
Subject: [Bug-wget] extra HEAD request for dealing with redirects (302)
Date: Wed, 21 Mar 2012 19:28:08 -0400


Thanks a lot for writing wget, it's been very helpful!

I'm using wget to recursively retrieve certain types of documents
using the accept list (-A). To decide whether or not to retrieve a
file, wget looks at the extension of the file name contained in the
link (which makes sense). However, if that link turns out to be a
redirect (HTTP 302) to a document that does match the accept list,
then I would like to retrieve it. I understand that wget cannot know
about this unless it makes an extra HTTP request.

E.g., I'm looking for .ps files (wget -r -A ps URL). I get a link
like: "serve_doc.cgi?file=foo.ps". serve_doc.cgi does not match the -A
option, but that link is really a redirect to a link with a .ps
extension.

A simple solution would be to add serve_doc.cgi to my -A list. However
in my application I don't know of such filenames before I first
encounter them.

Currently is there a way to ask wget to retrieve such files? I
searched around and could not find much about this.

I can think of two options that would work in this case:
1) Outside of wget: look at wget's debug output and manually check
the headers of the links that wget rejects because they do not match
the accept list.
2) Inside wget: add some code that would do this. It sounds like it
would be one extra "HEAD" request per link that gets rejected for not
matching the accept list.

Any other suggestions?

For 2), if I studied the code base and implemented it, is this
something that could be useful? Any tips or ideas on how to do it (or
not do it)?
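
To make the idea in 2) concrete, here is a rough sketch of the check,
written in Python for illustration only; it is not wget code and does
not reflect how wget actually matches its accept list. The names
matches_accept, head_redirect_target and should_retrieve are
hypothetical, and the suffix test is a simplification that looks only
at the path part of the URL. The point is just the control flow: a
link that fails the accept list gets one HEAD request, and if the
answer is a redirect, the Location target is tested against the
accept list instead.

```python
import http.client
from urllib.parse import urljoin, urlsplit

# Status codes treated as redirects in this sketch (302 is the case at hand).
REDIRECT_CODES = {301, 302, 303, 307, 308}


def matches_accept(url, accept_suffixes):
    """Simplified accept test: does the last path component end in .suffix?"""
    name = urlsplit(url).path.rsplit("/", 1)[-1]
    return any(name.endswith("." + s.lstrip(".")) for s in accept_suffixes)


def head_redirect_target(url, timeout=10):
    """Send one HEAD request without following redirects.

    Returns the absolute redirect target, or None if the response is
    not a redirect or carries no Location header.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("HEAD", target)
        resp = conn.getresponse()
        location = resp.getheader("Location")
        if resp.status in REDIRECT_CODES and location:
            # Location may be relative, so resolve it against the request URL.
            return urljoin(url, location)
        return None
    finally:
        conn.close()


def should_retrieve(url, accept_suffixes, resolve=head_redirect_target):
    """Accept a link directly, or via the target of a single redirect."""
    if matches_accept(url, accept_suffixes):
        return True
    # The extra request happens only for links the accept list rejected.
    target = resolve(url)
    return target is not None and matches_accept(target, accept_suffixes)
```

The resolve parameter is there only so the decision logic can be tried
without a network, by passing a function that returns a fixed target.
The sketch follows a single redirect; a chain of redirects would need
a loop with a hop limit.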

Thanks, Laurent
