Re: [Bug-wget] Recursive download and `trivial' redirects

From:	Ángel González
Subject:	Re: [Bug-wget] Recursive download and `trivial' redirects
Date:	Mon, 25 Nov 2013 21:02:26 +0100
User-agent:	Thunderbird

On 25/11/13 02:58, Maxim Kuznetsov wrote:

Hello there,

When a client requests a directory (or some `clean' URL) without a
trailing slash -- e.g. example.com/foo -- web servers often add the
slash with a redirect: example.com/foo -> example.com/foo/.  I'll
hereafter call such redirects `trivial'.
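
To make the definition concrete, here is a minimal sketch of such a check, written in Python for brevity rather than in Wget's C. The function name is_trivial_redirect and its arguments are hypothetical and do not exist in Wget; the sketch assumes the Location header has already been resolved to an absolute URL.

```python
from urllib.parse import urlsplit


def is_trivial_redirect(original: str, location: str) -> bool:
    """Return True if `location` is `original` with one '/' appended to the path.

    Hypothetical helper, not part of Wget. Both arguments are assumed to be
    absolute URLs.
    """
    old, new = urlsplit(original), urlsplit(location)
    # Scheme and host (including port) must be unchanged.
    same_origin = (old.scheme, old.netloc) == (new.scheme, new.netloc)
    # Query string and fragment must be unchanged as well.
    same_rest = (old.query, old.fragment) == (new.query, new.fragment)
    # The only permitted difference is a single trailing slash on the path.
    return (
        same_origin
        and same_rest
        and not old.path.endswith("/")
        and new.path == old.path + "/"
    )
```

With this definition, example.com/foo -> example.com/foo/ counts as trivial, while a redirect to another host or another path does not.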

The problem is that some websites (e.g. ocw.mit.edu) use links without
a trailing slash.  This means that when Wget (with -r) retrieves
example.com/foo, it saves the content to the file `foo' regardless
of the redirect.  Then, when Wget parses `foo' and sees a link to
example.com/foo/file.bar, it deletes the regular file `foo' and creates
a directory with the same name (in the function mkalldirs(), see
url.c:1220).  As a result, the entire page is lost.

Reproducer (GNU Wget 1.14.97-1221):
$ wget -d -r --no-parent \
    http://ocw.mit.edu/courses/mathematics/18-100b-analysis-i-fall-2010/ \
    2>&1 | grep "directory danger"
Removing ocw.mit.edu/courses/<skipped>/assignments because of directory danger!
Removing ocw.mit.edu/courses/<skipped>/readings-notes because of directory danger!
Removing ocw.mit.edu/courses/<skipped>/study-materials because of directory danger!

--trust-server-names solves this problem, but it is not obvious to a
user that it has to be passed every time together with -r, to say
nothing of its security implications.

Does it sound reasonable to handle such `trivial' redirects (that
simply add an end-slash) as a special case regardless of
`trust-server-names'?

Thanks

Instead of being removed, the file should probably have been renamed to index.html inside the newly created directory.
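
A minimal sketch of that idea, in Python rather than Wget's C. The names make_dir_preserving_file, index_name and demo, as well as the temporary suffix, are hypothetical; this is not how mkalldirs() is written, only an illustration of moving the file out of the way instead of deleting it.

```python
import os
import tempfile


def make_dir_preserving_file(path: str, index_name: str = "index.html") -> None:
    """Ensure `path` is a directory without losing a regular file in the way.

    Hypothetical helper: if `path` is a regular file, it ends up at
    `path/index_name` instead of being removed.
    """
    if os.path.isfile(path):
        # Assumed temporary name; a real implementation would have to
        # guard against this name already being taken.
        tmp = path + ".tmp-move"
        os.rename(path, tmp)
        os.makedirs(path)
        os.rename(tmp, os.path.join(path, index_name))
    else:
        os.makedirs(path, exist_ok=True)


def demo() -> list:
    """Save a page as the file `foo`, then turn `foo` into a directory."""
    with tempfile.TemporaryDirectory() as root:
        foo = os.path.join(root, "foo")
        with open(foo, "w") as f:
            f.write("<html>saved page</html>")
        make_dir_preserving_file(foo)
        return os.listdir(foo)
```

In this sketch demo() returns ['index.html'], so the page saved as `foo' survives as foo/index.html when a later link needs `foo' to be a directory.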
