bug-wget

RE: [Bug-wget] mirroring one sourceforge package?


From: Tony Lewis
Subject: RE: [Bug-wget] mirroring one sourceforge package?
Date: Wed, 30 Mar 2011 15:06:20 -0700

It works as I would expect in 1.11.4, with the exception of downloading this
file:
sourceforge.net/projects/biblatex-biber/files/index.html

Tony
-----Original Message-----
From: address@hidden [mailto:address@hidden] On Behalf Of Micah Cowan
Sent: Wednesday, March 30, 2011 3:00 PM
To: Karl Berry
Cc: address@hidden
Subject: Re: [Bug-wget] mirroring one sourceforge package?

(03/30/2011 02:37 PM), Karl Berry wrote:
> The bug (?) -- running
>   wget -m -np -nv \
>     http://sourceforge.net/projects/biblatex-biber/files/biblatex-biber/current/
> ends up downloading many things above that directory, despite the -np.
> Doesn't that seem wrong?
> This is with wget 1.12 compiled from the original source.

Definitely a bug; reproduced with Ubuntu Lucid's wget 1.12.

Running with --debug, I see a lot of:

Deciding whether to enqueue "http://sourceforge.net/blog/".
Going to "blog" would escape
"projects/biblatex-biber/files/biblatex-biber/current" with no_parent on.
Decided NOT to load it.

And then:

Deciding whether to enqueue "https://sourceforge.net/blog/".
Allowing path blog/ because of rule `'.
Decided to load it.

That link was apparently found in https://sourceforge.net/account/login.php

So it looks like wget is correctly blocking the http URL, but
incorrectly permitting the https URL.
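
To make the expected behaviour concrete, here is a minimal sketch of a no-parent style check that ignores the URL scheme, so http and https links get the same answer. It is not wget's implementation; the function name escapes_parent is hypothetical, and treating a different host as "outside" is a simplifying assumption.

```python
from urllib.parse import urlsplit


def escapes_parent(start_url: str, candidate_url: str) -> bool:
    """Return True if candidate_url lies outside the directory of start_url.

    Illustrative only: the scheme is deliberately ignored, and ".." path
    segments are not normalized.
    """
    start = urlsplit(start_url)
    cand = urlsplit(candidate_url)

    # Assumption: a link to another host counts as leaving the hierarchy.
    if (start.hostname or "").lower() != (cand.hostname or "").lower():
        return True

    # Directory of the start URL: everything up to and including the last "/".
    base_dir = start.path[: start.path.rfind("/") + 1] or "/"
    cand_path = cand.path or "/"
    return not cand_path.startswith(base_dir)


START = "http://sourceforge.net/projects/biblatex-biber/files/biblatex-biber/current/"
print(escapes_parent(START, "http://sourceforge.net/blog/"))
print(escapes_parent(START, "https://sourceforge.net/blog/"))
```

Under this check both blog URLs are reported as escaping the start directory, which is the result one would expect from -np regardless of scheme.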

Adding -R login.php seems a decent workaround; I let it run awhile (not
forever), and it seemed okay, though it did get a single link (so far)
outside the expected hierarchy (once again, an https link; this time to
a wiki page; the page fortunately appears not to have incurred other
renegade links AFAICT).
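
For anyone who wants to check a longer run for such stray links, the following is a rough sketch that scans saved --debug output for URLs wget decided to load outside a given path prefix. It assumes the log lines look like the excerpts quoted above; other wget versions may word them differently, and the name loaded_outside is hypothetical.

```python
import re
from urllib.parse import urlsplit

# Matches the "Deciding whether to enqueue" lines quoted in this message.
ENQUEUE_RE = re.compile(r'Deciding whether to enqueue "([^"]+)"')


def loaded_outside(debug_lines, allowed_prefix):
    """Yield URLs that were accepted although their path is not under allowed_prefix."""
    current = None
    for line in debug_lines:
        match = ENQUEUE_RE.search(line)
        if match:
            current = match.group(1)
        elif line.startswith("Decided to load it") and current is not None:
            if not urlsplit(current).path.startswith(allowed_prefix):
                yield current
            current = None
        elif line.startswith("Decided NOT to load it"):
            current = None


sample = [
    'Deciding whether to enqueue "http://sourceforge.net/blog/".',
    "Decided NOT to load it.",
    'Deciding whether to enqueue "https://sourceforge.net/blog/".',
    "Decided to load it.",
]
prefix = "/projects/biblatex-biber/files/biblatex-biber/current/"
print(list(loaded_outside(sample, prefix)))
```

In practice the lines would come from a file holding the captured debug output rather than an inline list.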

-- 
HTH,
Micah J. Cowan
http://micah.cowan.name/
