Re: [Bug-wget] Enqueue logic problems
From: Tim Ruehsen
Subject: Re: [Bug-wget] Enqueue logic problems
Date: Thu, 2 May 2013 17:30:23 +0200
User-agent: KMail/1.13.7 (Linux/3.2.0-4-amd64; KDE/4.8.4; x86_64; ; )
Darshit, I guess you are talking about redirection.
That is, 'wget -r gnu.org' is redirected to www.gnu.org (via the Location
header). Wget follows the redirection, but downloads only index.html, because
all URLs included in index.html refer to www.gnu.org, whereas we requested
content from gnu.org.
That's why only one file (index.html) is downloaded.
But that is not what the user expects...
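To make the failure mode concrete, here is a minimal Python sketch of a same-host enqueue check. It is a simplified model and not wget's actual code; the name `should_enqueue` and the example link paths are made up for illustration.

```python
from urllib.parse import urlsplit


def should_enqueue(start_url: str, link: str) -> bool:
    """Simplified model (hypothetical helper): follow a link only if its
    host equals the host of the URL given on the command line."""
    return urlsplit(link).hostname == urlsplit(start_url).hostname


start = "http://gnu.org/"  # what the user typed

# Illustrative links as they might appear in the redirected index.html.
links = [
    "http://www.gnu.org/software/",
    "http://www.gnu.org/philosophy/",
]

# Every link points at www.gnu.org, so none of them matches gnu.org
# and nothing beyond index.html gets queued.
queued = [u for u in links if should_enqueue(start, u)]
```

With the start host fixed at gnu.org, `queued` stays empty, which matches the single-file download described above.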
The user could work around it using the -D and/or -H options, but then they
have to know about the redirection before starting wget. Not everyone knows
how to find that out.
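As a rough illustration of what a workaround like '-H -D gnu.org' permits, the sketch below models domain matching as a suffix check on label boundaries. That matching rule and the name `host_in_domains` are assumptions for illustration; wget's real matching may differ in detail.

```python
from typing import Iterable
from urllib.parse import urlsplit


def host_in_domains(url: str, domains: Iterable[str]) -> bool:
    """Hypothetical model of a domain allow-list: accept a URL if its host
    is one of the listed domains or a subdomain of one of them."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in domains)


allowed = ["gnu.org"]  # assumed to be what the user passes as the domain list

www_ok = host_in_domains("http://www.gnu.org/", allowed)
lists_ok = host_in_domains("http://lists.gnu.org/archive/", allowed)
other_ok = host_in_domains("http://example.com/", allowed)
```

Under this model www.gnu.org is accepted, but so is every other host under gnu.org, and the user still has to supply the domain up front.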
Should wget's behaviour change (by default or via a new option), or should we
leave it as is and print a verbose message that makes the situation clear to
the user?
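One possible shape of such a change, sketched in Python under stated assumptions: once the start URL has been redirected, the host it was redirected to is treated as allowed alongside the requested host. The names `allowed_hosts` and `should_enqueue` are hypothetical, and this is only one candidate policy, not existing wget behaviour.

```python
from urllib.parse import urlsplit


def allowed_hosts(start_url, redirect_chain):
    """Hosts to recurse into: the requested host plus every host the
    start URL was redirected to (hypothetical policy)."""
    hosts = {urlsplit(start_url).hostname}
    hosts.update(urlsplit(u).hostname for u in redirect_chain)
    return hosts


def should_enqueue(link, hosts):
    """Follow a link only if its host is in the allowed set."""
    return urlsplit(link).hostname in hosts


# gnu.org answers with a Location header pointing at www.gnu.org.
hosts = allowed_hosts("http://gnu.org/", ["http://www.gnu.org/"])

follow_www = should_enqueue("http://www.gnu.org/software/", hosts)
follow_lists = should_enqueue("http://lists.gnu.org/", hosts)
```

This keeps the recursion on the site the user actually landed on without opening up the whole domain.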
Regards, Tim
On Thursday 02 May 2013, Micah Cowan wrote:
> I believe you want -H -D gnu.org. That's what it's for. Wget doesn't
> know which hostnames under a domain should be allowed and which should
> not be (do you want images.gnu.org? git.gnu.org? lists.gnu.org?), so
> turns 'em all off unless you ask for them explicitly.
>
> HTH,
> -mjc
>
> On Thu, May 2, 2013 at 4:52 AM, Darshit Shah <address@hidden> wrote:
> > I should have been more clear. --span-hosts will enqueue the other files,
> > but it will also enqueue files from other hosts. I wish to recursively
> > download a website but not other sites that it links to.
> >
> > Of course I could add --accept-regex / --reject-regex options to prevent
> > wget from wandering onto other hosts. But shouldn't the default
> > --recursive option simply handle cases where a www is either added or
> > removed? Or is there any scenario that I am missing which would cause
> > undesirable effects here?
> >
> > On Thu, May 2, 2013 at 5:22 PM, Giuseppe Scrivano <address@hidden> wrote:
> >> Darshit Shah <address@hidden> writes:
> >> > When using the --recursive option with wget, there seems to be a
> >> > small issue with the logic that decides whether to enqueue a file to
> >> > the downloads list or not.
> >> >
> >> > By default wget downloads files only from the same host. However, this
> >> > causes a problem when the target hostname changes thus:
> >> > parent: gnu.org
> >> > target: www.gnu.org
> >> >
> >> > This issue causes wget to stop after just one download on a lot of
> >> > sites. I'm not sure if this exists in older or released versions, since I
> >> > only have the development version installed.
> >>
> >> does --span-hosts fix this scenario for you?
> >>
> >> Cheers,
> >> Giuseppe
> >
> > --
> > Thanking You,
> > Darshit Shah
> > Research Lead, Code Innovation
> > Kill Code Phobia.
> > B.E.(Hons.) Mechanical Engineering, '14. BITS-Pilani
Kind regards,
Tim Rühsen