bug-wget

Re: [Bug-wget] wget -r -Dwww.cnn.com http://www.cnn.com downloads stuff


From: Micah Cowan
Subject: Re: [Bug-wget] wget -r -Dwww.cnn.com http://www.cnn.com downloads stuff outside cnn.com, bug?
Date: Fri, 25 Sep 2009 10:09:42 -0700
User-agent: Thunderbird 2.0.0.23 (X11/20090817)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Matthew Young wrote:
> Hello Micah & friends,
> 
> I am trying to download content only for the domain specified in -D. However,
> running a test with:
> 
> wget -r -Dwww.cnn.com http://www.cnn.com
> 
> 
> I noticed that it also creates directories for other subdomains and even
> other domains that cnn.com links to:
> 
> money.cnn.com   sportsillustrated.cnn.com  www.ew.com www.time.com
> transcripts.cnn.com        www.turnerstoreonline.com

Assuming you don't have other things in your ~/.wgetrc that allow these
hosts (note that Wget won't follow links to other hosts even if you
specify -D; you also have to specify -H), my guess would be that these
were the results of redirects. That is, they correspond to locations on
www.cnn.com that redirected to a different host.

You can check whether this is the case by examining the log. The --debug
flag is particularly helpful for producing useful logs, but in this case
it shouldn't be necessary, so long as you have the normal "verbose" output.
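
A minimal sketch of that check, assuming the log is written to a file
with -o: in verbose mode Wget logs a "Location: URL [following]" line
for each redirect it follows, so those lines can be pulled out of the
log. The file name wget.log and the helper name list_redirects are
placeholders chosen for illustration, not anything Wget provides.

```shell
# Hypothetical helper: print the unique redirect targets found in a Wget log.
# Matches lines such as:
#   Location: http://money.cnn.com/ [following]
list_redirects() {
    awk '/^[[:space:]]*Location: / { print $2 }' "$1" | sort -u
}

# Usage (wget.log is an assumed file name):
#   wget -r -Dwww.cnn.com -o wget.log http://www.cnn.com
#   list_redirects wget.log
```

Any host in that output other than www.cnn.com would point to a
redirect as the source of the extra directories.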

> What is the way to achieve what I want or would this be a bug?

If my hunch is correct, then I'm afraid there's no way to avoid them.
Wget does not currently provide any facility for avoiding redirects to
other sites.

> If it's a bug, is there any way to tell wget to download everything to the
> www.cnn.com directory (even if it has to download subdomain stuff)?

I'd go for "-P www.cnn.com -nH". This means, "Put all downloaded content
in www.cnn.com/, and don't generate an extra directory for hostnames."
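
Combined with the options from the original command, that could look
like the sketch below. The MIRROR_DIR variable and the mirror_site
wrapper are names made up for this example; only the wget options
themselves come from the discussion above.

```shell
# Assumed target directory; files from every host end up under it.
MIRROR_DIR=www.cnn.com

# -P sets the directory prefix, -nH suppresses the per-host directories.
# Wrapped in a function so nothing is fetched until it is called.
mirror_site() {
    wget -r -Dwww.cnn.com -P "$MIRROR_DIR" -nH http://www.cnn.com
}

# Usage:
#   mirror_site
```

Since the hostname directories are gone, files from different hosts
that share the same path will land on the same local name.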

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
Maintainer of GNU Wget and GNU Teseq
http://micah.cowan.name/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkq8+VUACgkQ7M8hyUobTrFAggCdE7vpGqQecKaxczfROmhDfdIt
xYwAoIrXOWNfjFkSdJrzNH53pmRvwGzc
=KrqD
-----END PGP SIGNATURE-----



