[Bug-wget] Re: one question of wget


From: Micah Cowan
Subject: [Bug-wget] Re: one question of wget
Date: Wed, 07 Jan 2009 10:25:30 -0800
User-agent: Thunderbird 2.0.0.18 (X11/20081125)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

wang qiang wrote:
> Hello there,

Hi. In the future, please use the list (address@hidden) for support
requests. I can't promise to answer personal-mail support requests.

> While testing Wget, I ran into a problem.
> 
> I used the command ./src/wget -r -l6 http://news.yahoo.com
> to get the pages, and it worked well.
> 
> But when I use the command
> 
> ./src/wget -r -l6 http://csce.uark.edu
> 
> it gets only the first page, i.e. index.html, and then stops.
> 
> Could you please tell me how to solve this problem? I noticed that there
> was a "robots.txt" in the folder when retrieving from news.yahoo.com,
> but no "robots.txt" when retrieving from csce.uark.edu. Thanks,

csce.uark.edu includes many links to hosts other than "csce.uark.edu":
www.csce.uark.edu, for example, and I think some others that host
images. By default, Wget refuses to follow links to other hosts during a
recursive retrieval; you need to add -H -D csce.uark.edu to get the
other links. (Changing the requested URI to www.csce.uark.edu doesn't
help much, because there are many links to csce.uark.edu, without www,
as well.)
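
For illustration, here is a sketch of the adjusted command, assuming the
same ./src/wget build and -l6 depth as in the original report. The
variable names (start_url, domain, cmd) are only for this example. The
snippet prints the command rather than running it, so it can be checked
before any download starts.

```shell
# Assumption: same build path and recursion depth as the original command.
start_url="http://csce.uark.edu"
domain="csce.uark.edu"

# -r   recursive retrieval
# -l6  limit recursion depth to 6
# -H   allow recursion to span to other hosts (--span-hosts)
# -D   limit the hosts that are followed to the listed domain (--domains)
cmd="./src/wget -r -l6 -H -D $domain $start_url"

# Print the command for inspection; paste it into the shell to run it.
echo "$cmd"
```

With -H alone Wget would follow links to any host at all, so -D is what
keeps the crawl on csce.uark.edu and hosts under it, such as
www.csce.uark.edu.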

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAklk85oACgkQ7M8hyUobTrHdAgCfTYu2QwDJiXW3n1EnhvWq9kar
GBIAnjwwTUnUFO7D75bzYhKk5P2FF7hw
=4Xjm
-----END PGP SIGNATURE-----




