
Re: [Bug-wget] robots.txt not working

From: Micah Cowan
Subject: Re: [Bug-wget] robots.txt not working
Date: Fri, 16 Mar 2012 23:38:37 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:11.0) Gecko/20120302 Thunderbird/11.0

I think you're misunderstanding what was supposed to happen.

The robots.txt file is only honored for links that wget follows
automatically. This means (a) wget has to be in recursive-descent mode
(-r or -m), and (b) it only applies to links that weren't explicitly
requested by the user. In other words, it applies only to links that
wget discovers and follows on its own, where it is actually acting as a
robot.
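To illustrate the rule (a minimal sketch using Python's stdlib
urllib.robotparser, not wget's actual code): a "Disallow: /" rule like
the one in your transcript blocks every link a crawler discovers, but a
URL the user requested explicitly is fetched without consulting
robots.txt at all.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the report: disallow everything for all agents.
rp = RobotFileParser()
rp.parse("User-agent: *\nDisallow: /".splitlines())

# A link found during recursive descent is checked against robots.txt,
# and here the check denies it:
print(rp.can_fetch("Wget", "http://localhost/page.html"))  # False

# A starting URL the user typed on the command line is simply fetched;
# no robots.txt check happens for it (which is why your plain
# "wget <url>" still downloaded index.html).
```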

Hope that helps.


On 03/16/2012 01:04 PM, phil curb wrote:
> I just tried creating a web server locally, putting robots.txt in
> there and using wget, and it didn't work.
> http://pastebin.com/raw.php?i=kt1mV2af
> C:\r>wget
> ....
> 2012-03-16 19:45:32 (20.0 KB/s) - `index.html' saved [3/3]
> C:\r>wget
> ....
> 2012-03-16 19:45:43 (175 KB/s) - `robots.txt' saved [26/26]
> C:\r>type robots.txt
> User-agent: *
> Disallow: /
> C:\r>
