bug-wget

[Bug-wget] robots.txt seemingly ignored


From: Daniel Feenberg
Subject: [Bug-wget] robots.txt seemingly ignored
Date: Mon, 14 May 2018 09:39:39 -0400 (EDT)
User-agent: Alpine 2.21 (LRH 202 2017-01-01)


I have the following wget command line:

   wget -r  http://wwwdev.nber.org/

http://wwwdev.nber.org/robots.txt is:

  User-agent: *
  Disallow: /

  User-Agent: W3C-checklink
  Disallow:


However, wget fetches thousands of pages from wwwdev.nber.org. I would have expected it to fetch nothing. (This is a demonstration; in real life I'd have a more detailed robots.txt to control the process.)
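As a cross-check on that expectation, here is a minimal sketch that feeds the same robots.txt to Python's standard-library parser and asks whether two user agents may fetch the site root. It only shows how one independent parser reads the file; it says nothing about what Wget 1.12 does internally. The helper name `is_allowed` and the agent string "Wget" are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt quoted above, reproduced verbatim (indentation dropped).
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-Agent: W3C-checklink
Disallow:
"""


def is_allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if `user_agent` may fetch `url` under `robots_txt`.

    Hypothetical helper: it parses the text in memory instead of
    downloading the file from the server.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "Wget" is an assumed agent token; the real client may send more detail.
    for agent in ("Wget", "W3C-checklink"):
        print(agent, is_allowed(agent, "http://wwwdev.nber.org/"))
```

Under this parser the catch-all record applies to any agent other than W3C-checklink, so a crawler identifying itself as "Wget" should be refused everything, while the empty Disallow line leaves W3C-checklink unrestricted.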

Clearly I am also misunderstanding something about wget or robots.txt. Can anyone help me out?

This is GNU Wget 1.12 built on linux-gnu.

Thank you
Daniel Feenberg


