From: anonymous
Subject: [Bug-wget] [bug #56660] wget -r or mirror with robots-off should still download robots.txt file
Date: Tue, 23 Jul 2019 11:45:34 -0400 (EDT)
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:68.0) Gecko/20100101 Firefox/68.0


                 Summary: wget -r or mirror with robots-off should still download robots.txt file
                 Project: GNU Wget
            Submitted by: None
            Submitted on: Tue 23 Jul 2019 03:45:32 PM UTC
                Category: None
                Severity: 3 - Normal
                Priority: 5 - Normal
                  Status: None
                 Privacy: Public
             Assigned to: None
         Originator Name: 
        Originator Email: 
             Open/Closed: Open
         Discussion Lock: Any
                 Release: 1.20
        Operating System: None
         Reproducibility: None
           Fixed Release: None
         Planned Release: None
              Regression: None
           Work Required: None
          Patch Included: None



GNU Wget 1.20.3 built on darwin18.6.0.

With robots=off, wget does not download the robots.txt file.

To reproduce:

wget -r -e robots=off https://www.robotstxt.org/

robots.txt is not downloaded even though it is present.

Downloading the root of a site with recursion or --mirror should still save
the robots.txt file, even if it is being ignored.

The robots.txt file still contains useful information for site mirroring and
archival purposes, even if it isn't being respected.
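
Until wget behaves this way, one possible workaround is to fetch robots.txt explicitly after the recursive run. The sketch below is only an illustration, not a proposed fix: the function names robots_url and mirror_with_robots are hypothetical helpers, not part of wget, and the sketch assumes the site serves robots.txt at the root of the host.

```shell
#!/bin/sh
# Hypothetical helper (not part of wget): derive the robots.txt URL
# from any URL on the site by keeping only scheme://host.
robots_url() {
    printf '%s/robots.txt\n' \
        "$(printf '%s\n' "$1" | sed -E 's#^(https?://[^/]+).*#\1#')"
}

# Hypothetical wrapper: mirror recursively while ignoring robots.txt,
# then fetch robots.txt explicitly so it is archived with the mirror.
mirror_with_robots() {
    wget -r -e robots=off "$1"
    # -x (--force-directories) recreates the host directory, so the file
    # is saved alongside the mirrored tree (e.g. www.robotstxt.org/robots.txt).
    wget -x "$(robots_url "$1")"
}

# Usage (performs network requests):
#   mirror_with_robots https://www.robotstxt.org/
```

This keeps robots=off in effect for the crawl itself, so the file is archived without being honoured.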

