

Re: [Bug-wget] can't reject robots.txt in recursive mode

From: Giuseppe Scrivano
Subject: Re: [Bug-wget] can't reject robots.txt in recursive mode
Date: Wed, 06 Aug 2014 15:38:43 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)

Ilya Basin <address@hidden> writes:

> Here's my script to download IBM javadocs:
> (
>     rm -rf wget-test
>     mkdir wget-test
>     cd wget-test
>     starturl="http://www-01.ibm.com/support/knowledgecenter/api/content/SSZLC2_7.0.0/com.ibm.commerce.api.doc/allclasses-noframe.html"
>     wget -d -r -R robots.txt --page-requisites -nH --cut-dirs=5 --no-parent \
>         "$starturl" 2>&1 | tee wget.log
> )
> Regardless of the '-R' option, wget downloads robots.txt and refuses to
> follow links starting with "/support/knowledgecenter/api/".

No workaround is needed: as documented, you can disable robots.txt
processing entirely with "-e robots=off".
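For example, a sketch of the same invocation with robots processing switched off (the URL and options are taken from the script above; the "-d" debug flag is dropped here for brevity):

```shell
# Same recursive download, but robots.txt handling is disabled with
# "-e robots=off" rather than the ineffective "-R robots.txt".
starturl="http://www-01.ibm.com/support/knowledgecenter/api/content/SSZLC2_7.0.0/com.ibm.commerce.api.doc/allclasses-noframe.html"
wget -e robots=off -r --page-requisites -nH --cut-dirs=5 --no-parent \
    "$starturl" 2>&1 | tee wget.log
```

The "-e" switch executes a .wgetrc-style command, so putting "robots = off" in your ~/.wgetrc has the same effect permanently.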

