[Bug-wget] can't reject robots.txt in recursive mode
From: Ilya Basin
Subject: [Bug-wget] can't reject robots.txt in recursive mode
Date: Wed, 6 Aug 2014 12:52:05 +0400
Here's my script to download IBM javadocs:
(
rm -rf wget-test
mkdir wget-test
cd wget-test
starturl="http://www-01.ibm.com/support/knowledgecenter/api/content/SSZLC2_7.0.0/com.ibm.commerce.api.doc/allclasses-noframe.html"
wget -d -r -R robots.txt --page-requisites -nH --cut-dirs=5 --no-parent \
  "$starturl" 2>&1 | tee wget.log
)
Regardless of the '-R' option, wget downloads robots.txt and then refuses to
follow links starting with "/support/knowledgecenter/api/".
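For reference, robots.txt processing can apparently be disabled altogether
through the 'robots' wgetrc variable (a separate knob from '-R'), e.g.:

wget -e robots=off -r --page-requisites -nH --cut-dirs=5 --no-parent "$starturl"

but that turns the feature off globally instead of rejecting the file, which
is what '-R robots.txt' looks like it should do.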
Workaround: pre-create an empty, read-only robots.txt before running wget
(combined script below):
touch robots.txt
chmod 400 robots.txt
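Full reproduction with the workaround in place (a sketch; the empty, read-only
robots.txt has to exist in the download directory before wget starts):

(
rm -rf wget-test
mkdir wget-test
cd wget-test
# pre-create an empty robots.txt that wget cannot overwrite, so no
# disallow rules ever take effect (presumably why the workaround helps)
touch robots.txt
chmod 400 robots.txt
starturl="http://www-01.ibm.com/support/knowledgecenter/api/content/SSZLC2_7.0.0/com.ibm.commerce.api.doc/allclasses-noframe.html"
wget -d -r -R robots.txt --page-requisites -nH --cut-dirs=5 --no-parent \
  "$starturl" 2>&1 | tee wget.log
)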
GNU Wget 1.15 built on linux-gnu