[Bug-wget] Unexpected result with -H and -D


From: Friso van Vollenhoven
Subject: [Bug-wget] Unexpected result with -H and -D
Date: Wed, 17 Jan 2018 13:53:40 +0100

Hello all,

I am trying to do a recursive download of a webpage and span multiple hosts
within the same domain, but not cross to other domains. The issue is that
the crawl does extend to other domains. My full command is this:

wget \
--recursive \
--no-clobber \
--page-requisites \
--adjust-extension \
--span-hosts \
--domains=scapino.nl \
--no-parent \
--tries=2 \
--wait=1 \
--random-wait \
--waitretry=2 \
--header='User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36' \
https://www.scapino.nl/winkels/scapino-utrecht-510061

From this combination of --span-hosts and --domains, I would expect to
download assets from cdn.scapino.nl and www.scapino.nl, but not other
domains. For some reason that I don't understand, wget also starts to do
what looks like a full crawl of the domain werkenbijscapino.nl, which is
referenced from the original page.
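
To make the expectation concrete, here is a minimal sketch of two ways a host
could be tested against a --domains entry. The helper names in_domain and
suffix_match are hypothetical and are not part of wget. Whether wget 1.18
compares hosts in either of these ways is an assumption, not something
confirmed in this report; the sketch only shows that a plain suffix comparison
would be enough to let werkenbijscapino.nl through, since that name ends in
the string scapino.nl.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; neither is taken from wget.

# Dot-aware match: the host equals the domain or ends in ".<domain>".
in_domain() {
    case "$1" in
        "$2"|*."$2") return 0 ;;
        *) return 1 ;;
    esac
}

# Plain suffix match: the host merely ends in the domain string.
suffix_match() {
    case "$1" in
        *"$2") return 0 ;;
        *) return 1 ;;
    esac
}

# Compare both checks for the hosts mentioned in this report.
for host in www.scapino.nl cdn.scapino.nl werkenbijscapino.nl; do
    if in_domain "$host" scapino.nl; then strict=yes; else strict=no; fi
    if suffix_match "$host" scapino.nl; then loose=yes; else loose=no; fi
    echo "$host dot-aware=$strict suffix=$loose"
done
```

The dot-aware check accepts only www.scapino.nl and cdn.scapino.nl, which is
the behaviour expected from the command above, while the plain suffix check
also accepts werkenbijscapino.nl.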

Any thoughts or direction would be much appreciated.

I am using wget 1.18 on Debian.


Best regards,
Friso
