bug-wget

From: Chris Lawson
Subject: limiting recursion for fetching but not for --page-requisites using --span-hosts
Date: Mon, 14 Nov 2022 00:42:09 -0600

Hello everyone,

I've been experimenting with combinations of --recursive, --span-hosts,
--page-requisites, --domains='X,Y,Z' for downloading pages from blogs and
forums, and can't figure out how to do exactly what I want.

I want to follow pages recursively, but only within certain domains, so I
set --recursive, --span-hosts, and --domains='X,Y,Z'. For each page fetched
I also want to grab all of its page requisites, especially images and CSS
files, so I set --page-requisites. However, it looks like --page-requisites
is subject to --span-hosts and the --domains= flag, so it won't grab images
hosted outside the domains I specify.
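For reference, the flag combination described above can be written as the argument list one might hand to Python's subprocess.run. This is only a sketch: the start URL is a hypothetical placeholder, X, Y and Z stand in for real domains as in the message, and the list is printed rather than executed.

```python
import shlex


def build_wget_args(start_url, domains):
    """Assemble the wget flags described above (built, not executed)."""
    return [
        "wget",
        "--recursive",                      # follow links from each fetched page
        "--span-hosts",                     # allow leaving the starting host...
        "--domains=" + ",".join(domains),   # ...but only into these domains
        "--page-requisites",                # also fetch images, CSS, etc. per page
        start_url,
    ]


# Hypothetical start URL; "X", "Y", "Z" are the placeholders from the message.
args = build_wget_args("https://forum.example.com/", ["X", "Y", "Z"])
print(shlex.join(args))
# prints: wget --recursive --span-hosts --domains=X,Y,Z --page-requisites https://forum.example.com/
```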

What I'd like is for --page-requisites to visit any domains needed without
restriction, but of course if I just set --span-hosts and don't set
--domains=, then I get a runaway recursive download.
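To make the distinction concrete, here is a toy Python model of the two filtering policies: the one observed (the --domains filter applied to links and requisites alike) and the one being asked for (requisites exempt, links still filtered). It is not wget's code and makes no claim about how wget decides internally; the is_requisite flag, the helper names, the example hosts and the whole-label subdomain matching are all assumptions of this sketch.

```python
from urllib.parse import urlsplit


def host_in_domains(url, domains):
    """True if the URL's host equals a listed domain or is a subdomain of one.

    Matching on whole labels is an assumption of this sketch, not a
    statement about how wget's --domains matching is implemented.
    """
    host = (urlsplit(url).hostname or "").lower()
    wanted = [d.lower().lstrip(".") for d in domains]
    return any(host == d or host.endswith("." + d) for d in wanted)


def observed_policy(url, is_requisite, domains):
    # Behaviour reported in the message: with --span-hosts and --domains,
    # the domain filter ignores is_requisite and applies to everything.
    return host_in_domains(url, domains)


def desired_policy(url, is_requisite, domains):
    # Behaviour being asked for: requisites of an accepted page may come
    # from any host, while links to follow stay limited to --domains.
    return is_requisite or host_in_domains(url, domains)


domains = ["blog.example.org"]               # hypothetical --domains value
image = "https://images.example.net/a.jpg"   # hypothetical off-domain image

print(observed_policy(image, True, domains))   # False: image skipped
print(desired_policy(image, True, domains))    # True: image fetched
print(desired_policy(image, False, domains))   # False: plain link not followed
```

Under desired_policy, an ordinary link to the image host is still rejected, which is what would keep the crawl from running away.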

(Currently I'm solving this by fetching the pages once, grepping for img
tags to find the image hosts, then adding those domains to my --domains
flag. But this backfires if a page I'm fetching also links to the image
hosting site: wget then follows those links too, and I get runaway
recursion.)
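The workaround in the parenthetical can be sketched in Python, using the standard library's html.parser in place of grep. The function and class names and the example hosts are hypothetical, and only <img src> attributes are inspected (CSS and other requisites are ignored), matching the grep-for-img approach described.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class ImgHostCollector(HTMLParser):
    """Collect the host of every <img src=...> in one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if not src:
            return
        # Resolve relative src values against the page URL before taking the host.
        host = urlsplit(urljoin(self.base_url, src)).hostname
        if host:
            self.hosts.add(host)


def domains_flag(base_domains, html, base_url):
    """Return a --domains= argument extended with the page's image hosts."""
    collector = ImgHostCollector(base_url)
    collector.feed(html)
    collector.close()
    merged = sorted(set(base_domains) | collector.hosts)
    return "--domains=" + ",".join(merged)


page = (
    '<p><img src="/local.png">'
    '<img src="https://images.example.net/a.jpg" alt=""/></p>'
)
print(domains_flag(["blog.example.org"], page, "https://blog.example.org/post/1"))
# prints: --domains=blog.example.org,images.example.net
```

This inherits exactly the weakness described: once images.example.net is in --domains, an ordinary link to that host is followed recursively as well.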

Is there any way to do what I want?

Thanks in advance.

Chris

