bug-wget

Re: [Bug-wget] Behaviour of spanning to accepted domains


From: Tim Rühsen
Subject: Re: [Bug-wget] Behaviour of spanning to accepted domains
Date: Sun, 07 Jun 2015 19:08:19 +0200
User-agent: KMail/4.14.2 (Linux/4.0.0-1-amd64; KDE/4.14.2; x86_64; ; )

On Sunday, 7 June 2015, 08:19:28, Tony Lewis wrote:
> On Friday, June 05, 2015 1:24 PM, Tim Rühsen wrote:
> > > First, I have not dug into the source code to see how -H is implemented.
> > > However, it makes sense to me that one ought to be able to specify
> > > both -H and -D together.
> > 
> > -H (=all domains)
> > to exclude some sites use --exclude-domains domain-list
> 
> wget --help says about -H: go to foreign hosts when recursive.
> 
> It doesn't say that when using -H one *must* take every foreign host that
> exists on the Internet and I'm arguing that such an interpretation does not
> make sense.

That is what -H is for :-)
Well, not *every* foreign host, but *every* foreign host that appears in 
downloaded, parsable files (HTML and CSS files).

wget --help gives only a short summary, not a full description. See 'man wget' 
for the extended description. If something there is unclear, we should fix it.

Using -H always carries the risk of 'downloading the whole Internet'. That is 
normally not what you want, which is why -H is not enabled by default.
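The --exclude-domains option suggested earlier in the thread can be combined with -H to trim what gets spanned. A minimal sketch, assuming GNU Wget's documented syntax in which --exclude-domains takes a comma-separated domain list; the helper name mirror_excluding, the DRY_RUN switch and the example domains are hypothetical, chosen only for illustration:

```shell
#!/bin/sh
# Hypothetical helper, not part of wget: mirror START_URL, following links
# to foreign hosts (-H) except those in the excluded domains.
# Usage: mirror_excluding START_URL DOMAIN...
# Set DRY_RUN=1 to print the wget command instead of running it.
mirror_excluding() {
  start_url=$1
  shift
  # Join the remaining arguments with commas, since --exclude-domains
  # expects a comma-separated list. IFS is changed only in the subshell.
  excluded=$(IFS=,; printf '%s\n' "$*")
  set -- wget --mirror -H --exclude-domains "$excluded" "$start_url"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

For example, `DRY_RUN=1 mirror_excluding http://www.example.com/ ads.example.net tracker.example.org` prints `wget --mirror -H --exclude-domains ads.example.net,tracker.example.org http://www.example.com/`. Note that -H still follows every other host it finds, so excluding domains only trims the crawl; it does not bound it.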

> 
> One ought to be able to request that wget go to foreign hosts without that
> implying that wget mirror the entire Internet. One obvious way to limit
> which foreign hosts are mirrored is to use -H in combination with -D.
> 
> > > Consider this scenario: I want to mirror a site including the images
> > > that are stored in a sub-domain, but I don't want to mirror every
> > > external site referenced by the site. So I would try this:
> > > 
> > > wget --mirror http://www.somesite.com -H -D www.somesite.com,images.somesite.com
> > 
> > You can also play with:
> >       -A acclist --accept acclist
> >       -R rejlist --reject rejlist
> 
> I can play with lots of wget options, but in the scenario described I want
> *all* files from two hosts, but not every other foreign host that might be
> referenced by one of those hosts.
> 
> What command line would you use for the scenario described?

Let's say you want everything from the two hosts example1.com and example2.com:

wget --mirror example1.com example2.com
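For comparison, the -H/-D form from the quoted scenario is also accepted by wget: the GNU Wget manual documents -D/--domains as a comma-separated list and notes that -D does not turn on -H by itself, which is why both flags appear together. A minimal sketch along the same lines; mirror_within, DRY_RUN and the somesite.com hosts are illustrative only:

```shell
#!/bin/sh
# Hypothetical helper, not part of wget: mirror START_URL, following links
# to foreign hosts (-H) only when they fall inside the accepted domains (-D).
# Usage: mirror_within START_URL DOMAIN...
# Set DRY_RUN=1 to print the wget command instead of running it.
mirror_within() {
  start_url=$1
  shift
  # -D expects a comma-separated list; IFS is changed only in the subshell.
  domains=$(IFS=,; printf '%s\n' "$*")
  set -- wget --mirror -H -D "$domains" "$start_url"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

For example, `DRY_RUN=1 mirror_within http://www.somesite.com/ www.somesite.com images.somesite.com` prints `wget --mirror -H -D www.somesite.com,images.somesite.com http://www.somesite.com/`. Unlike the two-URL command above, this starts from a single URL, so images.somesite.com is reached only through links found while crawling www.somesite.com.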

Regards, Tim


