Re: [Bug-wget] Behaviour of spanning to accepted domains
From: Tim Rühsen
Subject: Re: [Bug-wget] Behaviour of spanning to accepted domains
Date: Sun, 07 Jun 2015 19:08:19 +0200
User-agent: KMail/4.14.2 (Linux/4.0.0-1-amd64; KDE/4.14.2; x86_64; ; )
On Sunday, 7 June 2015, 08:19:28, Tony Lewis wrote:
> On Friday, June 05, 2015 1:24 PM, Tim Rühsen wrote:
> > > First, I have not dug into the source code to see how -H is implemented.
> > > However, it makes sense to me that one ought to be able to specify
> > > both -H and -D together.
> >
> > -H (=all domains)
> > to exclude some sites use --exclude-domains domain-list
>
> wget --help says about -H: go to foreign hosts when recursive.
>
> It doesn't say that when using -H one *must* take every foreign host that
> exists on the Internet and I'm arguing that such an interpretation does not
> make sense.
That is what -H is for :-)
Well, not *every* foreign host, but *every* foreign host that appears in
downloaded, parsable files (HTML and CSS).
wget --help only gives a short summary, not a full description. See 'man wget'
for the extended description. If something there is unclear, we should fix it.
Using -H always carries the risk of 'downloading the whole Internet'. That is
normally not what you want, which is why -H is not enabled by default.
>
> One ought to be able to request that wget go to foreign hosts without that
> implying that wget mirror the entire Internet. One obvious way to limit
> which foreign hosts are mirrored is to use -H in combination with -D.
>
> > > Consider this scenario: I want to mirror a site including the images
> > > that are stored in a sub-domain, but I don't want to mirror every
> > > external site referenced by the site. So I would try this:
> > >
> > > wget --mirror http://www.somesite.com -H -D www.somesite.com images.somesite.com
> >
> > You can also play with:
> > -A acclist --accept acclist
> > -R rejlist --reject rejlist
>
> I can play with lots of wget options, but in the scenario described I want
> *all* files from two hosts, but not every other foreign host that might be
> referenced by one of those hosts.
>
> What command line would you use for the scenario described?
Let's say you want everything from the two hosts example1.com and example2.com:

  wget --mirror example1.com example2.com

Regards, Tim