
Re: [Bug-wget] Problem mirroring site with two domain names


From: Ángel González
Subject: Re: [Bug-wget] Problem mirroring site with two domain names
Date: Fri, 20 May 2011 17:27:16 +0200
User-agent: Thunderbird

Chris Dorsey wrote:
> I am trying to mirror a web site that has two domain names, let's call them 
> www.abc.com and www.abcdef.com. Both URLs get to the same site. If I browse 
> the site in IE I can see some hyperlinks point to http://www.abc.com/... and 
> some point to http://www.abcdef.com.
>
> I am using this command line:
>
> wget.exe -r -l inf -w 10 --random-wait -E -k -K -N -H -D abcdef.com,abc.com 
> -o wgetlog.txt http://abc.com/
>
> What I get is two directories named www.abc.com/ and www.abcdef.com/ with 
> almost identical contents. The content has effectively been downloaded twice.
>
> What I want to do is make a single mirror copy of www.abc.com, with all the 
> references to www.abcdef.com treated as references to www.abc.com when the 
> links are converted in the local copy (-k).
>
> Any ideas?
>
>
> Chris Dorsey

Pre-create the directories, with www.abcdef.com being a symlink to
www.abc.com. The links are not converted, but you will download almost
everything only once: when wget goes to check the second domain, it will
find that it already has every file downloaded from the other domain.
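
A minimal sketch of that setup on a Unix-like shell, assuming wget names
its output directories www.abc.com and www.abcdef.com as in your report.
(Your command line uses wget.exe; on Windows a directory symlink would be
made with something like "mklink /D www.abcdef.com www.abc.com" instead,
which I have not tested here.)

```shell
#!/bin/sh
# Run this in the directory where the mirror will be stored,
# before starting wget.

# Real directory that will hold the single copy of the site.
mkdir -p www.abc.com

# ln -s TARGET LINKNAME: make www.abcdef.com point at www.abc.com,
# so files saved under either host name land in the same tree.
# Skip if something with that name already exists.
[ -e www.abcdef.com ] || ln -s www.abc.com www.abcdef.com

# Then run the original command unchanged, for example:
# wget -r -l inf -w 10 --random-wait -E -k -K -N -H \
#      -D abcdef.com,abc.com -o wgetlog.txt http://abc.com/
```

With -N already on your command line, wget compares timestamps against the
files it finds locally, so what it saved under one name should be seen as
present when it reaches the same path under the other name.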



