bug-wget

Re: [Bug-wget] What ought to be a simple use of wget


From: Ander Juaristi
Subject: Re: [Bug-wget] What ought to be a simple use of wget
Date: Tue, 2 Aug 2016 19:43:19 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Icedove/38.5.0

Hi Dale,

I'm seeing that it always redirects to www.iana.org/protocols.

Would -A protocols work for you?

e.g.
wget --mirror --convert-links --no-parent --page-requisites -A protocols http://www.iana.org/protocols
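
For anyone who wants to check the flags before fetching anything, here is a minimal dry-run sketch. It only prints the suggested command (with the option spelled --mirror) and does not contact the server. START_URL, ACCEPT and build_cmd are illustrative names made up for this sketch, not wget settings, and whether -A protocols actually yields the desired tree is untested here.

```shell
#!/bin/sh
# Dry-run sketch: print the suggested wget command instead of running it.
# START_URL, ACCEPT and build_cmd are hypothetical names for illustration.
START_URL="http://www.iana.org/protocols"
ACCEPT="protocols"

build_cmd() {
    # Flags, as described in the wget manual:
    #   --mirror           recursive retrieval with timestamping, no depth limit
    #   --convert-links    rewrite links so the local copy is browsable
    #   --no-parent        never ascend above the starting directory
    #   --page-requisites  also fetch images, CSS, etc. needed to render pages
    #   -A LIST            accept only file names matching the suffix/pattern list
    printf '%s\n' "wget --mirror --convert-links --no-parent --page-requisites -A $ACCEPT $START_URL"
}

# Print the command for review; paste it into a shell to run it for real.
build_cmd
```

Note that an -A element without wildcard characters is treated as a file name suffix, and that in recursive mode wget still downloads HTML pages to extract their links and deletes the ones that do not match the accept list afterwards.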

On 02/08/16 18:38, Dale R. Worley wrote:
> I want to make a local copy of the "IANA protocol assignments" web
> pages.  It seems to me that this ought to be a simple use of wget in
> recursive mode, and indeed, it seems like someone else must have run
> into this need before.  But I can't get a combination of wget options
> that has the behavior I want.
> 
> The goal is to make a local file tree that mirrors these URLs:
> 
>     http://www.iana.org/assignments/index.html
>     (That page should be in a file named 'index.html'.)
> 
>     every HTML page under http://www.iana.org/assignments/ that can be
>     reached from index.html
> 
>     page requisites for those pages, even if they aren't under
>     http://www.iana.org/assignments/
> 
> The interference comes from all the stuff under http://www.iana.org that
> is not under http://www.iana.org/assignments, but which is pointed to by
> the pages listed above.
> 
> To resolve the simple problem, it appears that --page-requisites does
> fetch the page requisites, even if they aren't under
> http://www.iana.org/assignments/.  So that part of the solution works
> fine.
> 
> But I can't figure out the right combination of options to fetch the
> HTML files that I want:
> 
> 
> wget --mirror --convert-links --no-parent --page-requisites http://www.iana.org/assignments/index.html
> Follows links outside of /assignments/.
> 
> wget --mirror --convert-links --exclude-directories=/ --page-requisites http://www.iana.org/assignments/index.html
> This doesn't recurse beyond index.html.
> 
> wget --mirror --convert-links --no-parent --page-requisites http://www.iana.org/assignments
> Follows links outside of /assignments/.
> 
> wget --mirror --convert-links --exclude-directories=/ --page-requisites http://www.iana.org/assignments
> This doesn't recurse beyond index.html.
> 
> wget --mirror --convert-links --no-parent --page-requisites http://www.iana.org/assignments/
> This doesn't recurse beyond index.html.
> 
> wget --mirror --convert-links --exclude-directories=/ --page-requisites http://www.iana.org/assignments/
> This doesn't recurse beyond index.html.
> 
> 
> I'm hoping that this is a known problem and someone can tell me the
> answer without having to think about it.
> 
> I also think the documentation could be made clearer in some places, but
> that can wait.
> 
> Dale
> 
