Re: [Bug-wget] Filtering for requisites and redirections
From: Tim Ruehsen
Subject: Re: [Bug-wget] Filtering for requisites and redirections
Date: Fri, 14 Oct 2016 13:10:46 +0200
User-agent: KMail/5.2.3 (Linux/4.7.0-1-amd64; KDE/5.26.0; x86_64; ; )

On Thursday, October 13, 2016 6:27:56 PM CEST Dale R. Worley wrote:
> If --page-requisites is specified along with --no-parent, then requisite
> files will be downloaded even if their URLs would normally be suppressed
> by --no-parent. This is implemented by a test in section 4 of
> download_child in recur.c, and a flag in struct urlpos, link_inline_p,
> which says that the *context* of that URL is as a page requisite.
>
> This suggests that the exceptional processing we want to implement for
> redirections might be more systematically implemented by using the above
> processing as a model, and not by testing the value returned by
> download_child. This involves adding a flag link_redirect_p to struct
> urlpos; this flag functions as an alternative to the additional argument
> to download_child that I previously suggested.
>
> In addition, this approach avoids the problem of ensuring that
> download_child returns the correct value if a URL fails more than one
> test, e.g., --accept-regex and robots, because any tests that are to be
> ignored in the context are not executed and do not affect the return
> value.
>
> It also suggests that we may want to define that --no-parent does not
> apply to redirections, in the same way that it does not apply to page
> requisites when --page-requisites is set.
>
> I've also updated the TEXI file to describe the functional changes, and
> also the previously-undocumented behavior of --page-requisites
> overriding --no-parent. The changes are in the attached diff.
>
> However, looking at the documentation for --no-parent:
>
>     -np
>     --no-parent
>         Do not ever ascend to the parent directory when retrieving
>         recursively. This is a useful option, since it guarantees that
>         only the files below a certain hierarchy will be downloaded.
>
>         Note that the effect of --no-parent is suppressed for fetching
>         redirected URLs and for fetching page requisite URLs if
>         --page-requisites is specified.
>
> Perhaps we do not want to have --no-parent suppressed by
> --page-requisites. It seems that --no-parent is intended as a security
> measure, and the existing code (as well as this proposal) violates its
> fundamental premise.

--no-parent seems to be intended as a bandwidth limiter together with -r. When
talking about security, what realistic scenario do you have in mind?

Anyway, we definitely don't want to change the default behavior.
If someone *really* needs a different precedence, has good arguments, and
finds someone to implement it (including tests), we'll add such a feature.

Regarding redirections, we have --max-redirect and could use --max-redirect=0
to disallow redirections. *But* we have at least two different kinds of
redirections: 1. staying on the same host/domain, 2. host spanning.
If neither -H/--span-hosts is given nor -D/--domains matches, we should not
span hosts for redirections.
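
To make the host-spanning rule concrete, here is a rough sketch in C. It is
not wget's code: struct redirect_policy, host_in_domain and redirect_allowed
are hypothetical names, and the dot-boundary suffix match used for -D is an
assumption that may differ from how wget actually compares domains.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>  /* strcasecmp (POSIX) */

/* Hypothetical policy record; not wget's real option storage. */
struct redirect_policy {
    bool span_hosts;             /* -H/--span-hosts was given */
    const char *const *domains;  /* -D/--domains list, or NULL */
    size_t n_domains;
};

/* Assumed matching rule: host equals domain, or ends with "." + domain. */
static bool host_in_domain(const char *host, const char *domain)
{
    size_t hl = strlen(host), dl = strlen(domain);

    if (dl == 0 || hl < dl)
        return false;
    if (strcasecmp(host + (hl - dl), domain) != 0)
        return false;
    return hl == dl || host[hl - dl - 1] == '.';
}

/* Decide whether a redirect from from_host to to_host may be followed. */
static bool redirect_allowed(const struct redirect_policy *p,
                             const char *from_host, const char *to_host)
{
    if (strcasecmp(from_host, to_host) == 0)
        return true;             /* kind 1: stays on the same host */
    if (p->span_hosts)
        return true;             /* kind 2, permitted by -H */
    for (size_t i = 0; i < p->n_domains; i++)
        if (host_in_domain(to_host, p->domains[i]))
            return true;         /* kind 2, permitted by a -D match */
    return false;                /* host spanning without permission */
}
```

With neither -H nor a matching -D entry, only same-host redirects pass, so
--max-redirect can keep its current meaning and does not have to be set to 0
just to prevent host spanning.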
>
> Dale