Re: [Bug-wget] Filtering for requisites and redirections
From: Dale R. Worley
Subject: Re: [Bug-wget] Filtering for requisites and redirections
Date: Fri, 14 Oct 2016 10:06:11 -0400
Tim Ruehsen <address@hidden> writes:
>> Perhaps we do not want to have --no-parent suppressed by
>> --page-requisites. It seems that --no-parent is intended as a security
>> measure, and the existing code (as well as this proposal) violate its
>> fundamental premise.
>
> --no-parent seems to be intended as a bandwidth limiter together with -r.
> When talking about security, what realistic scenario do you have in mind?
>
> Anyways, we definitely don't want to change the default behavior.
What I see in the manual page (admittedly, an old one, 1.16.1) is:
    -np
    --no-parent
        Do not ever ascend to the parent directory when retrieving
        recursively. This is a useful option, since it guarantees that
        only the files below a certain hierarchy will be downloaded.
In the Info page, I see more:
In 2.11, "Recursive Accept/Reject Options":
    '-np'
    '--no-parent'
        Do not ever ascend to the parent directory when retrieving
        recursively. This is a useful option, since it guarantees that
        only the files _below_ a certain hierarchy will be downloaded.
        *Note Directory-Based Limits::, for more details.
In 4.3, "Directory-Based Limits":
    '-np'
    '--no-parent'
    'no_parent = on'
        The simplest, and often very useful way of limiting directories is
        disallowing retrieval of the links that refer to the hierarchy
        "above" than the beginning directory, i.e. disallowing ascent to
        the parent directory/directories.

        The '--no-parent' option (short '-np') is useful in this case.
        Using it guarantees that you will never leave the existing
        hierarchy. Supposing you issue Wget with:

            wget -r --no-parent http://somehost/~luzer/my-archive/

        You may rest assured that none of the references to
        '/~his-girls-homepage/' or '/~luzer/all-my-mpegs/' will be
        followed. Only the archive you are interested in will be
        downloaded. Essentially, '--no-parent' is similar to
        '-I/~luzer/my-archive', only it handles redirections in a more
        intelligent fashion.

        *Note* that, for HTTP (and HTTPS), the trailing slash is very
        important to '--no-parent'. HTTP has no concept of a
        "directory"--Wget relies on you to indicate what's a directory and
        what isn't. In 'http://foo/bar/', Wget will consider 'bar' to be a
        directory, while in 'http://foo/bar' (no trailing slash), 'bar'
        will be considered a filename (so '--no-parent' would be
        meaningless, as its parent is '/').
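
To make the trailing-slash rule concrete, here is a minimal Python sketch of
the behavior as the Info text above describes it. It is not Wget's actual
code: the function names start_directory and allowed_by_no_parent are made up
for illustration, and the sketch compares URL paths only, ignoring hosts,
redirections, and --page-requisites entirely.

```python
from urllib.parse import urlsplit


def start_directory(url):
    """Return the directory part of the URL path, always ending in '/'.

    Everything up to and including the last slash counts as the
    directory; whatever follows it is treated as a filename.
    """
    path = urlsplit(url).path or "/"
    return path[: path.rfind("/") + 1]


def allowed_by_no_parent(start_url, candidate_url):
    """Hypothetical check: does the candidate path stay at or below
    the starting directory? (Paths only; hosts are not compared.)"""
    candidate_path = urlsplit(candidate_url).path or "/"
    return candidate_path.startswith(start_directory(start_url))


if __name__ == "__main__":
    start = "http://somehost/~luzer/my-archive/"
    for link in (
        "http://somehost/~luzer/my-archive/index.html",
        "http://somehost/~luzer/all-my-mpegs/",
        "http://somehost/~his-girls-homepage/",
    ):
        print(link, allowed_by_no_parent(start, link))

    # Without the trailing slash, 'bar' is a filename and the starting
    # directory collapses to '/', so nothing on the host is excluded.
    print(start_directory("http://foo/bar/"))
    print(start_directory("http://foo/bar"))
```

Under these assumptions, the two links outside '/~luzer/my-archive/' are
rejected, while starting from 'http://foo/bar' (no trailing slash) would
permit every path on the host, which is why the manual calls '--no-parent'
meaningless in that case.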
The text "You may rest assured that none of the references to
'/~his-girls-homepage/' or '/~luzer/all-my-mpegs/' will be
followed." suggests that --no-parent can be relied upon as a type of
security feature.
I am not personally deeply concerned about this. But I want to see the
issue discussed on the mailing list, as the current default behavior
differs from the documentation in a way that might be important.
Dale