bug-wget

Re: [Bug-wget] Filtering for requisites and redirections


From: Dale R. Worley
Subject: Re: [Bug-wget] Filtering for requisites and redirections
Date: Fri, 14 Oct 2016 10:06:11 -0400

Tim Ruehsen <address@hidden> writes:
>> Perhaps we do not want to have --no-parent suppressed by
>> --page-requisites.  It seems that --no-parent is intended as a security
>> measure, and the existing code (as well as this proposal) violate its
>> fundamental premise.
>
> --no-parent seems to be intended as a bandwidth limiter together with -r.
> When talking about security, what realistic scenario do you have in mind ?
>
> Anyways, we definitely don't want to change the default behavior.

What I see in the manual page (admittedly, an old one, 1.16.1) is:

       -np
       --no-parent
           Do not ever ascend to the parent directory when retrieving
           recursively.  This is a useful option, since it guarantees that
           only the files below a certain hierarchy will be downloaded.

In the Info page, I see more:

In 2.11, "Recursive Accept/Reject Options":
    '-np'
    '--no-parent'
         Do not ever ascend to the parent directory when retrieving
         recursively.  This is a useful option, since it guarantees that
         only the files _below_ a certain hierarchy will be downloaded.
         *Note Directory-Based Limits::, for more details.

In 4.3, "Directory-Based Limits":
    '-np'
    '--no-parent'
    'no_parent = on'
         The simplest, and often very useful way of limiting directories is
         disallowing retrieval of the links that refer to the hierarchy
         "above" than the beginning directory, i.e.  disallowing ascent to
         the parent directory/directories.

         The '--no-parent' option (short '-np') is useful in this case.
         Using it guarantees that you will never leave the existing
         hierarchy.  Supposing you issue Wget with:

              wget -r --no-parent http://somehost/~luzer/my-archive/

         You may rest assured that none of the references to
         '/~his-girls-homepage/' or '/~luzer/all-my-mpegs/' will be
         followed.  Only the archive you are interested in will be
         downloaded.  Essentially, '--no-parent' is similar to
         '-I/~luzer/my-archive', only it handles redirections in a more
         intelligent fashion.

         *Note* that, for HTTP (and HTTPS), the trailing slash is very
         important to '--no-parent'.  HTTP has no concept of a
         "directory"--Wget relies on you to indicate what's a directory and
         what isn't.  In 'http://foo/bar/', Wget will consider 'bar' to be a
         directory, while in 'http://foo/bar' (no trailing slash), 'bar'
         will be considered a filename (so '--no-parent' would be
         meaningless, as its parent is '/').

The text "You may rest assured that none of the references to
'/~his-girls-homepage/' or '/~luzer/all-my-mpegs/' will be
followed." suggests that --no-parent can be relied upon as a type of
security feature.
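
To make the trailing-slash rule quoted above concrete, here is a minimal
Python sketch. It is not Wget's code; the function names start_directory
and is_below are hypothetical, and the same-host check is a simplifying
assumption. It treats everything up to the last '/' of the starting URL's
path as the "directory" and accepts only links whose path lies under it.

```python
from urllib.parse import urlsplit


def start_directory(url):
    """Return the path prefix treated as the starting directory.

    Hypothetical helper: everything up to and including the last '/'.
    """
    path = urlsplit(url).path or "/"
    return path[: path.rfind("/") + 1]


def is_below(start_url, candidate_url):
    """Return True if candidate_url lies under the starting directory."""
    start, cand = urlsplit(start_url), urlsplit(candidate_url)
    # Assumption for this sketch: links to other hosts are never "below".
    if start.netloc != cand.netloc:
        return False
    return (cand.path or "/").startswith(start_directory(start_url))


start = "http://somehost/~luzer/my-archive/"
print(is_below(start, "http://somehost/~luzer/my-archive/index.html"))
print(is_below(start, "http://somehost/~luzer/all-my-mpegs/"))
# Without the trailing slash, 'bar' is a filename and the directory is '/'.
print(start_directory("http://foo/bar/"), start_directory("http://foo/bar"))
```

Under these assumptions the first link is accepted and the second is
rejected, and with 'http://foo/bar' as the start every path on the host
passes the check, which is why the option is meaningless in that case.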

I am not personally deeply concerned about this.  But I want to see the
issue discussed on the mailing list, as the current default behavior
differs from the documentation in a way that might be important.

Dale


