
[Bug-wget] Recursion problem with wget


From: Dale R. Worley
Subject: [Bug-wget] Recursion problem with wget
Date: Thu, 11 Aug 2016 20:08:25 -0400

In regard to my problem, http://savannah.gnu.org/bugs/?48708

The behavior stems from (what seems to me to be) an oddity in how wget
handles recursion:  If a page is fetched without any HTTP redirection,
its embedded links are tested against the recursion criteria to see if
they should be fetched.  But if the fetch involves redirection and the
ultimate URL of the redirection does not itself pass the recursion
criteria, the page is still fetched, but its links are not considered,
even if they pass the recursion criteria.

My preferred behavior is that all pages that are retrieved are scanned
for embedded links in any case.
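
To make the difference concrete, here is a minimal sketch of the two
decision rules as I understand them.  It is a toy model, not wget code:
passes_criteria, scan_links_current and scan_links_preferred are
hypothetical names, and a simple URL-prefix check stands in for the real
recursion criteria.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the recursion criteria: a URL passes if it
   starts with the allowed prefix.  The real criteria are more involved. */
bool passes_criteria (const char *url, const char *allowed_prefix)
{
  return strncmp (url, allowed_prefix, strlen (allowed_prefix)) == 0;
}

/* Current behavior as described above: a page fetched without
   redirection always has its links considered; a redirected page has
   them considered only if the final URL itself passes the criteria. */
bool scan_links_current (const char *final_url, bool redirected,
                         const char *allowed_prefix)
{
  if (!redirected)
    return true;
  return passes_criteria (final_url, allowed_prefix);
}

/* Preferred behavior: every page that was retrieved is scanned for
   links; each link is still tested against the criteria individually. */
bool scan_links_preferred (const char *final_url, bool redirected,
                           const char *allowed_prefix)
{
  (void) final_url;
  (void) redirected;
  (void) allowed_prefix;
  return true;
}
```

With an allowed prefix of "http://example.com/", a page that was
redirected to "http://other.example/page" is not scanned under the first
rule but is scanned under the second.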

The behavior can be "corrected" straightforwardly:

    diff --git a/src/recur.c b/src/recur.c
    index 2b17e72..91cc585 100644
    --- a/src/recur.c
    +++ b/src/recur.c
    @@ -360,6 +361,7 @@ retrieve_tree (struct url *start_url_parsed, struct iri *pi)
                         {
                           reject_reason r = descend_redirect (redirected, url_parsed,
                                             depth, start_url_parsed, blacklist, i);
    +                      r = WG_RR_SUCCESS;
                           if (r == WG_RR_SUCCESS)
                             {
                               /* Make sure that the old pre-redirect form gets

This, of course, isn't the proper and final fix.

It seems to me that making this change in the code would change its
behavior sufficiently that we would have to worry about backward
compatibility.  Ideally, I'd like the new default behavior to be my
preferred behavior, and use an option to restore the previous behavior.
But it might be necessary to use an option to enable my preferred
behavior to prevent disruption.

Interestingly, "make check" *succeeds* with the above code change, so
the test suite is *not* testing for this behavior.

Comments?

Dale


