bug-wget


[Bug-wget] Filtering of page requisites


From: Dale R. Worley
Subject: [Bug-wget] Filtering of page requisites
Date: Wed, 12 Oct 2016 10:49:44 -0400

So I've run into another version of the problem: I'm using
--page-requisites, and the requisites are being filtered in much the same
way as redirections were. However, the recent fixes for redirections don't
change that behavior.

For example,
    $ wget --mirror --convert-links --page-requisites --limit-rate=20k \
        --include-directories=/assignments \
        http://www.iana.org/assignments/index.html
does not fetch the stylesheet that
http://www.iana.org/assignments/index.html references with
        <link rel="stylesheet" media="screen" href="../_css/2015.1/screen.css"/>
which resolves to http://www.iana.org/_css/2015.1/screen.css.
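Resolving the reference by hand shows why it is presumably filtered. The `../` climbs out of `/assignments/`, so the target lands outside the directory given to --include-directories. The following is a minimal C sketch of that resolution on bare paths. `resolve_path` is a hypothetical helper, not a wget function. It handles only leading `./` and `../` segments and ignores schemes, hosts and queries.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of wget): resolve REL against the
   directory of BASE. Both are plain absolute paths; only leading "./"
   and "../" segments of REL are handled. */
static void resolve_path(const char *base, const char *rel,
                         char *out, size_t out_size)
{
    char dir[512];
    const char *slash = strrchr(base, '/');
    size_t len = slash ? (size_t)(slash - base) + 1 : 0;

    if (len >= sizeof dir)
        len = sizeof dir - 1;
    memcpy(dir, base, len);
    dir[len] = '\0';                    /* e.g. "/assignments/" */

    for (;;) {
        if (strncmp(rel, "./", 2) == 0) {
            rel += 2;                   /* "./" stays in the same directory */
        } else if (strncmp(rel, "../", 3) == 0) {
            rel += 3;
            size_t n = strlen(dir);
            if (n > 1) {                /* never climb above "/" */
                dir[n - 1] = '\0';      /* drop the trailing '/' */
                char *prev = strrchr(dir, '/');
                if (prev)
                    prev[1] = '\0';     /* keep the parent, with its '/' */
                else
                    dir[0] = '\0';
            }
        } else {
            break;
        }
    }
    snprintf(out, out_size, "%s%s", dir, rel);
}
```

Called as `resolve_path("/assignments/index.html", "../_css/2015.1/screen.css", out, sizeof out)`, it yields `/_css/2015.1/screen.css`, which is not under `/assignments`.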

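It looks like requisite URLs are marked by setting the link_inline_p field
of struct urlpos to true. When that flag is set and opt.page_requisites is
enabled, test 4 of download_child (the --no-parent test) is skipped, but
the other tests, including the directory-list check behind
--include-directories, still apply.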
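The following is a heavily simplified C model of the two filters involved. The struct, enum and function names are hypothetical stand-ins; only `link_inline_p`, `WG_RR_LIST` and `WG_RR_REGEX` appear in the email. The order of the checks is illustrative and is not copied from `download_child`. It shows that the waiver described above covers only the parent-directory test, while the --include-directories check still runs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Local stand-ins for wget's reject reasons; hypothetical names. */
typedef enum { RR_SUCCESS, RR_PARENT, RR_LIST } rr_model;

/* Hypothetical, pared-down stand-in for struct urlpos. */
struct link_model {
    const char *path;        /* path part of the resolved link URL */
    bool link_inline_p;      /* set for page requisites (CSS, images, ...) */
};

/* Hypothetical stand-in for the relevant option fields. */
struct opts_model {
    bool page_requisites;    /* --page-requisites */
    bool no_parent;          /* --no-parent */
    const char *start_dir;   /* directory of the start URL */
    const char *include_dir; /* one --include-directories entry, or NULL */
};

/* True if PATH is DIR itself or lies somewhere below it. */
static bool path_under(const char *path, const char *dir)
{
    size_t n = strlen(dir);
    if (strncmp(path, dir, n) != 0)
        return false;
    if (n > 0 && dir[n - 1] == '/')  /* e.g. "/" already ends on a separator */
        return true;
    return path[n] == '/' || path[n] == '\0';
}

static rr_model filter_child(const struct link_model *link,
                             const struct opts_model *opt)
{
    /* The parent-directory test is waived for page requisites... */
    bool waive_parent = link->link_inline_p && opt->page_requisites;
    if (opt->no_parent && !waive_parent
        && !path_under(link->path, opt->start_dir))
        return RR_PARENT;

    /* ...but the directory-list test has no such exemption. */
    if (opt->include_dir != NULL && !path_under(link->path, opt->include_dir))
        return RR_LIST;

    return RR_SUCCESS;
}
```

The command above does not use --no-parent, so only the list check matters there. With `include_dir` set to `/assignments`, the stylesheet at `/_css/2015.1/screen.css` comes back as `RR_LIST`, even though it is marked inline.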

The following change applies the same logic that is used for redirections:

diff --git a/src/recur.c b/src/recur.c
index 1469e31..b1f9109 100644
--- a/src/recur.c
+++ b/src/recur.c
@@ -462,6 +462,12 @@ retrieve_tree (struct url *start_url_parsed, struct iri *pi)
 
                   r = download_child (child, url_parsed, depth,
                                       start_url_parsed, blacklist, i);
+                 if (child->link_inline_p &&
+                     (r == WG_RR_LIST || r == WG_RR_REGEX))
+                   {
+                     DEBUGP (("Ignoring decision for page requisite, decided to load it.\n"));
+                     r = WG_RR_SUCCESS;
+                   }
                   if (r == WG_RR_SUCCESS)
                     {
                       ci = iri_new ();

With this change applied, wget behaves as expected: the requisites for
index.html are downloaded.
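The override in the patch reduces to a small decision rule. The following is a standalone C sketch of it. A local enum stands in for wget's reject reasons; its names are hypothetical and merely mirror `WG_RR_SUCCESS`, `WG_RR_LIST` and `WG_RR_REGEX`. Only list and regex rejections are turned into success, and only for inline links. Other rejections, such as a parent-directory refusal, are left alone. Like the patch, the rule keys on `link_inline_p` alone and does not also check `opt.page_requisites`.

```c
#include <stdbool.h>

/* Local stand-ins for wget's reject reasons; hypothetical names. */
typedef enum { RR_SUCCESS, RR_PARENT, RR_LIST, RR_REGEX } rr_model;

/* Same rule as the patch: an inline link rejected only by the directory
   lists or the URL regexes is downloaded anyway. */
static rr_model requisite_override(rr_model r, bool link_inline_p)
{
    if (link_inline_p && (r == RR_LIST || r == RR_REGEX))
        return RR_SUCCESS;
    return r;   /* every other decision is left unchanged */
}
```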

I've attached a patch for this that also updates the manual page, although
the manual page update doesn't mention the suppression of the --no-parent
test.

Dale

Attachment: requisite.diff
Description: Text Data

