bug-wget

From: Nils
Subject: [PATCH 1/2] Print message for no-follow attribute only if norobots respected
Date: Tue, 13 Apr 2021 18:10:25 +0100

Commit e39be3283836b8cb7b9ee320456eefb2a2fda173 added a message that
said links will not be followed whenever the nofollow attribute is found
in a page. It didn't take into account that with -e robots=off (and
equivalents) links will still be followed.

This bug has been noticed multiple times:
* https://www.reddit.com/r/DataHoarder/comments/mprq89/wget_respects_nofollow_attribute_despite_e/
* https://gist.github.com/simonw/27e810771137408fd7834ad153750c41#gistcomment-3648191
* https://superuser.com/questions/1494761/wget-wont-ignore-no-follow-attributes

This commit prints the message only when a nofollow link is found and
the norobots convention is respected.
---
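Note for reviewers: the condition that now guards the message can be
sketched in isolation as below. This is a minimal illustration, not wget
code. links_are_dropped() is a hypothetical helper, and its two
parameters stand in for opt.use_robots and meta_disallow_follow from the
recur.c hunk. It assumes that -e robots=off clears opt.use_robots.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the check in retrieve_tree(); not wget code.
   use_robots mirrors opt.use_robots (assumed false under -e robots=off).
   meta_disallow_follow mirrors the flag set when a page asks for nofollow. */
static bool
links_are_dropped (bool use_robots, bool meta_disallow_follow)
{
  /* Links are discarded, and the message is printed, only when both hold. */
  return use_robots && meta_disallow_follow;
}
```

With robots handling switched off the helper returns false even for a
nofollow page, which is the case where the old message was misleading.
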
 src/html-url.c | 3 ---
 src/recur.c    | 1 +
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/src/html-url.c b/src/html-url.c
index 2d79ca49..acf7f515 100644
--- a/src/html-url.c
+++ b/src/html-url.c
@@ -837,9 +837,6 @@ get_urls_html_fm (const char *file, const struct file_memory *fm,
 #endif
   xfree (meta_charset);
 
-  if (ctx.nofollow) {
-      logprintf(LOG_VERBOSE, _("no-follow attribute found in %s. Will not follow any links on this page\n"), file);
-  }
   DEBUGP (("no-follow in %s: %d\n", file, ctx.nofollow));
 
   if (meta_disallow_follow)
diff --git a/src/recur.c b/src/recur.c
index 7bc4ec42..3c4136d1 100644
--- a/src/recur.c
+++ b/src/recur.c
@@ -427,6 +427,7 @@ retrieve_tree (struct url *start_url_parsed, struct iri *pi)
 
           if (opt.use_robots && meta_disallow_follow)
             {
+              logprintf(LOG_VERBOSE, _("no-follow attribute found in %s. Will not follow any links on this page\n"), file);
               free_urlpos (children);
               children = NULL;
             }
-- 
2.29.3



