From: Nils
Subject: [PATCH 1/2] Print message for no-follow attribute only if norobots respected
Date: Tue, 13 Apr 2021 18:10:25 +0100
Commit e39be3283836b8cb7b9ee320456eefb2a2fda173 added a message that
said links will not be followed whenever the nofollow attribute is found
in a page. It didn't take into account that with -e robots=off (and
equivalents) links will still be followed.
This bug has been noticed multiple times:
* https://www.reddit.com/r/DataHoarder/comments/mprq89/wget_respects_nofollow_attribute_despite_e/
* https://gist.github.com/simonw/27e810771137408fd7834ad153750c41#gistcomment-3648191
* https://superuser.com/questions/1494761/wget-wont-ignore-no-follow-attributes
This commit makes it so that this message is only printed when a
nofollow link is found and the norobots convention is respected.
---
src/html-url.c | 3 ---
src/recur.c | 1 +
2 files changed, 1 insertion(+), 3 deletions(-)
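
For readers skimming the thread, the intended behaviour can be sketched in isolation. This is a minimal illustration, not wget code: `struct sketch_options`, `sketch_should_skip_links` and the `log` parameter are hypothetical stand-ins for `opt.use_robots`, the check in `retrieve_tree`, and wget's logging. It only shows that the notice and the decision to drop links now share one condition.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the relevant part of wget's options. */
struct sketch_options
{
  bool use_robots;   /* false when the user passed -e robots=off */
};

/* Return true when the links found on FILE should be discarded.
   The notice is written only in that case, so it can no longer
   appear while the links are in fact still being followed.  */
static bool
sketch_should_skip_links (const struct sketch_options *opt,
                          bool meta_disallow_follow,
                          const char *file, FILE *log)
{
  if (opt->use_robots && meta_disallow_follow)
    {
      fprintf (log, "no-follow attribute found in %s. "
               "Will not follow any links on this page\n", file);
      return true;
    }
  return false;
}
```

With `use_robots` set to false the function returns false and prints nothing, which corresponds to the `-e robots=off` case described above.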
diff --git a/src/html-url.c b/src/html-url.c
index 2d79ca49..acf7f515 100644
--- a/src/html-url.c
+++ b/src/html-url.c
@@ -837,9 +837,6 @@ get_urls_html_fm (const char *file, const struct file_memory *fm,
#endif
xfree (meta_charset);
- if (ctx.nofollow) {
- logprintf(LOG_VERBOSE, _("no-follow attribute found in %s. Will not follow any links on this page\n"), file);
- }
DEBUGP (("no-follow in %s: %d\n", file, ctx.nofollow));
if (meta_disallow_follow)
diff --git a/src/recur.c b/src/recur.c
index 7bc4ec42..3c4136d1 100644
--- a/src/recur.c
+++ b/src/recur.c
@@ -427,6 +427,7 @@ retrieve_tree (struct url *start_url_parsed, struct iri *pi)
if (opt.use_robots && meta_disallow_follow)
{
+ logprintf(LOG_VERBOSE, _("no-follow attribute found in %s. Will not follow any links on this page\n"), file);
free_urlpos (children);
children = NULL;
}
--
2.29.3