From: Dmitry Bolshakov
Subject: [Bug-wget] a note about "--reject" command line switch does not affect html files
Date: Fri, 03 Dec 2010 15:32:12 +0300

Hi all,

This is just one example of the problem.

When I tried to mirror a WordPress-based site (http://media-mera.ru), I
noticed that the process took far too long.
I found that the cause is the "reply to" links that are provided for every
user comment:
http://media-mera.ru/articles/socially_useful?replytocom=6192#respond
http://media-mera.ru/articles/socially_useful?replytocom=6194#respond
http://media-mera.ru/articles/socially_useful?replytocom=6358#respond
etc.
In fact, these addresses all point to the same page,
http://media-mera.ru/articles/socially_useful, so I added -R to the
command line:
-R '*replytocom=*'
but nothing changed. After some googling I found this note about wget's
behaviour:
-------
http://www.gnu.org/software/wget/manual/html_node/Types-of-Files.html#Types-of-Files
..
Note that these two options do not affect the downloading of html files (as 
determined by a ‘.htm’ or ‘.html’ filename
prefix). This behavior may not be desirable for all users, and may be changed 
for future versions of Wget.
..
-------
Yes, I absolutely agree that it should be changed: judging by wget's output,
the total downloaded traffic is about 10 times the size of the resulting
saved mirror!
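
To illustrate why these links are redundant, here is a minimal sketch in
Python. It is only an illustration of the idea, not something wget does: the
function name canonical_url and the choice to drop only the "replytocom"
parameter are my own assumptions. It strips the fragment and that one query
parameter, so all the "reply to" links above collapse to a single page.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_url(url, drop_params=("replytocom",)):
    """Return url without its fragment and without the listed query parameters.

    Hypothetical helper for illustration; the parameter list is an assumption.
    """
    parts = urlsplit(url)
    # Keep every query parameter except the ones we consider redundant.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in drop_params
    ]
    # Rebuild the URL with the filtered query and an empty fragment.
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
    )


links = [
    "http://media-mera.ru/articles/socially_useful?replytocom=6192#respond",
    "http://media-mera.ru/articles/socially_useful?replytocom=6194#respond",
    "http://media-mera.ru/articles/socially_useful?replytocom=6358#respond",
]

# A set removes duplicates once the URLs are normalized.
unique_pages = sorted({canonical_url(link) for link in links})
print(unique_pages)
```

A mirroring tool that compared links this way would fetch the article once
instead of once per comment.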

PS
wget takes 30 minutes on this site, httrack only 1.5.

PPS
While writing this, I even found a dedicated WordPress plugin that is
intended to reduce the traffic caused by "replytocom" links:
http://wordpress.org/extend/plugins/replytocom-redirector/

-- 
with best regards
Dmitry Bolshakov


