From: Dmitry Bolshakov
Subject: [Bug-wget] a note about "--reject" command line switch does not affect html files
Date: Fri, 03 Dec 2010 15:32:12 +0300
Hi all,

This is just one example of the problem.
When I wanted to mirror a WordPress-based site (http://media-mera.ru), I
noticed that the process was taking far too long.
I found that the cause was the "reply to" links that are provided for every
user comment:
http://media-mera.ru/articles/socially_useful?replytocom=6192#respond
http://media-mera.ru/articles/socially_useful?replytocom=6194#respond
http://media-mera.ru/articles/socially_useful?replytocom=6358#respond
etc.
Actually, all of these addresses are the same page,
http://media-mera.ru/articles/socially_useful, so I added -R to the
command line:
-R '*replytocom=*'
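To illustrate why the pattern looked right, here is a small Python sketch (not wget's code). It shows that stripping the replytocom parameter collapses the comment links into one page, and that a shell-style glob '*replytocom=*' does match them. The helpers strip_replytocom and last_component are hypothetical, and comparing the pattern against the last path segment plus query string is only an assumption about what wget matches.

```python
from fnmatch import fnmatch
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

REJECT_PATTERN = "*replytocom=*"

urls = [
    "http://media-mera.ru/articles/socially_useful?replytocom=6192#respond",
    "http://media-mera.ru/articles/socially_useful?replytocom=6194#respond",
    "http://media-mera.ru/articles/socially_useful?replytocom=6358#respond",
]

def strip_replytocom(url):
    """Hypothetical helper: drop the replytocom parameter and the #fragment."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "replytocom"]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), ""))

def last_component(url):
    """Hypothetical model of the matched name: last path segment plus query."""
    parts = urlsplit(url)
    name = parts.path.rsplit("/", 1)[-1]
    return f"{name}?{parts.query}" if parts.query else name

# All three comment links reduce to a single canonical page.
canonical = {strip_replytocom(u) for u in urls}
print(canonical)  # {'http://media-mera.ru/articles/socially_useful'}

# The glob itself matches each name, so the pattern is not the problem.
for u in urls:
    print(last_component(u), fnmatch(last_component(u), REJECT_PATTERN))
```

Since the glob matches, the pattern itself is fine. The observed behaviour points to the HTML exemption in wget rather than to the pattern.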
but nothing changed, and after some googling I found this note about wget's
behaviour:
-------
http://www.gnu.org/software/wget/manual/html_node/Types-of-Files.html#Types-of-Files
..
Note that these two options do not affect the downloading of html files (as
determined by a ‘.htm’ or ‘.html’ filename prefix). This behavior may not be
desirable for all users, and may be changed for future versions of Wget.
..
-------
Yes, I absolutely agree that it should be changed: judging by wget's output,
the total downloaded traffic was 10 times the size of the resulting saved
mirror!
PS
wget has been running on this site for 30 minutes; httrack took only 1.5.
PPS
While writing this, I even found a dedicated WordPress plugin intended to
reduce the traffic caused by "replytocom" links:
http://wordpress.org/extend/plugins/replytocom-redirector/
--
With best regards,
Dmitry Bolshakov