[Bug-wget] [bug #20808] -R should reject files _before_ downloading them
From: Oleksandr Gavenko
Subject: [Bug-wget] [bug #20808] -R should reject files _before_ downloading them
Date: Thu, 20 Aug 2015 20:56:38 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0 Iceweasel/38.1.0
Follow-up Comment #12, bug #20808 (project wget):
I am trying to retrieve specific replays from the saved-game storage at
http://replays.wesnoth.org/1.12/
The site is just an ordinary directory/file listing.
Because the data is grouped per day over a two-year period, there are a lot of
subdirectories.
I tried to fetch the interesting replays with (see
http://forums.wesnoth.org/viewtopic.php?p=588686#p588686 ):
wget -e 'robots=off' -nc -c -np -A 'Scrolling_Survival_Turn_1??_*.bz2' \
     -A index.html -r http://replays.wesnoth.org/1.12/
but each subdirectory page has links for sorting the table (query strings), and
for each page (roughly 2 years * 365 days of them) wget tries to download
things that are then rejected.
It takes too long to wait (even given that wget reuses connections) while wget
does useless work.
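One possible way to avoid those fetches, assuming a wget recent enough to have
--reject-regex (1.14 or later) and assuming the sort links are Apache-style
query strings such as "?C=N;O=D" (not confirmed for this server), is to reject
them by URL pattern. The sketch below only checks the pattern against URLs with
grep; is_sort_link is a hypothetical helper and the wget call is left commented
out.

```shell
# Assumed pattern for column-sort links such as "?C=N;O=D"
# (Apache autoindex style; adjust it to what the server really emits).
sort_link_re='[?]C=[A-Z];O=[AD]'

# is_sort_link URL: succeed if URL looks like a sort link (hypothetical helper).
is_sort_link() {
    printf '%s\n' "$1" | grep -Eq "$sort_link_re"
}

# The same pattern could then be handed to wget (not run here):
# wget -e robots=off -nc -c -np -r \
#      --reject-regex "$sort_link_re" \
#      -A 'Scrolling_Survival_Turn_1??_*.bz2' -A index.html \
#      http://replays.wesnoth.org/1.12/
```

Unlike -A/-R, which match file names, --reject-regex is matched against the
complete URL, so it should be able to filter on the query string.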
I quickly solved the task by scanning the index.html files manually. First I
fetched them with wget (--level=1 limits the amount of processing time):
$ wget -r -np -A index.html --level=1 http://replays.wesnoth.org/1.12/
and then retrieved the files of interest:
$ find . -type f -name index.html | while read -r f; do
    p=${f#./}; p=http://${p%index.html}
    command grep -o 'href="Scrolling_Survival_Turn_[5-9]._[^"]*\.bz2' "$f" |
    while read -r s; do s=${s#href='"'}; wget "$p$s"; done
  done
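The same two-step idea can be sketched as a small function that only prints the
URLs, so the list can be inspected before anything is downloaded. This is a
minimal sketch, not part of wget: list_replay_urls is a hypothetical name, and
it assumes the index.html files were mirrored into DIR/<host>/<path>/ as wget -r
does by default, with DIR given without a trailing slash.

```shell
# list_replay_urls DIR: print one URL per matching replay link found in the
# index.html files under DIR (hypothetical helper, not part of wget).
list_replay_urls() {
    prefix='href="'
    find "$1" -type f -name index.html | while IFS= read -r f; do
        p=${f#"$1"/}                    # drop the leading DIR/
        base="http://${p%index.html}"   # host/path/ as mirrored by wget -r
        grep -o 'href="Scrolling_Survival_Turn_[5-9]._[^"]*\.bz2' "$f" |
        while IFS= read -r s; do
            printf '%s%s\n' "$base" "${s#"$prefix"}"
        done
    done
}

# Usage (needs network access, so it is not run here):
# list_replay_urls . | wget -nc -i -
```

Feeding the list to a single wget via -i - also lets wget reuse connections
instead of starting one process per file.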
It would be nice to be able to specify which links to follow when HTML files
are processed.
_______________________________________________________
Reply to this item at:
<http://savannah.gnu.org/bugs/?20808>
_______________________________________________
Message sent via/by Savannah
http://savannah.gnu.org/