Re: [Bug-wget] Re: How to ignore errors with time stamping
From: Andre Majorel
Subject: Re: [Bug-wget] Re: How to ignore errors with time stamping
Date: Fri, 12 Dec 2008 14:21:11 +0100
User-agent: Mutt/1.5.17+20080114 (2008-01-14)
On 2008-12-12 12:21 +0100, Morten Lemvigh wrote:
> Andre Majorel wrote:
>
>> To work around that kind of brokenness, Wget would have to ignore
>> the 500 error and fall back on parsing the local file. That should
>> probably not be made the default behaviour, though.
>
> Ah, I see! Thank you for your answer. I guess I'll just have to
> script my way around it then...
Well, Micah may decide to add an option for that, but apparently
Wget is feature-frozen pending release 1.12.
You could try scripting something around the output of
http://www.teaser.fr/~amajorel/misc/htmlhref
I wouldn't swear it's bug-free but it seems to work for me. Since
it derives the base URL from the pathname of the local file, you
want to call it from the same directory you ran wget -r from:
  wget -x 'http://eur-lex.europa.eu/JOHtml.do?uri=OJ:L:2008:321:SOM:DA:HTML' &&
  htmlhref 'eur-lex.europa.eu/JOHtml.do?uri=OJ:L:2008:321:SOM:DA:HTML' |
  xargs wget -x -p -N
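If you would rather not depend on that script, the same idea can be roughly approximated in plain shell. What follows is only a sketch, not the actual htmlhref: list_hrefs is a made-up name, it assumes a grep that supports -o (GNU or BSD), it only sees double-quoted href attributes, it assumes plain http, and it does not decode entities such as &amp; or honour a <base> tag.

```shell
# list_hrefs FILE
# Print the href targets found in FILE as absolute URLs, one per line.
# The base URL is derived from FILE's pathname (host/dir/file), so call
# this from the directory where wget -x created the host directory.
# Hypothetical helper for illustration; NOT the real htmlhref.
list_hrefs() {
    file=$1
    host=${file%%/*}    # first path component, e.g. eur-lex.europa.eu
    dir=${file%/*}      # host plus any subdirectories
    grep -o -i 'href="[^"]*"' "$file" |
    sed -e 's/^[Hh][Rr][Ee][Ff]="//' -e 's/"$//' |
    while IFS= read -r link; do
        case $link in
            http://*|https://*)    printf '%s\n' "$link" ;;               # already absolute
            mailto:*|javascript:*) ;;                                      # not fetchable
            '#'*|'')               ;;                                      # fragment or empty
            /*)                    printf 'http://%s%s\n' "$host" "$link" ;;  # host-relative
            *)                     printf 'http://%s/%s\n' "$dir" "$link" ;;  # directory-relative
        esac
    done
}
```

It would slot into the pipeline in place of htmlhref, e.g. list_hrefs 'eur-lex.europa.eu/JOHtml.do?uri=OJ:L:2008:321:SOM:DA:HTML' | xargs wget -x -p -N.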
In your case, however, hacking Wget to ignore 500 after HEAD could
be the simplest solution. Have you looked at Curl? Maybe it does
what you want.
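
On the Curl side, the relevant options would be -z (--time-cond), which sends a conditional GET with If-Modified-Since instead of a separate HEAD, and -R (--remote-time), which stamps the local file with the server's modification time. Whether this particular server answers a conditional GET any better than a HEAD is an assumption, untested here. A minimal sketch, where fetch_if_newer and the CURL override variable are made-up names for illustration:

```shell
# fetch_if_newer URL FILE
# Download URL to FILE only if the remote copy is newer than FILE.
# Relies on a conditional GET, so no HEAD request is involved.
# CURL is a hypothetical hook to substitute another command for curl.
fetch_if_newer() {
    url=$1
    file=$2
    if [ -e "$file" ]; then
        # -z FILE: use FILE's modification time for If-Modified-Since
        "${CURL:-curl}" -sS -f -R -z "$file" -o "$file" "$url"
    else
        # No local copy yet: plain download, keeping the remote timestamp
        "${CURL:-curl}" -sS -f -R -o "$file" "$url"
    fi
}
```

Note that Curl does not recurse or fetch page requisites, so the list of URLs would still have to come from something like the link extraction above.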
--
André Majorel <URL:http://www.teaser.fr/~amajorel/>