
Re: [Bug-wget] Issue with --content-on-error and --convert-links


From: Yousong Zhou
Subject: Re: [Bug-wget] Issue with --content-on-error and --convert-links
Date: Thu, 29 Jan 2015 21:10:29 +0800

Hi Tim

On 27 January 2015 at 17:48, Tim Ruehsen <address@hidden> wrote:
> Hi Yousong,
>
> this patch seems to be incomplete. Do you have a complete patch (e.g. +new
> option, + docs) or are you going to work on it ?
>

That patch was only intended as an ephemeral one, to see whether it could
solve the issue Joe reported at the time.  But checking it again, I now
think the patch actually does the right thing.  The reason is that
since those --content-on-error pages are downloaded, links within
those pages should be converted as specified by --convert-links.
There is no need for a new option for this, and the current doc is just
fine.  But I will try adding a test case for this.
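
To make the reasoning concrete, here is a rough toy model in Python of the
link-conversion rule from the man page quoted below, applied to the case Joe
reported (a link pointing at a saved 404 page).  This is only a sketch, not
wget's actual C code: the names Page, link_target and
treat_saved_errors_as_downloaded are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str          # absolute URL of the linked page
    local_name: str   # relative path it was (or would be) saved under
    status: int       # HTTP status code of the response
    saved: bool       # True if the body was written to disk


def link_target(page: Page, treat_saved_errors_as_downloaded: bool) -> str:
    """Return what a link to `page` should point at after conversion.

    Hypothetical model: a link becomes the local name when the page counts
    as downloaded, otherwise it stays the full Internet address.
    """
    retrieved_ok = 200 <= page.status < 300
    if retrieved_ok or (treat_saved_errors_as_downloaded and page.saved):
        return page.local_name
    return page.url


# A 404 page whose body was kept on disk, as with --content-on-error.
missing = Page("http://example.com/missing.html", "missing.html", 404, saved=True)

# Without the change: the saved 404 page is not counted as downloaded.
print(link_target(missing, treat_saved_errors_as_downloaded=False))
# With the change: the saved 404 page is treated like any downloaded page.
print(link_target(missing, treat_saved_errors_as_downloaded=True))
```

The first call keeps the absolute URL and the second yields the local name,
which is the difference the patch is meant to make.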

Regards

                yousong


> Tim
>
> On Thursday 16 October 2014 15:24:48 Yousong Zhou wrote:
>> On 13 October 2014 10:25, Joe Hoyle <address@hidden> wrote:
>> > Hi All,
>> >
>> >
>> > I’m having issues using "--convert-links" in conjunction with
>> > "--content-on-error". Though "--content-on-error" is forcing wget to
>> > download the pages, the links to that "errored" page are not updated in
>> > other pages that link to it.
>> >
>> >
>> > This seems to be hinted at in the man page:
>> >
>> >
>> > "Because of this, local browsing works reliably: if a linked file was
>> > downloaded, the link will refer to its local name; if it was not
>> > downloaded, the link will refer to its full Internet address rather than
>> > presenting a broken link. The fact that the former links are converted to
>> > relative links ensures that you can move the downloaded hierarchy to
>> > another directory."
>> >
>> >
>> > However, it would seem that when --content-on-error is used, wget should
>> > ignore this rule and do all the link substitution anyhow.
>> >
>> >
>> > If anyone knows if this *should* work then I’d be eager to hear it, or any
>> > other way I can get any 404 pages downloaded and also linked to in the
>> > wget mirror.
>> Currently, wget does not treat pages with a 404 status code as RETROKF
>> (retrieval was OK), even though the 404 page itself was actually
>> downloaded successfully with the `--content-on-error` option enabled.
>> This behaviour is mostly acceptable, I guess.  But you can try the
>> attached patch for the moment.  The other option would be to serve the
>> 404 page by setting it up manually on your web server.
>>
>> Regards.
>>
>>                yousong


