bug-wget


From: William Prescott
Subject: Re: [Bug-wget] Tilde issue with recursive download when IRI is enabled and a page uses Shift JIS
Date: Fri, 17 Feb 2017 03:34:20 -0500

Hello,

I was just thinking about this again. My initial message was mistaken,
since I had expected Wget to cope with invalid input. However, I still have
a nagging feeling that something odd goes on with relative links, and I'd
rather hear that it's intended than let it slip by as a bug just because
I didn't explain it well enough.

The last message I sent mentioned this a little bit:
> I would also like to note that, even when the document's links don't
> contain a tilde, Wget will still fail to fetch the pages as long as there
> is a tilde in the URL Wget was called with.

Let's consider the (UTF-8) URL "http://example.com/~foo/bar.html".
bar.html is Shift_JIS encoded and contains:
<meta http-equiv="Content-Type" content="text/html;charset=Shift_JIS">
<a href="baz.html">Baz</a>

(this time, bar.html is perfectly valid Shift_JIS and doesn't have a tilde)

A recursive download will fail, because the relative URL appears to get
processed as
sjis_to_utf8(utf8_to_sjis("http://example.com/~foo/") + sjis("baz.html"))
resulting in
http://example.com/‾foo/baz.html

I would have expected
utf8("http://example.com/~foo/") + sjis_to_utf8("baz.html")
resulting in
http://example.com/~foo/baz.html
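To make the two orders concrete, here is a small Python sketch. It is not
Wget's code: sjis_encode/sjis_decode are hypothetical helpers that model only
the ASCII range of a JIS X 0201 Roman style table, where byte 0x7E decodes to
U+203E OVERLINE (real converters such as iconv may map "~" differently), and
urllib.parse.urljoin stands in for Wget's own relative-URL resolution.

```python
from urllib.parse import urljoin

# Hypothetical helpers (assumption): they model only the ASCII range of a
# JIS X 0201 Roman based Shift_JIS table, where byte 0x7E is U+203E OVERLINE.
# Real converters such as iconv may map "~" differently.
OVERLINE = "\u203e"


def sjis_encode(text: str) -> bytes:
    """Encode ASCII text, sending both "~" and U+203E to byte 0x7E."""
    out = bytearray()
    for ch in text:
        if ch in ("~", OVERLINE):
            out.append(0x7E)
        elif ord(ch) < 0x80:
            out.append(ord(ch))
        else:
            raise ValueError(f"character not modelled: {ch!r}")
    return bytes(out)


def sjis_decode(data: bytes) -> str:
    """Decode ASCII-range bytes, turning 0x7E into U+203E OVERLINE."""
    if any(b >= 0x80 for b in data):
        raise ValueError("only the ASCII range is modelled")
    return "".join(OVERLINE if b == 0x7E else chr(b) for b in data)


def resolve_in_page_charset(base: str, href: bytes) -> str:
    # Order the message observes:
    # sjis_to_utf8(utf8_to_sjis(base) + sjis(href))
    return sjis_decode(urljoin(sjis_encode(base), href))


def resolve_in_unicode(base: str, href: bytes) -> str:
    # Order the message expected:
    # utf8(base) + sjis_to_utf8(href)
    return urljoin(base, sjis_decode(href))


if __name__ == "__main__":
    page = "http://example.com/~foo/bar.html"
    link = b"baz.html"  # raw href bytes taken from the Shift_JIS page
    print(resolve_in_page_charset(page, link))  # http://example.com/‾foo/baz.html
    print(resolve_in_unicode(page, link))       # http://example.com/~foo/baz.html
```

Both functions perform the same join; the only difference is whether the
already-decoded base URL is pushed back through the page's charset, which is
exactly where the tilde turns into an overline.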

Best regards,
William Prescott


