Re: [Bug-wget] Support non-ASCII URLs

From: Eli Zaretskii
Subject: Re: [Bug-wget] Support non-ASCII URLs
Date: Sat, 19 Dec 2015 14:11:20 +0200

Tim sent me the tarball and the log off-list (thanks!).  I haven't yet
tried to build Wget, but just looking at the test, I don't think I
understand the idea behind it.  It has an index.html page that's encoded in
ISO-8859-15, but Wget is invoked with --remote-encoding=iso-8859-1,
and the URLs themselves in "my %urls" are all encoded in UTF-8.  How's
this supposed to work?
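To make the mismatch concrete, here is a small Python illustration. It assumes nothing about Wget's internals; it only shows how one file name, "français.html" (matching the p1_fran%C3%A7ais.html name in the test), comes out under the two charsets involved. It also shows that ISO-8859-15 and ISO-8859-1 agree on "ç" but not on every byte.

```python
from urllib.parse import quote, unquote

# One file name, percent-encoded under two different charsets.
name = "fran\u00e7ais.html"                       # "français.html"
utf8_form = quote(name, encoding="utf-8")         # fran%C3%A7ais.html
latin1_form = quote(name, encoding="iso-8859-1")  # fran%E7ais.html

# Each form round-trips only with the charset that produced it.
assert unquote(utf8_form, encoding="utf-8") == name
assert unquote(latin1_form, encoding="iso-8859-1") == name

# ISO-8859-15 and ISO-8859-1 both map 0xE7 to "ç" ...
same_c = b"\xe7".decode("iso-8859-15") == b"\xe7".decode("iso-8859-1")
# ... but 0xA4 is the euro sign in ISO-8859-15 and the generic
# currency sign in ISO-8859-1.
euro = b"\xa4".decode("iso-8859-15")       # U+20AC
currency = b"\xa4".decode("iso-8859-1")    # U+00A4

print(utf8_form, latin1_form, same_c)
```

So a URL written in UTF-8 (%C3%A7) and the same name written in ISO-8859-1 (%E7) are different byte strings on the wire, even though both spell "français".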

Also, I'm not following the logic of letting the remote encoding override
the Content-Type charset: p1_fran%C3%A7ais.html states "charset=UTF-8", but
includes a link encoded in ISO-8859-1, and the test seems to expect
Wget to use the remote encoding in preference to what "charset=" says.
Does the remote encoding override the encoding for the _contents_ of
the URL, not just for the URL itself?  That seems to make little sense
to me: the contents and the name can legitimately be encoded
differently, I think.
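A rough illustration of the conflict, under stated assumptions: the link target below ("français.html" stored as ISO-8859-1 bytes) is a hypothetical stand-in for the link in the test page, and `try_decode` is a hypothetical helper, not anything from Wget. It only shows what each candidate charset does with those bytes; it does not describe Wget's actual policy.

```python
from typing import Optional

# Hypothetical link target: "français.html" as ISO-8859-1 bytes, inside
# a page whose Content-Type claims charset=UTF-8.
raw_link = "fran\u00e7ais.html".encode("iso-8859-1")  # b'fran\xe7ais.html'

def try_decode(raw: bytes, charset: str) -> Optional[str]:
    """Hypothetical helper: decode raw, or return None if it is invalid."""
    try:
        return raw.decode(charset)
    except UnicodeDecodeError:
        return None

# 0xE7 starts a 3-byte UTF-8 sequence, but 'a' is not a continuation byte,
# so the declared charset cannot decode the link at all.
declared = try_decode(raw_link, "utf-8")
# The remote encoding decodes it as intended.
remote = try_decode(raw_link, "iso-8859-1")

print(declared is None, remote == "fran\u00e7ais.html")
```

One caveat worth keeping in mind: ISO-8859-1 maps every byte to some character, so a successful decode with it never proves the bytes were really meant as ISO-8859-1, while a failed UTF-8 decode is strong evidence they were not UTF-8.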

I guess I lack some basic info about what Wget is supposed to do in
these tricky situations, and how.  Can you help me understand that?
The manual doesn't seem to be very detailed about what's expected here.


