bug-wget

Re: [Bug-wget] Support non-ASCII URLs


From: Eli Zaretskii
Subject: Re: [Bug-wget] Support non-ASCII URLs
Date: Sun, 20 Dec 2015 19:23:05 +0200

> From: Tim Rühsen <address@hidden>
> Date: Sun, 20 Dec 2015 16:26:20 +0100
> 
> > Tim sent me the tarball and the log off-list (thanks!).  I didn't yet
> > try to build Wget, but just looking at the test, I guess I don't
> > understand its idea.  It has an index.html page that's encoded in
> > ISO-8859-15, but Wget is invoked with --remote-encoding=iso-8859-1,
> > and the URLs themselves in "my %urls" are all encoded in UTF-8.  How's
> > this supposed to work?
> 
> According to the wget man page, --remote-encoding just sets the *default* server 
> encoding. This only comes into play when the HTTP header does not contain a 
> Content-type with charset set *and* the HTML page does not contain a <meta 
> http-equiv="Content-Type" with 'content=... charset=...'.

Makes sense.
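
To make sure I read that rule correctly, here is a minimal sketch of the
fallback in Python. It is not Wget's code: the function name
effective_charset and both regular expressions are hypothetical, and the
order between the HTTP header and the meta tag is my assumption (the man
page wording above only says the default applies when both are absent).

```python
import re

def effective_charset(content_type_header, html_bytes, remote_encoding_default):
    """Return the charset to assume for a fetched page (illustrative only)."""
    # 1. A charset in the HTTP Content-Type header (assumed to win).
    if content_type_header:
        m = re.search(r'charset\s*=\s*"?([\w.:-]+)', content_type_header, re.I)
        if m:
            return m.group(1).lower()
    # 2. A charset declared in a <meta ...> tag inside the page.
    m = re.search(rb'<meta[^>]*charset\s*=\s*["\']?([\w.:-]+)', html_bytes, re.I)
    if m:
        return m.group(1).decode("ascii").lower()
    # 3. Only now does the --remote-encoding default apply.
    return remote_encoding_default.lower()
```

With the header "text/html; charset=ISO-8859-15" this returns
"iso-8859-15" no matter what default is passed, which is how I would
expect --remote-encoding=iso-8859-1 to behave for index.html.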

> 'index.html' in this test correctly has a meta tag with charset=utf-8 
> and the URLs encoded in utf-8.

That's not what I see: index.html says

  "Content-type" => "text/html; charset=ISO-8859-15"

and its contents indeed have URLs encoded in ISO-8859-15.

> > Also, I'm not following the logic of overriding Content-type by the
> > remote encoding: p1_fran%C3%A7ais.html states "charset=UTF-8", but
> > includes a link encoded in ISO-8859-1, and the test seems to expect
> > Wget to use the remote encoding in preference to what "charset=" says.
> 
> Either the test is wrong here or the man page. I would say the man page
> should be correct here - it makes the most sense to me. In this case the
> test is wrong, and so is the comment.

OK.
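
For reference, the difference between the two encodings of that file name
can be checked with a few lines of Python. This is only an illustration
using the standard urllib.parse module; the variable names are mine and
nothing here comes from the test suite itself.

```python
from urllib.parse import quote, unquote_to_bytes

name = "p1_français.html"

# The same name percent-encoded from two different byte encodings.
utf8_form = quote(name.encode("utf-8"))         # 'p1_fran%C3%A7ais.html'
latin1_form = quote(name.encode("iso-8859-1"))  # 'p1_fran%E7ais.html'

# Decoding the UTF-8 bytes with the wrong charset yields mojibake.
raw = unquote_to_bytes(utf8_form)
as_utf8 = raw.decode("utf-8")         # 'p1_français.html'
as_latin1 = raw.decode("iso-8859-1")  # 'p1_franÃ§ais.html'
```

So a link written as %E7 inside a page that declares charset=UTF-8 is not
valid UTF-8 at all, which is why the choice of charset matters for that
test case.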

> > Does the remote encoding override the encoding for the _contents_ of
> > the URL, not just for the URL itself?  That seems to make little sense
> > to me: the contents and the name can legitimately be encoded
> > differently, I think.
> 
> The filenames in %expected_downloaded_files depend on --local-encoding.
> Since this is not given on the command line, this test will behave
> differently with different settings for LC_ALL ('make check' uses LC_ALL=C,
> contrib/check-hard will also run 'make check' with a Turkish UTF-8 locale).
> 
> To fix the test, we should set --local-encoding to some kind of UTF-8 locale 
> (or something else, but then we have to fix the filenames to match that 
> locale).

But then what would be the point of repeating the test with the
Turkish locale?  To verify that when --local-encoding is given, the
locale is ignored?


