
Re: [Lynx-dev] rendering — (0x97)


From: Thomas Dickey
Subject: Re: [Lynx-dev] rendering — (0x97)
Date: Sun, 28 Jun 2020 15:02:59 -0400 (EDT)


----- Original Message -----
| From: "Thorsten Glaser" <tg@mirbsd.de>
| Cc: "lynx-dev" <lynx-dev@nongnu.org>
| Sent: Sunday, June 28, 2020 1:40:48 PM
| Subject: Re: [Lynx-dev] rendering &#151; (0x97)

| Thomas Dickey dixit:
| 
|>but in the meantime, the html5 crowd declared that iso-8859-1 is
|>identical to cp1252
| 
| WHAT‽
| 
| I knew they were crazy, but… like THAT?

Here's something relevant:

https://encoding.spec.whatwg.org/#names-and-labels

I seem to recall reading that in one of those pages summarizing changes for 
html5.

On the other hand, it might be one of those "facts" created in Wikipedia 
(there's a lot of that).
And even if I saw it some other place, Wikipedia might still be the ultimate 
source.

Looking at the Wikipedia page history, I see the claim evolving since these edits:

https://en.wikipedia.org/w/index.php?title=Windows-1252&type=revision&diff=267905312&oldid=262016711
https://en.wikipedia.org/w/index.php?title=Windows-1252&type=revision&diff=285015046&oldid=285011908

with the second edit referring to

https://web.archive.org/web/20090417231914/http://www.whatwg.org/specs/web-apps/current-work/multipage/infrastructure.html

Here's the source for the first edit:

https://web.archive.org/web/20090204094727/http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html

See "8.2.2.2 Character encoding requirements", which (seems familiar) says that 
ISO-8859-1 should be treated as if it were CP1252.
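
To make the practical difference concrete for the byte in the subject line, 
here is a minimal Python sketch (not Lynx code).  It assumes Python's built-in 
"latin-1" and "cp1252" codecs are acceptable stand-ins for ISO-8859-1 and 
CP1252; the helper name describe() is made up for illustration.

```python
import unicodedata

# The byte from the subject line.
raw = b"\x97"

# Strict ISO-8859-1: 0x97 is a C1 control character (U+0097).
as_latin1 = raw.decode("latin-1")

# CP1252: the same byte is a printable em dash (U+2014).
as_cp1252 = raw.decode("cp1252")


def describe(ch):
    """Return the code point and Unicode name of a single character."""
    # Control characters have no Unicode name, so supply a fallback.
    return "U+%04X %s" % (ord(ch), unicodedata.name(ch, "<control>"))


print("latin-1:", describe(as_latin1))
print("cp1252: ", describe(as_cp1252))
```

So a page labelled ISO-8859-1 that contains 0x97 shows a dash only if the 
client follows the "treat it as CP1252" rule.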

Move forward to 2012, and the wording is amended

https://web.archive.org/web/20120930155353/http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html

and going to 2013, I don't see it anymore.

That is, I don't see it in whatwg at that point.  But Wikipedia's been updated, 
and so has whatwg...

As of today, here's the current page:

https://en.wikipedia.org/w/index.php?title=Windows-1252&oldid=964485118

which says

This is now standard behavior in the HTML5 specification, which requires that 
documents advertised as ISO-8859-1 actually be parsed with the Windows-1252 
encoding.[5]

[5] "Encoding". WHATWG. 27 January 2015. sec. 5.2 Names and labels. Archived 
from the original on 4 February 2015. Retrieved 4 February 2015.

That is, it points to something that we can read on Internet Archive:

https://web.archive.org/web/20150204174315/https://encoding.spec.whatwg.org/#names-and-labels

...and that page does say (in effect) that ISO-8859-1 and several other 
charset labels:

"ansi_x3.4-1968"
"ascii"
"cp1252"
"cp819"
"csisolatin1"
"ibm819"
"iso-8859-1"
"iso-ir-100"
"iso8859-1"
"iso88591"
"iso_8859-1"
"iso_8859-1:1987"
"l1"
"latin1"
"us-ascii"
"windows-1252"
"x-cp1252"

are to be interpreted as CP1252.  The current page gives the same information:

https://web.archive.org/web/20200613144751/https://encoding.spec.whatwg.org/
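
As a rough illustration of what that table implies for a client, here is a 
small Python sketch of a label lookup.  It is not taken from Lynx or from any 
browser; the names CP1252_LABELS and resolve_label() are hypothetical, and the 
normalization step (trim ASCII whitespace, compare in lowercase) is my reading 
of how the Encoding Standard matches labels.  Only the labels quoted above are 
included.

```python
# Labels quoted above, all of which the page maps to one encoding.
CP1252_LABELS = frozenset([
    "ansi_x3.4-1968", "ascii", "cp1252", "cp819", "csisolatin1",
    "ibm819", "iso-8859-1", "iso-ir-100", "iso8859-1", "iso88591",
    "iso_8859-1", "iso_8859-1:1987", "l1", "latin1", "us-ascii",
    "windows-1252", "x-cp1252",
])

# Tab, LF, FF, CR and space (assumed set of ASCII whitespace).
ASCII_WHITESPACE = "\t\n\x0c\r "


def resolve_label(label):
    """Return "windows-1252" for any label in the table, else None."""
    normalized = label.strip(ASCII_WHITESPACE).lower()
    if normalized in CP1252_LABELS:
        return "windows-1252"
    return None


for candidate in ("ISO-8859-1", " Latin1 ", "us-ascii", "utf-8"):
    print(repr(candidate), "->", resolve_label(candidate))
```

Note that under this table even "us-ascii" resolves to CP1252, while "utf-8" 
is not in the list at all and would have to be handled elsewhere.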
 
| That being said this still is UTF-8, not ISO-8859-1…


-- 
Thomas E. Dickey <dickey@invisible-island.net>
http://invisible-island.net
ftp://ftp.invisible-island.net


