lynx-dev

Re: LYNX-DEV cp1252 (shudder)


From: David Woolley
Subject: Re: LYNX-DEV cp1252 (shudder)
Date: Tue, 18 Nov 1997 08:18:09 +0000 (GMT)

> A question was raised as to whether these horrible MS-created quasi HTML
> documents that use cp1252 (whether overtly or not), that one finds so

The overt ones are technically legal, but they may require an HTML 4.0 doctype,
which they are unlikely to have, if the character set is declared only in a
META element.  No-one is required to support anything except ISO 8859-1 as the
transfer character set, though.
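For illustration, here is a minimal Python sketch of how a client might settle on the transfer character set, following the HTML 4 priority order: the charset parameter of the HTTP Content-Type header first, then a META http-equiv declaration, then ISO 8859-1 as the HTTP/1.x default for text types. The function name pick_charset and both regular expressions are hypothetical, and the META matching is deliberately naive (it assumes http-equiv appears before content and only scans the first 1024 bytes).

```python
import re
from typing import Optional

# Hypothetical, simplified patterns; a real parser would tolerate any attribute order.
HTTP_CHARSET = re.compile(r'charset="?([A-Za-z0-9_.:-]+)', re.IGNORECASE)
META_CHARSET = re.compile(
    r"""<meta[^>]*http-equiv=["']?content-type["']?[^>]*charset=([A-Za-z0-9_.:-]+)""",
    re.IGNORECASE,
)


def pick_charset(content_type_header: Optional[str], body: bytes) -> str:
    """Return the charset label a client would use, in HTML 4 priority order."""
    # 1. The charset parameter of the HTTP Content-Type header wins.
    if content_type_header:
        m = HTTP_CHARSET.search(content_type_header)
        if m:
            return m.group(1).lower()
    # 2. Otherwise look for a META http-equiv declaration near the top of the body.
    #    Latin-1 decoding never fails, so it is safe for sniffing ASCII markup.
    head = body[:1024].decode("latin-1")
    m = META_CHARSET.search(head)
    if m:
        return m.group(1).lower()
    # 3. Fall back to the HTTP/1.x default for text/* media types.
    return "iso-8859-1"
```

Note that a header charset overrides the META element, so a page that only says windows-1252 in META can still be served, and read, as ISO 8859-1.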

> often on the web, could get their curly quotes presented as ordinary
> quotes.  It naturally occurred to me to try this in a development version
> of Lynx, to see if this is a supported combination.

Most of these curly quotes from FrontPage are numeric character references
to undefined characters.  The HTML decoding rules are that the character set
given by HTTP is first converted to the canonical document character set
(originally ISO 8859-1, but Unicode for HTML 4) before any references are
processed.  Numeric references are then interpreted as code points in that
canonical character set.  145 and 146, etc. fall in the 128-159 range, which
holds only control characters in Unicode, not quotation marks.  ISO 8859-1 is
a strict subset of Unicode, at least for non-control characters.
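To make the decoding rule concrete, a minimal Python sketch (decode_strict and decode_lenient are hypothetical names, and only decimal references are handled). The strict version treats &#N; as Unicode code point N, so &#147; becomes the C1 control character U+0093. The lenient version is an assumed heuristic, not part of the HTML 4 rules: it reinterprets 128-159 as CP 1252 bytes to recover the curly quotes the author presumably meant.

```python
import re

# Decimal numeric character references only, e.g. "&#146;" (hex forms are ignored here).
NUMERIC_REF = re.compile(r"&#([0-9]+);")
MAX_CODE_POINT = 0x10FFFF


def decode_strict(text: str) -> str:
    """Interpret each &#N; as Unicode code point N, as the HTML 4 rules require."""
    def repl(m: re.Match) -> str:
        n = int(m.group(1))
        # Out-of-range references are left untouched rather than raising.
        return chr(n) if n <= MAX_CODE_POINT else m.group(0)
    return NUMERIC_REF.sub(repl, text)


def decode_lenient(text: str) -> str:
    """Like decode_strict, but map 128-159 through CP 1252 (a heuristic, not HTML 4)."""
    def repl(m: re.Match) -> str:
        n = int(m.group(1))
        if 128 <= n <= 159:
            try:
                return bytes([n]).decode("cp1252")
            except UnicodeDecodeError:
                pass  # 0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned in CP 1252
        return chr(n) if n <= MAX_CODE_POINT else m.group(0)
    return NUMERIC_REF.sub(repl, text)
```

With this sketch, decode_strict("&#147;x&#148;") yields two control characters around the x, while decode_lenient on the same input yields proper double curly quotes; references such as &#8220; come out the same either way.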

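There are valid numeric references for curly quotes, but they are well above
255 (&#8216; and &#8217; for single quotes, &#8220; and &#8221; for double
quotes).  Literal CP 1252 curly quotes (raw bytes 0x91-0x94 in a document
labelled as windows-1252) are probably OK, but all the FrontPage documents
I've seen use numeric references.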
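Going the other way, for anyone cleaning up such pages, a minimal Python sketch: decode the literal CP 1252 bytes, then write every non-ASCII character as a decimal numeric reference, which lands on the valid Unicode code points above 255. The function name is hypothetical, and strict decoding means the five unassigned CP 1252 bytes raise an error rather than being guessed.

```python
def cp1252_to_portable_html(raw: bytes) -> str:
    """Decode literal CP 1252 bytes and rewrite every non-ASCII character as a
    decimal numeric reference, e.g. byte 0x93 becomes "&#8220;"."""
    # errors="strict": unassigned bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) raise UnicodeDecodeError.
    text = raw.decode("cp1252", errors="strict")
    # xmlcharrefreplace turns each character that ASCII cannot encode into "&#N;".
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")
```

The output is pure ASCII, so it survives being served as ISO 8859-1 and is interpreted correctly under the HTML 4 rules.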

You even get  quite often, which is even more PC-centric.
