lynx-dev

Re: LYNX-DEV cp1252 (shudder)


From: Foteos Macrides
Subject: Re: LYNX-DEV cp1252 (shudder)
Date: Wed, 19 Nov 1997 19:58:25 -0500 (EST)

"Alan J. Flavell" <address@hidden> wrote:
>[...]
>> Most of these curly quotes from FrontPage are undefined entities.
>> The HTML decode rules are that the HTTP character set is resolved
>> to canonical form (originally 8859, but Unicode for HTML 4) before
>> the entities are processed.  
>
>I'm sure this is all common knowledge to those doing the character tables. 
>My point was this: Lynx already uses "approximations" for representing
>characters that it understands but are not in the repertoire of the
>selected output (terminal Charset) encoding.  Do not confuse the
>documents' charset with the terminal Charset, they are very different
>things.  And NCRefs are different again, in theory (and maybe it is
>correct for Lynx to point those up as illegal; I was merely making a
>practical suggestion.  People rarely accuse me of lack of pedantry!). 

        I certainly have never accused you of that. :-)


>> Numeric entities are then interpreted as
>> code points in the canonical character set.  145 and 146, etc. are not
>> defined code points in Unicode.  
>
>Of course.  Nevertheless, MS software creates these illegal and
>meaningless representations, and it would be obtuse to claim that we don't
>know what they intend by it.  I wasn't asking for an explanation of what
>they do or don't mean in HTML, but making a suggestion for a practical
>way of dealing with them, either as 8-bit characters - which would be
>perfectly legal if charset=cp1252, in fact; or as NCRefs - which is, we
>agree, invalid HTML, but we still may discuss how to deal with it, may
>we? 

        Of course you may!  You're a card-carrying member of the Lynx User
Community.

        Fortunately, I'm no longer a currently active Lynx developer,
and can just go ahead and do it in the lynx271f code set.  I wasn't
sure whether to convert &#1; to the white or black smiling face Unicode
character, so I made it the white, though with most Display Character
Sets it will end up as ASCII art.  :-)
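
For readers who want to see the idea in concrete terms, here is a minimal
sketch in Python of the conversion discussed above. It is not the lynx271f
code. The function names and the approximation table are hypothetical, and
the sketch assumes that decimal numeric character references in the
128-159 range were meant as cp1252 bytes. It maps those references (and
&#1;, as described above) to Unicode, then falls back to ASCII
approximations for a 7-bit Display Character Set.

```python
import re

# Hypothetical approximation table for a 7-bit display.
# This is an illustration only, not Lynx's actual table.
ASCII_APPROX = {
    "\u2018": "'", "\u2019": "'",    # curly single quotes
    "\u201c": '"', "\u201d": '"',    # curly double quotes
    "\u2013": "-", "\u2014": "--",   # en dash, em dash
    "\u2026": "...",                 # horizontal ellipsis
    "\u263a": ":-)",                 # white smiling face
}

NCR = re.compile(r"&#(\d+);")        # decimal numeric character references

def resolve_ncr(match):
    """Turn one numeric character reference into a Unicode character."""
    code = int(match.group(1))
    if code == 1:
        return "\u263a"              # white smiling face, per the post
    if 128 <= code <= 159:
        # Assumption: the author meant the cp1252 byte with this value.
        try:
            return bytes([code]).decode("cp1252")
        except UnicodeDecodeError:
            return "\ufffd"          # byte is unassigned in cp1252
    try:
        return chr(code)
    except (ValueError, OverflowError):
        return "\ufffd"              # outside the Unicode range

def repair(text):
    """Replace every decimal reference in text with its character."""
    return NCR.sub(resolve_ncr, text)

def approximate(text):
    """Render text for an ASCII-only display, using '?' as a last resort."""
    return "".join(
        ch if ord(ch) < 128 else ASCII_APPROX.get(ch, "?") for ch in text
    )
```

With these definitions, repair("&#145;quoted&#146;") yields the text wrapped
in U+2018 and U+2019, and approximate() of that result gives plain
apostrophes. Keeping the two steps separate mirrors the distinction drawn
earlier in the thread between the document's charset and the terminal
Charset.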

        Our gopher server indeed is now being barraged with fetch
attempts, many of them with .uk or .jp domains, and obviously not
getting it for use with the SSL hooks, plus being frustrated by the
occasional time-outs due to the still poor connectivity in Central
Mass.  So, with all due respect to Henry's pickle parable, I put
lynx271f.zip in:

        http://www.slcc.edu/lynx/fote/patches/

If you try it and find any problems with the conversions from FrontPage
garbage to valid Unicode, be sure to include URLs with the bug reports.

                                Fote

=========================================================================
 Foteos Macrides            Worcester Foundation for Biomedical Research
 address@hidden         222 Maple Avenue, Shrewsbury, MA 01545
=========================================================================
