lynx-dev

Re: LYNX-DEV cp1252 (shudder)


From: Foteos Macrides
Subject: Re: LYNX-DEV cp1252 (shudder)
Date: Fri, 21 Nov 1997 16:23:45 -0500 (EST)

"Alan J. Flavell" <address@hidden> wrote:
>On Fri, 21 Nov 1997, Foteos Macrides wrote:
>
>>      It occurred to me that the list of conversions may be of general
>> interest, for people who might wish to make them directly in documents
>> generated by FrontPage.  So here they are:
>
>Oh woops...  My reference is:
>
>ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT

        Yes, we're using the unicode.org foo.TXT files for creating the
foo_uni.tbl sources for the Lynx chartrans stuff.
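
For readers who have not looked at those mapping files, the following is a minimal Python sketch of reading one. It assumes the usual layout of the unicode.org vendor files (code page byte in hex, Unicode value in hex, then a "#" comment carrying the character name). The SAMPLE text and the name parse_mapping are made up for illustration, and this is not how Lynx itself builds its .tbl sources.

```python
# Hypothetical sample in the layout assumed for the unicode.org mapping files.
SAMPLE = """\
# comment lines start with '#'
0x82\t0x201A\t#SINGLE LOW-9 QUOTATION MARK
0x91\t0x2018\t#LEFT SINGLE QUOTATION MARK
0x98\t0x02DC\t#SMALL TILDE
"""

def parse_mapping(text):
    """Return {codepage_byte: (unicode_code_point, name)} for mapped lines."""
    table = {}
    for line in text.splitlines():
        data, _, comment = line.partition("#")
        fields = data.split()
        if len(fields) < 2:
            continue  # comment-only line, blank line, or unmapped byte
        byte, code_point = int(fields[0], 16), int(fields[1], 16)
        table[byte] = (code_point, comment.strip())
    return table

mapping = parse_mapping(SAMPLE)
```

Keying the result by the byte value makes it easy to look up the decimal form (0x91 is 145) that turns up in the invalid references.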


>My own list, for the range x80-x9f, is contained in:
>
>http://www.physics.gla.ac.uk/r2h-extras/rtfunicode.html8
>
>among the materials that accompany my unicode translation package for
>rtftohtml:
>
>http://www.physics.gla.ac.uk/r2h-extras/
>
>Let's see if we can converge ;-)
>
>> &#130;  ->  &#x201a; &sbquo;   SINGLE LOW-9 QUOTATION MARK
>> &#132;  ->  &#x201e; &bdquo;   DOUBLE LOW-9 QUOTATION MARK
>> &#133;  ->  &#x2026; &hellip;  HORIZONTAL ELLIPSIS
>> &#134;  ->  &#x2020; &dagger;  DAGGER
>> &#135;  ->  &#x2021; &Dagger;  DOUBLE DAGGER
>> &#137;  ->  &#x2030; &permil;  PER MILLE SIGN
>> &#139;  ->  &#x2039; &lsaquo;  SINGLE LEFT-POINTING ANGLE QUOTATION MARK
>
>Looks fine so far, but now you seem to slip a notch:
>
>lsquo is x91 which I make out to be 145 , not 144!  and so it goes on...

        Ugh, I originally did the conversion kludge with all hex values,
then went back yesterday to add more comments and change the control-range
values to decimal, but blew it when I mentally converted x91 (I should
know better than to rely on my mind :).  The current conversion table
for that kludge is appended, and I'll update lynx271f.zip for it
later today (another lynx271ssleay.zip won't be needed).
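
A slip like that is easy to catch mechanically. The sketch below, in Python, checks decimal values against the Unicode code points they are supposed to stand for, using Python's built-in cp1252 codec as the reference. The names PAIRS and mismatches are hypothetical, and this is only an illustration, not part of the Lynx code.

```python
# (decimal reference, expected Unicode code point) pairs for x91 through x99.
PAIRS = [(145, 0x2018), (146, 0x2019), (147, 0x201C), (148, 0x201D),
         (149, 0x2022), (150, 0x2013), (151, 0x2014), (152, 0x02DC),
         (153, 0x2122)]

def mismatches(pairs, codec="cp1252"):
    """Return the pairs whose byte does not decode to the expected code point."""
    bad = []
    for decimal, expected in pairs:
        actual = ord(bytes([decimal]).decode(codec))
        if actual != expected:
            bad.append((decimal, expected, actual))
    return bad

assert 0x91 == 145 and 0x98 == 152   # the two values discussed above
bad = mismatches(PAIRS)              # empty when every pair agrees
```

Feeding it a shifted pair such as (145, 0x2019) reports the disagreement, which is the kind of off-by-one error described above.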


>> &#144;  ->  &#x2018; &lsquo;   LEFT SINGLE QUOTATION MARK
>> &#145;  ->  &#x2019; &rsquo;   RIGHT SINGLE QUOTATION MARK
>> &#146;  ->  &#x201c; &ldquo;   LEFT DOUBLE QUOTATION MARK
>> &#147;  ->  &#x201d; &rdquo;   RIGHT DOUBLE QUOTATION MARK
>> &#148;  ->  &#x2022; &bull;    BULLET
>> &#149;  ->  &#x2013; &ndash;   EN DASH
>> &#150;  ->  &#x2014; &mdash;   EM DASH
>> &#151;  ->  &#x02dc; &tilde;   SMALL TILDE
>
>...up to small tilde, which is x98 = 152 not 151.
>
>Then 153 is &trade; = &#8482; , and so on.
>
>> &#155;  ->  &#x203a; &rsaquo;  SINGLE RIGHT-POINTING ANGLE QUOTATION MARK

        Note that I left out all 8-bit letters, on the assumption
that FrontPage would handle those validly.  I also left out &trade;
intentionally, because it's been in the HTML specs for a long time
now, and I assumed FrontPage would handle it validly as well, but
I'll add it to the kludge update.  I also don't bother to check for
the input charset, because you can't count on that being accurate,
and FrontPage might use them with other codepages (e.g., cp1250) for
which those symbols or punctuation marks are the same.  It would
be easy to add such checks if they prove necessary.
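
Whether a given byte really means the same thing in two codepages can be checked directly. The following Python sketch compares cp1252 with cp1250 for the bytes behind the references in the conversion list, relying on Python's own codec tables as a stand-in for the vendor mappings. The names BYTES, decode_or_none and compare are hypothetical, and a byte that either codec leaves unassigned is counted as different.

```python
# Bytes behind the numeric references in the conversion list (130, 132, ...).
BYTES = [130, 132, 133, 134, 135, 137, 139,
         145, 146, 147, 148, 149, 150, 151, 152, 153, 155]

def decode_or_none(byte, codec):
    """Decode one byte, or return None if the codec leaves it unassigned."""
    try:
        return bytes([byte]).decode(codec)
    except UnicodeDecodeError:
        return None

def compare(codec_a="cp1252", codec_b="cp1250"):
    """Split BYTES into those that decode identically and those that do not."""
    same, different = [], []
    for b in BYTES:
        a_char = decode_or_none(b, codec_a)
        b_char = decode_or_none(b, codec_b)
        if a_char is not None and a_char == b_char:
            same.append(b)
        else:
            different.append(b)
    return same, different

same, different = compare()
```

Any byte that lands in the second list would be a candidate for a charset check before conversion.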

        Oh, also note that the problem with an encasing CENTER or
DIV ALIGN="center" was just a dumb oversight when I initially
restored support for the TABLE-in-PRE kludge, and it's fully restored
in the current lynx271f code set.

                                Fote

=========================================================================
 Foteos Macrides            Worcester Foundation for Biomedical Research
 address@hidden         222 Maple Avenue, Shrewsbury, MA 01545
=========================================================================

        Conversions of invalid numeric (Microsoft codepage)
        character references to valid Unicode numeric or named
        character references (names as in HTML 4.0 PR).

INVALID     Numeric  Named     Character
-------     -------- -------   -----------------------------------------
&#1;    ->  &#x263a; (none)    WHITE SMILING FACE
&#130;  ->  &#x201a; &sbquo;   SINGLE LOW-9 QUOTATION MARK
&#132;  ->  &#x201e; &bdquo;   DOUBLE LOW-9 QUOTATION MARK
&#133;  ->  &#x2026; &hellip;  HORIZONTAL ELLIPSIS
&#134;  ->  &#x2020; &dagger;  DAGGER
&#135;  ->  &#x2021; &Dagger;  DOUBLE DAGGER
&#137;  ->  &#x2030; &permil;  PER MILLE SIGN
&#139;  ->  &#x2039; &lsaquo;  SINGLE LEFT-POINTING ANGLE QUOTATION MARK
&#145;  ->  &#x2018; &lsquo;   LEFT SINGLE QUOTATION MARK
&#146;  ->  &#x2019; &rsquo;   RIGHT SINGLE QUOTATION MARK
&#147;  ->  &#x201c; &ldquo;   LEFT DOUBLE QUOTATION MARK
&#148;  ->  &#x201d; &rdquo;   RIGHT DOUBLE QUOTATION MARK
&#149;  ->  &#x2022; &bull;    BULLET
&#150;  ->  &#x2013; &ndash;   EN DASH
&#151;  ->  &#x2014; &mdash;   EM DASH
&#152;  ->  &#x02dc; &tilde;   SMALL TILDE
&#153;  ->  &#x2122; &trade;   TRADE MARK SIGN
&#155;  ->  &#x203a; &rsaquo;  SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
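
For anyone who wants to apply these conversions directly to a document, here is a minimal Python sketch. The mapping is copied from the table above; the names FIXES and fix_references are hypothetical, and the sketch rewrites decimal references only, without looking at the document's charset. It is an illustration, not the code used in Lynx.

```python
import re

# Invalid decimal reference -> Unicode code point, copied from the table above.
FIXES = {
    1: 0x263A, 130: 0x201A, 132: 0x201E, 133: 0x2026, 134: 0x2020,
    135: 0x2021, 137: 0x2030, 139: 0x2039, 145: 0x2018, 146: 0x2019,
    147: 0x201C, 148: 0x201D, 149: 0x2022, 150: 0x2013, 151: 0x2014,
    152: 0x02DC, 153: 0x2122, 155: 0x203A,
}

_DECIMAL_REF = re.compile(r"&#([0-9]+);")

def fix_references(html):
    """Replace the invalid decimal references with valid hex references."""
    def replace(match):
        number = int(match.group(1))
        if number in FIXES:
            return "&#x%04x;" % FIXES[number]
        return match.group(0)  # leave every other reference untouched
    return _DECIMAL_REF.sub(replace, html)

fixed = fix_references("&#147;quoted&#148; text &#150; &#65;")
```

Emitting the numeric form rather than the named one avoids depending on which entity names a given browser knows.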
