lynx-dev

Re: lynx-dev Lynx character entity references fix


From: Leonid Pauzner
Subject: Re: lynx-dev Lynx character entity references fix
Date: Fri, 5 Mar 1999 16:23:56 +0300 (MSK)

>      * Subject: lynx-dev Lynx character entity references fix
>      * From: Jacob Poon <address@hidden>
>      * Date: Thu, 4 Mar 1999 16:38:43 -0500
>      * Reply-To: address@hidden
>      * Sender: address@hidden
>      _________________________________________________________________


> This patch does the following:

>         HTML 4.0 compliance:
>         - Added support for Euro currency symbol.
>         - Fixed duplicated &loz; definitions.

>         Fixes:
>         - Fixed some typos in the old references. (fixed: b.delta)
Thanks. I'm now working on the old-style entities code and will integrate your fix.

But this may start from a wrong premise: the table is much wider than HTML 4.0
(see test/sgml.html in the Lynx sources, both rendered and as source).
It sometimes has up to four synonyms for one character, while HTML 4.0 has a 1:1 mapping.
The few old references added for compatibility with old Lynx (2.8 and before)
come from the HTMLDTD.c entities[] table, and none of them resembles b.greekSomething
(those are neither in HTML 4.0 nor rendered by Lynx)...

We should probably decide whether we want Lynx to act strictly as HTML 4.0
and reject everything else, or to keep as much as possible. Any votes?
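To make the two options concrete, here is a minimal C sketch of what a strict mode could mean. It assumes each row carried a hypothetical html40 flag; the entities.h rows themselves hold only a name and a code point. In strict mode the obsolete synonym brkbar is refused, while compatible mode still maps it to U+00A6 like brvbar. The names entity_entry, sample_entities and lookup_entity are made up for illustration, and a linear scan keeps it short.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical row layout: the html40 flag is an assumption, not part
 * of the real entities.h table. */
typedef struct {
    const char *name;
    unsigned long code;   /* Unicode code point */
    int html40;           /* 1 if HTML 4.0 defines this name */
} entity_entry;

static const entity_entry sample_entities[] = {
    {"brkbar", 0x00A6, 0},  /* obsolete synonym kept for old Lynx */
    {"brvbar", 0x00A6, 1},  /* BROKEN BAR */
    {"emdash", 0x2014, 0},  /* obsolete synonym kept for old Lynx */
    {"mdash",  0x2014, 1},  /* EM DASH */
};

/* Returns the code point for name, or 0 if the name is unknown or,
 * when strict is nonzero, not an HTML 4.0 name. */
unsigned long lookup_entity(const char *name, int strict)
{
    size_t i;

    for (i = 0; i < sizeof sample_entities / sizeof sample_entities[0]; i++) {
        if (strcmp(sample_entities[i].name, name) == 0) {
            if (strict && !sample_entities[i].html40)
                return 0;   /* strict mode rejects non-HTML 4.0 names */
            return sample_entities[i].code;
        }
    }
    return 0;
}
```

Keeping both behaviours behind one flag would let the table stay as wide as it is today while still allowing an HTML 4.0-only mode, if that is what the vote decides.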

Quote from entities.h:

# Author: John Cowan <address@hidden>
# Date: 25 July 1997
#
# The following table maps SGML character entities from various
# public sets (namely, ISOamsa, ISOamsb, ISOamsc, ISOamsn, ISOamso,
# ISOamsr, ISObox, ISOcyr1, ISOcyr2, ISOdia, ISOgrk1, ISOgrk2,
# ISOgrk3, ISOgrk4, ISOlat1, ISOlat2, ISOnum, ISOpub, ISOtech,
# HTMLspecial, HTMLsymbol) to corresponding Unicode characters.

#       Column 2: SGML public entity set
#       Column 3: Unicode 2.0 character code
#       Column 4: Unicode 2.0 character name (UPPER CASE)
# Entries which don't have Unicode equivalents have "0x????"
# in Column 3 and a lower case description (from the public entity
# set DTD) in Column 4.  The mapping is not reversible, because many
# distinctions are unified away in Unicode, particularly between
# mathematical symbols.
#
# The table is sorted case-blind by SGML character entity name.
#
# The contents of this table are drawn from various sources, and
# are in the public domain.
#
########################

   We just sort it and move column 2 away (line too long, sorry;
   look at sgml.html in test/ directory for details).
   Also we add a few (obsolete) synonyms:
   "brkbar"  for "brvbar" 0x00A6
   "emdash"  for "mdash" 0x2014
   "endash"  for "ndash" 0x2013
   "hibar"  for "macr" 0x00AF
   for exact compatibility with entities[] and previous behavior.
   BTW, lots of synonyms found in this table, we shouldn't worry about...
*/
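Because the table is kept sorted, a lookup can use binary search. Below is a minimal C sketch with bsearch() over a few of the synonym rows, assuming plain strcmp() byte order (which bsearch() requires, and which the AElig-before-Aacgr order in the patch is consistent with) rather than a case-blind order. Whether Lynx's own lookup works exactly this way is not confirmed here; entity_info, entities and entity_code are illustrative names. It also shows the many-to-one mapping: brkbar and brvbar both yield U+00A6.

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative row layout modelled on the entities.h rows. */
typedef struct {
    const char *name;
    unsigned long code;   /* Unicode code point */
} entity_info;

/* Must stay sorted in strcmp() byte order for bsearch() to work. */
static const entity_info entities[] = {
    {"brkbar", 0x00A6},  /* obsolete synonym for brvbar */
    {"brvbar", 0x00A6},  /* BROKEN BAR */
    {"emdash", 0x2014},  /* obsolete synonym for mdash */
    {"endash", 0x2013},  /* obsolete synonym for ndash */
    {"hibar",  0x00AF},  /* obsolete synonym for macr */
    {"macr",   0x00AF},  /* MACRON */
    {"mdash",  0x2014},  /* EM DASH */
    {"ndash",  0x2013},  /* EN DASH */
};

/* bsearch() passes the key first and the array element second. */
static int compare_entity(const void *key, const void *elem)
{
    return strcmp((const char *)key, ((const entity_info *)elem)->name);
}

/* Returns the code point for name, or 0 if the name is not in the table. */
unsigned long entity_code(const char *name)
{
    const entity_info *hit = bsearch(name, entities,
                                     sizeof entities / sizeof entities[0],
                                     sizeof entities[0], compare_entity);
    return hit ? hit->code : 0;
}
```

Adding a synonym is then just one more row in the right sorted position; nothing in the lookup itself has to change.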

> I don't know if the table should contain both versions of the &loz;
> reference.  But for HTML 4.0 compliance, the U+2727 one is commented out,
> unless there are reasonable objections to it.
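Why the duplicate matters: if the table is searched with binary search, a duplicated key is ambiguous, since bsearch() may match either equal element (the C standard leaves the choice unspecified), so &loz; could resolve to U+25CA or U+2727. In a sorted table duplicates are always neighbours, so a one-pass check like the hypothetical count_duplicate_names() below would catch such entries; the entity_info layout is an assumption modelled on the rows in the patch.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative row layout modelled on the entities.h rows. */
typedef struct {
    const char *name;
    unsigned long code;   /* Unicode code point */
} entity_info;

/* Counts adjacent rows that share a name.  The table must already be
 * sorted by name, so that any duplicates sit next to each other. */
size_t count_duplicate_names(const entity_info *table, size_t n)
{
    size_t i, dups = 0;

    for (i = 1; i < n; i++)
        if (strcmp(table[i - 1].name, table[i].name) == 0)
            dups++;
    return dups;
}
```

Run over the old rows around lowbar/loz/lozf it reports one duplicate; over the patched rows it reports none.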

> *** WWW/Library/Implementation/entities.h       Mon Feb 22 00:26:27 1999
> --- WWW/Library/Implementation/entities.h.new   Thu Mar  4 16:37:42 1999
> ***************
> *** 69,74 ****
> --- 69,95 ----
>      BTW, lots of synonyms found in this table, we shouldn't worry about...
>   */

> + /*
> + Modified by Jacob Poon <address@hidden>
> +
> + This table is modified to improve support of HTML 4.0 character entity references,
> + including Euro symbol support.
> +
> + Known issues:
> +
> + The original table includes two different definitions of &loz; reference.
> + Since HTML 4.0 only uses U+25CA, the U+2727 definition is commented out,
> + until there is a good reason to put it back in.
> +
> + At the end of the table, there are several unnumbered, commented references.
> + These are not defined in HTML 4.0, and will remain so until they are defined
> + in future SGML/HTML standards.
> +
> + The support for obsolete references is for backwards compatibility only.  New
> + SGML/HTML documents should not depend on these references just because Lynx can
> + display them.
> + */
> +
>   static CONST UC_entity_info unicode_entities[] = {
>     {"AElig",   0x00C6},  /* LATIN CAPITAL LETTER AE                       */
>     {"Aacgr",   0x0386},  /* GREEK CAPITAL LETTER ALPHA WITH TONOS         */
> ***************
> *** 326,332 ****
>     {"b.alpha", 0x03B1},  /* GREEK SMALL LETTER ALPHA                      */
>     {"b.beta",  0x03B2},  /* GREEK SMALL LETTER BETA                       */
>     {"b.chi",   0x03C7},  /* GREEK SMALL LETTER CHI                        */
> !   {"b.delta", 0x03B3},  /* GREEK SMALL LETTER GAMMA                      */
>     {"b.epsi",  0x03B5},  /* GREEK SMALL LETTER EPSILON                    */
>     {"b.epsis", 0x03B5},  /* GREEK SMALL LETTER EPSILON                    */
>     {"b.epsiv", 0x03B5},  /* GREEK SMALL LETTER EPSILON                    */
> --- 347,353 ----
>     {"b.alpha", 0x03B1},  /* GREEK SMALL LETTER ALPHA                      */
>     {"b.beta",  0x03B2},  /* GREEK SMALL LETTER BETA                       */
>     {"b.chi",   0x03C7},  /* GREEK SMALL LETTER CHI                        */
> !   {"b.delta", 0x03B4},  /* GREEK SMALL LETTER DELTA                      */
>     {"b.epsi",  0x03B5},  /* GREEK SMALL LETTER EPSILON                    */
>     {"b.epsis", 0x03B5},  /* GREEK SMALL LETTER EPSILON                    */
>     {"b.epsiv", 0x03B5},  /* GREEK SMALL LETTER EPSILON                    */
> ***************
> *** 532,537 ****
> --- 553,559 ----
>     {"eta",     0x03B7},  /* GREEK SMALL LETTER ETA                        */
>     {"eth",     0x00F0},  /* LATIN SMALL LETTER ETH                        */
>     {"euml",    0x00EB},  /* LATIN SMALL LETTER E WITH DIAERESIS           */
> +   {"euro",    0x20AC},  /* EURO SIGN                                     */
>     {"excl",    0x0021},  /* EXCLAMATION MARK                              */
>     {"exist",   0x2203},  /* THERE EXISTS                                  */
>     {"fcy",     0x0444},  /* CYRILLIC SMALL LETTER EF                      */
> ***************
> *** 679,685 ****
>     {"lowast",  0x2217},  /* ASTERISK OPERATOR                             */
>     {"lowbar",  0x005F},  /* LOW LINE                                      */
>     {"loz",     0x25CA},  /* LOZENGE                                       */
> !   {"loz",     0x2727},  /* WHITE FOUR POINTED STAR                       */
>     {"lozf",    0x2726},  /* BLACK FOUR POINTED STAR                       */
>     {"lpar",    0x0028},  /* LEFT PARENTHESIS                              */
>     {"lrarr2",  0x21C6},  /* LEFTWARDS ARROW OVER RIGHTWARDS ARROW         */
> --- 701,708 ----
>     {"lowast",  0x2217},  /* ASTERISK OPERATOR                             */
>     {"lowbar",  0x005F},  /* LOW LINE                                      */
>     {"loz",     0x25CA},  /* LOZENGE                                       */
> ! /*  {"loz",   0x2727},  WHITE FOUR POINTED STAR                          */
> !  /* Warning: Duplicated &loz; entry.  HTML 4.0 defines it as U+25CA. */
>     {"lozf",    0x2726},  /* BLACK FOUR POINTED STAR                       */
>     {"lpar",    0x0028},  /* LEFT PARENTHESIS                              */
>     {"lrarr2",  0x21C6},  /* LEFTWARDS ARROW OVER RIGHTWARDS ARROW         */
>      _________________________________________________________________
