lynx-dev

Re: lynx-dev Lynx character entity references fix


From: Leonid Pauzner
Subject: Re: lynx-dev Lynx character entity references fix
Date: Sun, 7 Mar 1999 20:01:43 +0300 (MSK)

7-Mar-99 08:22 Klaus Weide wrote:
> On Fri, 5 Mar 1999, Leonid Pauzner wrote:

>> >      * From: Jacob Poon <address@hidden>
>>
>> > This patch does the following:
>>
>> >         HTML 4.0 compliance:
>> >         - Added support for Euro currency symbol.
>> >         - Fixed duplicated &loz; definitions.
>>
>> >         Fixes:
>> >         - Fixed some typos in the old references. (fixed: b.delta)
>> Thanks, I'm now working on the old-style entities code and will integrate your fix.
>>
>> But the starting point is probably wrong:
>> the table is much wider than HTML 4.0,
>> see Lynx /test/sgml.html (both rendered and as source) -
>> it sometimes has up to four synonyms, while HTML 4.0 has a 1:1 mapping.
>> A few old references, added for compatibility with old lynx (2.8 and before),
>> are from the HTMLDTD.c entities[] table; there is nothing similar to b.greekSomething
>> (it is neither in HTML 4.0 nor rendered by lynx)...

> It seems none of the b.something entities can work, because the dot
> terminates entity parsing.  Are these even *meant* to be used in HTML
> (of any version)?  Does Lynx use the wrong syntax for recognizing
> character entities, *are* dots allowed in their names?
Dots are currently not allowed within lynx entity names,
but there seems to be no such restriction in SGML, and these names are registered with ISO.
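
To illustrate the point about dots, here is a minimal sketch of an entity-name scanner. It is not the actual Lynx parser: the function name entity_name_len and the allow_dots flag are hypothetical, chosen only for this example. With dots disallowed, "&b.delta;" is cut off after "b", which is why the b.something names can never match.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper, not taken from the Lynx sources: return the length
 * of the entity name that starts at s (the character right after '&').
 * When allow_dots is nonzero, '.' is accepted inside the name.
 */
static size_t entity_name_len(const char *s, int allow_dots)
{
    size_t n = 0;

    /* A name has to start with a letter. */
    if (!isalpha((unsigned char) s[0]))
        return 0;

    /* Letters and digits always continue the name; a dot only if allowed. */
    while (isalnum((unsigned char) s[n]) ||
           (allow_dots && n > 0 && s[n] == '.'))
        n++;
    return n;
}
```

Called on "b.delta;", this returns 1 with allow_dots set to 0 (only "b" is seen) and 7 with allow_dots set to 1 (the whole "b.delta").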

>> We should probably decide whether we want lynx to act strictly as HTML 4.0
>> and reject everything else, or keep as much as possible. Any votes?

Done. The second table in entities.h is the strict HTML 4.0 entity list
(252 entries mapped to Unicode 1:1). It is #ifdef'ed
with ENTITIES_HTML40_ONLY (a better name?) and may be used
_instead_ of the current table (~995 entries without reverse mapping).
The smaller table is useful for page validation, while the larger one may be safer
for future standards - who knows?
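
A rough sketch of how such a compile-time choice between two tables can look. Only the macro name ENTITIES_HTML40_ONLY comes from the text above; the struct layout, the table name, the tiny excerpts and the lookup function are assumptions made for this example and do not reproduce the real entities.h.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    const char *name;   /* entity name without '&' and ';' */
    unsigned long code; /* Unicode code point */
} entity_entry;

/* Entries must stay sorted by name (strcmp order) for bsearch(). */
#ifdef ENTITIES_HTML40_ONLY
/* Strict table: exactly one name per code point (tiny excerpt). */
static const entity_entry entity_table[] = {
    {"amp", 0x0026},
    {"euro", 0x20AC},
    {"loz", 0x25CA},
    {"theta", 0x03B8},
    {"thetasym", 0x03D1},
};
#else
/* Extended table: synonyms are allowed (tiny excerpt). */
static const entity_entry entity_table[] = {
    {"amp", 0x0026},
    {"euro", 0x20AC},
    {"loz", 0x25CA},
    {"theta", 0x03B8},
    {"thetasym", 0x03D1},
    {"thetav", 0x03D1}, /* synonym for thetasym */
};
#endif

static int cmp_entity(const void *key, const void *elem)
{
    return strcmp((const char *) key, ((const entity_entry *) elem)->name);
}

/* Return the code point for name, or 0 if the name is unknown. */
static unsigned long lookup_entity(const char *name)
{
    const entity_entry *e = bsearch(name, entity_table,
                                    sizeof(entity_table) / sizeof(entity_table[0]),
                                    sizeof(entity_table[0]), cmp_entity);
    return e ? e->code : 0;
}
```

Built without -DENTITIES_HTML40_ONLY, lookup_entity("thetav") finds the synonym; built with it, the same call returns 0, which is the behaviour one would want when validating pages.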

> No vote, but some points to consider:

>   - entities with dots don't work as noted above
>   - the more unnecessary names *are* recognized, the higher is the
>     chance of confusion with "&something" within URLs (although the
>     workaround of skipping entity strings if followed by '=' seems
>     to work well - but it's still a workaround)
>   - 6 ways to say "GREEK SMALL LETTER EPSILON" just seems too much;
>     apart from that, is it definitively clear that they really *are*
>     the same character? (Are the variations variant glyph shapes of
>     "the same" character, or does Unicode have more than one code point for
>     them?  As for example is the case for THETA SYMBOL (&thetasym;/&thetav;)
>     vs. CAPITAL/SMALL LETTER THETA.)

Unicode has both "Greek letters" and "Greek symbols" at different
code positions but with similar shapes (possibly identical?);
we have to live with it...  (I guess this is because Greek letters
in math formulae are kerned differently than in running text - see TeX/LaTeX -
but this has no meaning for the character-cell displays lynx deals with.)
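
For concreteness, a small sketch of the letter/symbol split. The code points are the ones Unicode assigns; the function fold_greek_symbol is hypothetical, and nothing here claims that Lynx does such folding. It only shows how a character-cell display could treat a symbol as the ordinary letter it resembles.

```c
/*
 * Hypothetical helper, not from the Lynx sources: map a Greek "symbol"
 * code point to the ordinary small letter it resembles, and return every
 * other code point unchanged.
 */
static unsigned long fold_greek_symbol(unsigned long code)
{
    switch (code) {
    case 0x03D1: return 0x03B8; /* GREEK THETA SYMBOL -> SMALL LETTER THETA */
    case 0x03D5: return 0x03C6; /* GREEK PHI SYMBOL   -> SMALL LETTER PHI   */
    case 0x03D6: return 0x03C0; /* GREEK PI SYMBOL    -> SMALL LETTER PI    */
    default:     return code;
    }
}
```

So &thetasym; (U+03D1) and &theta; (U+03B8) stay distinct in the table, and any merging would be a separate display-time decision.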

>    Klaus


