lynx-dev

Re: LYNX-DEV: development vs 2.7.2


From: Leonid Pauzner
Subject: Re: LYNX-DEV: development vs 2.7.2
Date: Sun, 15 Feb 1998 15:00:28 +0300 (MSK)

>      * From: "T.E.Dickey" <address@hidden>
>      * Date: Sat, 14 Feb 1998 19:42:16 -0500 (EST)
>      * Reply-To: address@hidden
>      * Sender: address@hidden
>      _________________________________________________________________
>
> 1998-2-14
> ---------
>
> For ac #114, I reviewed the differences between 2.7.2 and the development
> versions for the files having the largest differences.  These contain some of
> the changes (but not all) that Fote disagrees with.  The larger chunks of
> differences are Klaus's (mine are smaller, and usually isolated, and limited
> to build/portability issues).  I'm going down the list in decreasing order
> by number of differences.
>

> Here's what I see:


> HTMLDTD.c
> ---------
>         The extra_entities[] table has different codes, mainly because LP just
>         changed it to reflect a newer source.

Yes, I did that.
This part is straightforward and easy to maintain.
I propose moving the table to a separate file (say, ENTITIES.c)
to simplify things.


>
> LYCharUtils.c
> -------------
>
>         development code has a big chunk that's testing explicit character
>         set names, around line 2800 in LYHandleMETA(), much more than in
>         2.7.2 (couldn't this be table-driven?).

There is a comment "fall through to old behavior" there.
This chunk should be carefully corrected, _apparently_ to agree
with the charset list in LYCharSet.c, which is currently the same as in 2.7.2
but with "Other ISO Latin" removed.
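
To illustrate the "table-driven" suggestion quoted above, here is a minimal
sketch, not the actual LYHandleMETA() code. The struct, the table contents,
the numeric ids and the function names are all hypothetical; the real list
would come from the charset table Lynx already maintains.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical entry: a MIME charset name and an illustrative id. */
struct charset_entry {
    const char *mime_name;
    int id;
};

/* Illustrative contents only, not the real Lynx charset list. */
static const struct charset_entry charset_table[] = {
    { "iso-8859-1",   1 },
    { "iso-8859-2",   2 },
    { "koi8-r",       3 },
    { "windows-1252", 4 },
};

/* Case-insensitive comparison; returns 1 if the names match. */
static int same_name(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Returns the id for a charset name, or -1 if the name is not in the table. */
int lookup_charset(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(charset_table) / sizeof(charset_table[0]); i++) {
        if (same_name(name, charset_table[i].mime_name))
            return charset_table[i].id;
    }
    return -1;
}
```

With something like this, adding a charset means adding one table row
rather than another explicit string test.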


>
> SGML.c
> ------
>
>         Macro IncludesLatin1Enc is used in more places (applying the
>         development version's logic in cases that 2.7.2 does not).


Obsolete. It is used in two ways:
(1) After "no special unicodes" and after UCTransUniChar, to check again
whether chrtrans/iso01_uni.tbl has a wrong cross-mapping for special characters
(search for the comment "Would only happen").
It really checks only superset_of_lat1, but by the same logic it should also
apply to all iso-8859-* and windows-125*.
It should be removed (here and also in LYCharUtils.c)!
It is just a waste of space.
Anyway, people are responsible themselves if they hand-edit *_uni.tbl too much,
but since the tables are derived from ftp.unicode.org we should not worry about it.

(2) The "standard" old-style behavior (found in SGML.c, HTPlain.c, etc.) where
under some circumstances (CJK related?) iso-latin-1 characters from 160-255
are converted back to named entities, namely:

>         2.7.2 follows calls on HTMLGetEntityName() with a binary search into
>         context->dtd->entity_names, and outputs the corresponding entry from

I don't know the (historical) reasons for it, or how to remove it at the moment.
Leave it as is until we have a better idea.
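
For readers unfamiliar with the lookup being discussed, the binary search
into a sorted entity-name table can be sketched as follows. This is an
illustration only: the array contents and the function name find_entity()
are hypothetical and stand in for context->dtd->entity_names; the real
2.7.2 code differs in detail.

```c
#include <string.h>

/* Hypothetical stand-in for the DTD's entity-name table.
 * It must be sorted in strcmp() order for the search to work. */
static const char *const entity_names[] = {
    "AElig", "Aacute", "amp", "gt", "lt", "nbsp", "quot"
};

#define ENTITY_COUNT ((int) (sizeof(entity_names) / sizeof(entity_names[0])))

/* Returns the index of name in entity_names, or -1 if it is absent. */
int find_entity(const char *name)
{
    int low = 0;
    int high = ENTITY_COUNT - 1;

    while (low <= high) {
        int mid = low + (high - low) / 2;
        int diff = strcmp(name, entity_names[mid]);

        if (diff == 0)
            return mid;
        if (diff < 0)
            high = mid - 1;
        else
            low = mid + 1;
    }
    return -1;
}
```

The index found this way is what would then select the corresponding
output string for the current display character set.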


>         A number of places in 2.7.2 handle special cases for hyphens and spaces
>         (e.g., in put_special_unicodes, handle_entity,

Read my comments in the function "put_special_unicodes";
we may try to reduce that "number of places" by calling this function instead.
I corrected that (partly) in the dev code recently, and I am sure of it.



>         2.7.2 has a chunk in handle_entity() beginning with comment "If the
>         value is greater than 255".  There's a similar chunk in SGML_character(

I don't know, but it seems to be unused and causes no problem.

>
>         2.7.2 follows calls on HTMLGetEntityName() with a binary search into
>         context->dtd->entity_names, and outputs the corresponding entry from
>         LYCharSets, while the development version uses handle_entity() for
>         outputting the entity-name.  (This is in more than one place).

"standard" obsolete behavior.

>
>         There's a big chunk in 2.7.2's SGML_character() which is not used
>         in the development version, dealing with Frontpage.

This is the case where windows-1252 characters arrive silently labeled
as iso-latin-1 (windows-1252 includes all iso-latin-1 characters _exactly_
but also defines characters at 130-159). It is a workaround
for when FrontPage "forgets" to include the correct charset header.
It is turned on in 2.7.2 and off in the dev code.
I don't know which is better,
but there is nothing dangerous here.
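
A minimal sketch of what such a workaround amounts to, assuming the document
is labeled iso-latin-1 but may really be windows-1252. The function name is
hypothetical and this is not the 2.7.2 code. The table follows the
present-day windows-1252 assignment, which also defines byte 128 in addition
to the 130-159 range mentioned above.

```c
/* Unicode values for windows-1252 bytes 128..159; 0 marks an unassigned byte. */
static const unsigned short cp1252_high[32] = {
    0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
    0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178
};

/* Hypothetical helper: maps one byte of a nominally iso-latin-1 document
 * to a Unicode value.  Bytes outside 128..159 are identical in both
 * charsets and pass through unchanged.  Bytes inside that range are read
 * as windows-1252; -1 is returned for the unassigned ones. */
long latin1_or_cp1252(unsigned char c)
{
    if (c < 128 || c >= 160)
        return (long) c;
    if (cp1252_high[c - 128] != 0)
        return (long) cp1252_high[c - 128];
    return -1;
}
```

For example, byte 147 (0x93), a FrontPage-style opening "smart quote",
comes out as U+201C, while byte 233 (e-acute) is left as it is.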

>
>         A small chunk in SGML_character() to support codes 8194, 8195, 8201,
>         not in 2.7.2

Remove it (and adjust the comment under (1)).
It should be processed via the Unicode machinery, not under the CJK-related code.
Recently I removed the "special cases" for "trade", "mdash/ndash", etc.;
they are passed through Unicode anyway and nobody has complained.
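
For reference, codes 8194, 8195 and 8201 are U+2002 (en space), U+2003
(em space) and U+2009 (thin space). A rough sketch of handling them through
one generic fallback table rather than an inline special case could look
like this; the struct, the table and the function name are hypothetical and
do not reflect the actual chrtrans implementation.

```c
#include <stddef.h>

/* Hypothetical fallback entry: a Unicode code point and a 7-bit replacement. */
struct uni_fallback {
    long code;
    char ascii;
};

static const struct uni_fallback space_fallbacks[] = {
    { 8194, ' ' },   /* U+2002 EN SPACE   */
    { 8195, ' ' },   /* U+2003 EM SPACE   */
    { 8201, ' ' },   /* U+2009 THIN SPACE */
};

/* Returns the 7-bit replacement for code, or 0 if the table has no entry. */
char fallback_for(long code)
{
    size_t i;

    for (i = 0; i < sizeof(space_fallbacks) / sizeof(space_fallbacks[0]); i++) {
        if (space_fallbacks[i].code == code)
            return space_fallbacks[i].ascii;
    }
    return 0;
}
```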


Leonid.
