Re: LYNX-DEV: development vs 2.7.2
From: Leonid Pauzner
Subject: Re: LYNX-DEV: development vs 2.7.2
Date: Sun, 15 Feb 1998 15:00:28 +0300 (MSK)
> * From: "T.E.Dickey" <address@hidden>
> * Date: Sat, 14 Feb 1998 19:42:16 -0500 (EST)
> * Reply-To: address@hidden
> * Sender: address@hidden
> _________________________________________________________________
>
> 1998-2-14
> ---------
>
> For ac #114, I reviewed the differences between 2.7.2 and the development
> versions for the files having the largest differences. These contain some of
> the changes (but not all) that Fote disagrees with. The larger chunks of
> differences are Klaus's (mine are smaller, and usually isolated, and limited
> to build/portability issues). I'm going down the list in decreasing order
> by number of differences.
>
> Here's what I see:
> HTMLDTD.c
> ---------
> The extra_entities[] table has different codes, mainly because LP just
> changed it to reflect a newer source.
Yes, I did that.
This part is straightforward and easy to maintain.
To simplify things, I propose moving the table to a separate file
(say, ENTITIES.c).
>
> LYCharUtils.c
> -------------
>
> development code has a big chunk that's testing explicit character
> set names, around line 2800 in LYHandleMETA(), much more than in
> 2.7.2 (couldn't this be table-driven?).
There is a comment "fall through to old behavior" there.
This chunk should be corrected carefully; it _seems_ to follow
the charset list in LYCharSet.c, which is currently the same as in 2.7.2
except that "Other ISO Latin" has been removed.
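As a rough illustration of the table-driven idea raised in the quoted question, here is a minimal sketch. It is not the code in LYHandleMETA(): the structure, the function names (charset_table, names_equal, lookup_charset) and the display names in the table are all hypothetical and only show the shape such a lookup could take.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical table; the entries are illustrative, not Lynx's real list. */
struct charset_alias {
    const char *meta_name;    /* name as found in a META charset= value */
    const char *display_name; /* assumed internal name, for illustration only */
};

static const struct charset_alias charset_table[] = {
    { "iso-8859-1",   "ISO Latin 1" },
    { "iso-8859-2",   "ISO Latin 2" },
    { "koi8-r",       "KOI8-R" },
    { "windows-1252", "WinLatin1" },
};

/* Case-insensitive string equality using only standard C. */
static int names_equal(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b; /* equal only if both strings ended together */
}

/* Returns the display name for a META charset name, or NULL if unknown. */
const char *lookup_charset(const char *meta_name)
{
    size_t i;

    for (i = 0; i < sizeof(charset_table) / sizeof(charset_table[0]); i++) {
        if (names_equal(charset_table[i].meta_name, meta_name))
            return charset_table[i].display_name;
    }
    return NULL;
}
```

With a table like this, adding or removing a charset becomes a one-line change to the table, and a NULL result is the natural place for the "fall through to old behavior" case.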
>
> SGML.c
> ------
>
> Macro IncludesLatin1Enc is used in more places (applying the
> development version's logic in cases that 2.7.2 does not).
Obsolete. It is used in two ways:
(1) After "no special unicodes" and after UCTransUniChar, to check again
whether chrtrans/iso01_uni.tbl has a wrong cross-mapping for special
characters (search for the comment "Would only happen").
It really checks only superset_of_lat1, but following this logic it should
also apply to all iso-8859-* and windows-125* charsets.
It should be removed (here and also in lycharutils.c)!
It is just a waste of space.
In any case, people are responsible themselves if they edit *_uni.tbl by
hand too much, and since the tables are derived from ftp.unicode.org we
should not worry about it.
(2) The "standard" old-style behavior (found in sgml.c, htplain.c, etc.) where
under some circumstances (CJK related?) iso-latin-1 characters from 160-255
are converted back to named entities, namely:
> 2.7.2 follows calls on HTMLGetEntityName() with a binary search into
> context->dtd->entity_names, and outputs the corresponding entry from
I don't know the (historical) reasons for it, or how to remove it at the moment.
Leave it as is until we have an idea.
> A number of places in 2.7.2 handle special cases for hyphens and
> spaces
> (e.g., in put_special_unicodes, handle_entity,
Read my comments in the function "put_special_unicodes";
we may try to limit the "number of places" by invoking this function.
I corrected that (partly) in the dev code recently, and I am confident in it.
> 2.7.2 has a chunk in handle_entity() beginning with comment "If the
> value is greater than 255". There's a similar chunk in
> SGML_character(
I don't know, but it seems to be unused and not a problem.
>
> 2.7.2 follows calls on HTMLGetEntityName() with a binary search into
> context->dtd->entity_names, and outputs the corresponding entry from
> LYCharSets, while the development version uses handle_entity() for
> outputting the entity-name. (This is in more than one place).
This is the "standard" obsolete behavior, the same as in (2) above.
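For readers unfamiliar with the lookup being discussed, a binary search into a sorted table of entity names can be sketched as follows. This is only an illustration under stated assumptions: the six-entry table and the names compare_names and find_entity are hypothetical, and it does not reproduce the 2.7.2 code or the real contents of context->dtd->entity_names.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the DTD's entity-name table.
 * It must stay sorted in strcmp() order for bsearch() to work. */
static const char *const entity_names[] = {
    "amp", "copy", "gt", "lt", "nbsp", "quot"
};

/* bsearch() passes the key first and a pointer to a table element second. */
static int compare_names(const void *key, const void *elem)
{
    return strcmp((const char *)key, *(const char *const *)elem);
}

/* Returns the index of name in entity_names, or -1 if it is not present. */
int find_entity(const char *name)
{
    const char *const *hit = bsearch(name, entity_names,
                                     sizeof(entity_names) / sizeof(entity_names[0]),
                                     sizeof(entity_names[0]),
                                     compare_names);

    return hit ? (int)(hit - entity_names) : -1;
}
```

The returned index is what would then be used to pick the matching entry from a parallel table of replacement strings.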
>
> There's a big chunk in 2.7.2's SGML_character() which is not used
> in the development version, dealing with Frontpage.
This is the case where windows-1252 characters arrive silently labeled
as iso-latin-1 (windows-1252 includes all iso-latin-1 chars _exactly_
but also includes something at 130-159). It is a workaround for when
FrontPage "forgot" to include the correct charset header.
It is turned on in 2.7.2 and off in the dev code.
I don't know which is better,
but there is nothing dangerous here.
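To make the 130-159 range concrete, here is a small sketch of such a workaround. It is not the chunk from SGML_character(); the names cp1252_high and latin1_or_cp1252_to_unicode are hypothetical, and the table holds the usual windows-1252 assignments for those bytes, with 0 marking positions that have no assignment.

```c
/* Unicode values for windows-1252 bytes 130..159 (0 = no assignment). */
static const unsigned short cp1252_high[30] = {
    0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021, 0x02C6, 0x2030, /* 130-137 */
    0x0160, 0x2039, 0x0152, 0x0000, 0x017D, 0x0000, 0x0000, 0x2018, /* 138-145 */
    0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014, 0x02DC, 0x2122, /* 146-153 */
    0x0161, 0x203A, 0x0153, 0x0000, 0x017E, 0x0178                  /* 154-159 */
};

/* Treat a byte from a document labeled iso-latin-1 as windows-1252 when it
 * falls in 130..159; every other byte maps to the same Unicode value. */
unsigned long latin1_or_cp1252_to_unicode(unsigned char c)
{
    if (c >= 130 && c <= 159 && cp1252_high[c - 130] != 0)
        return cp1252_high[c - 130];
    return c; /* iso-latin-1 coincides with the first 256 Unicode values */
}
```

Because iso-latin-1 reserves 130-159 for control codes that rarely occur in HTML text, reading those bytes as windows-1252 is unlikely to damage a correctly labeled document, which fits the "nothing dangerous" remark.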
>
> A small chunk in SGML_character() to support codes 8194, 8195, 8201,
> not in 2.7.2
Remove it (and adjust the comment under (1)).
It should be processed via the unicode stuff, not under the CJK-related code.
Recently I removed the "special cases" for "trade", "mdash/ndash", etc.;
they are passed through unicode anyway and nobody has complained.
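For reference, the three codes are U+2002 (EN SPACE), U+2003 (EM SPACE) and U+2009 (THIN SPACE). A generic fallback for a display that cannot show them might look like the sketch below. The function name unicode_space_fallback is hypothetical, and this is an assumption about what such handling could do, not what the Lynx unicode code actually does.

```c
/* Returns an ASCII replacement for a Unicode space-like code,
 * or 0 if this sketch has no replacement for it. */
int unicode_space_fallback(unsigned long code)
{
    switch (code) {
    case 8194: /* U+2002 EN SPACE */
    case 8195: /* U+2003 EM SPACE */
    case 8201: /* U+2009 THIN SPACE */
        return ' ';
    default:
        return 0;
    }
}
```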
Leonid.