Re: lynx-dev Lynx character entity references fix


From: Klaus Weide
Subject: Re: lynx-dev Lynx character entity references fix
Date: Fri, 12 Mar 1999 10:27:48 -0600 (CST)

On Fri, 12 Mar 1999, Leonid Pauzner wrote:
> 12-Mar-99 00:54 Klaus Weide wrote:
> > On Thu, 11 Mar 1999, Leonid Pauzner wrote:
> 
> >> OK, changing "assume charset" for an unlabelled document gives the following
> >> (grep UC_MapGN from trace log):
> >>
> >> UC_MapGN: Using 1 <- 26 (windows-1251)
> >> UC_MapGN: Using 1 <- 1 (iso-8859-15)
> >> UC_MapGN: Using 2 <- 2 (cp850)
> >> UC_MapGN: Using 1 <- 3 (windows-1252)
> >> UC_MapGN: Using 2 <- 4 (cp437)
> >> UC_MapGN: Using 1 <- 5 (dec-mcs)
> >> UC_MapGN: Using 2 <- 6 (macintosh)
> >> UC_MapGN: Using 1 <- 7 (next)
> >> UC_MapGN: Using 2 <- 8 (hp-roman8)
> 
> There are two more charsets not shown above: iso-8859-1 and us-ascii
> (before iso-8859-15) - apparently constant slot #0.

I'm not sure why us-ascii doesn't show up in the TRACE - possibly
because 8-bit characters get rejected already in SGML.c, so it never gets
this far.  (speculation...)

> I'm thinking of undoing some UCCanTranslate* changes to bring back support
> for UChndl >= 0; this handler is a couple of bytes and can be removed at the
> last stage.

Agreed.  If at some point all "old style" tables are gone, then UChndl == -1
cannot occur any more.  (It may still make sense to have some [other?] flag to
say "we can translate *to* this, but not *from* this".)
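
A minimal sketch of what such a flag could look like, assuming a per-charset record with two independent capability bits.  The struct, the function and the "x-display-only" entry are hypothetical and do not come from the Lynx sources; only the name windows-1251 is taken from the trace above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-charset record; not the actual Lynx data structure. */
struct charset_caps {
    const char *mime_name;
    bool can_translate_from;  /* a charset -> Unicode table exists */
    bool can_translate_to;    /* a Unicode -> charset table exists */
};

/* A document in `from` can be shown in `to` only if both halves of the
 * path through Unicode are available. */
static bool can_translate(const struct charset_caps *from,
                          const struct charset_caps *to)
{
    assert(from != NULL && to != NULL);
    return from->can_translate_from && to->can_translate_to;
}

/* Example entries (assumed values, for illustration only). */
static const struct charset_caps cs_full = { "windows-1251", true, true };
static const struct charset_caps cs_display_only = { "x-display-only", false, true };
```

With separate bits, a "to only" charset no longer needs a special handle value such as -1 to express the restriction.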

> > In general your changes seem to aim at simplifying things (with the
> > final goal of getting rid of the "old" stuff completely?) and at making
> > things clearer.  I think using UChndl = -1 to mean something other than
> > what it used to mean doesn't make things clearer though.
> 
> > I leave it to you to find the best way (and reserve right to complain...)
> 
> The real simplification may be #ifdef'ing some heavy code
> that deals with "old" style usage (in SGML.c, HTPlain.c, LYCharUtils.c (Uh!),
> and at the last stage - from HTMLDTD.c, LYCharSets.c, ...)
> It is a "bloating binary" item and also a problem of maintaining
> such an inhomogeneous piece of code in general.

Yes, there is quite some duplication there.

I think LYCharUtils.c is not so bad, although you find it "somehow ugly". :)
There may be less baggage there than in SGML.c, HTPlain.c.

[...] 
> No problem - it may be left #ifdef'ed in the code
> (but since it will not be used, it will not be actively tested/maintained,
> so it has a greater chance of getting broken in future by incidental lynx
> changes, yes).

Same as with other "dead code" removal by #ifdef'ing.

> p.s. The real problem I see is the limited space for lynx special
> characters like HT_NON_BREAK_SPACE, HT_EM_SPACE, etc. (see GridText.h),
> which are mapped to the < 32 area: we cannot add more, say HT_EN_SPACE
> (and we probably have the Vietnamese implementation already broken,
> though nobody seems interested). Indirect usage of "old" entities translation
> may effectively solve the problem, but I am not sure.

The best (most general) solution to that would be to feed Unicode values
(instead of chars) to GridText, then there is nearly unlimited space
for private regions.  I.e. translate Unicode -> display character set
as late as possible, but that would mean that also HTML.c has to deal
with text as Unicode values instead of chars.  Eventually that would be
cleaner, but not trivial to change (especially not breaking CJK and
"Transparent").
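
To make the idea concrete, here is a rough sketch, assuming the text cells carry Unicode values and the special characters live in the Unicode Private Use Area (U+E000..U+F8FF) instead of below 32.  All SK_* names, the ucode_t type and both functions are hypothetical; they are not the GridText.h definitions, and the Latin-1 target is just an example display character set.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned long ucode_t;  /* hypothetical: one Unicode value per cell */

/* Hypothetical private codes for Lynx-style specials, placed in the
 * Unicode Private Use Area rather than in the < 32 area. */
enum {
    SK_NON_BREAK_SPACE = 0xE000,
    SK_EM_SPACE,
    SK_EN_SPACE,              /* adding one more costs nothing here */
    SK_PRIVATE_LAST = 0xF8FF
};

static bool sk_is_private(ucode_t u)
{
    return u >= SK_NON_BREAK_SPACE && u <= SK_PRIVATE_LAST;
}

/* Late translation to a Latin-1 display: known specials become a blank,
 * anything else that Latin-1 cannot represent becomes '?'. */
static unsigned char sk_to_latin1(ucode_t u)
{
    assert(u <= 0x10FFFF);
    switch (u) {
    case SK_NON_BREAK_SPACE:
    case SK_EM_SPACE:
    case SK_EN_SPACE:
        return ' ';
    default:
        break;
    }
    if (u < 0x100)
        return (unsigned char) u;
    return '?';
}
```

The point of the sketch is only that the private region has thousands of free slots, so the translation step is the single place that has to know about them.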

One advantage of doing  Unicode -> display  late is that a larger glyph
repertoire could be used if the terminal supports some way for it:
at least the DEC graphics characters that exist in addition to the normally
printable output chars (by switching to the curses alternate character set),
or possibly VGA fonts of 512 characters (some are available for the Linux
console).
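
As an illustration of the DEC graphics case, a small sketch of a lookup from Unicode box-drawing characters to the corresponding DEC Special Graphics characters.  The function name is hypothetical and the table covers only a few characters; the caller would still have to switch the terminal into its alternate character set before sending the returned byte, which is not shown here.

```c
#include <assert.h>

/* Returns the DEC Special Graphics character that draws the given Unicode
 * box-drawing character, or 0 if this sketch has no mapping for it. */
static char dec_graphics_for(unsigned long u)
{
    assert(u <= 0x10FFFF);
    switch (u) {
    case 0x2500: return 'q';  /* horizontal line */
    case 0x2502: return 'x';  /* vertical line */
    case 0x250C: return 'l';  /* upper-left corner */
    case 0x2510: return 'k';  /* upper-right corner */
    case 0x2514: return 'm';  /* lower-left corner */
    case 0x2518: return 'j';  /* lower-right corner */
    case 0x253C: return 'n';  /* crossing lines */
    default:     return 0;    /* fall back to the normal translation */
    }
}
```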


   Klaus
