Re: LYNX-DEV new release?


From: Klaus Weide
Subject: Re: LYNX-DEV new release?
Date: Sat, 16 Aug 1997 21:56:22 -0500 (CDT)

On Sat, 16 Aug 1997, Foteos Macrides wrote:

> Klaus Weide <address@hidden> wrote:
> >[...]
> >- Review and change string translation stuff in LYCharUtils.c, to finally
> >  make all kinds of charsets work in attributes.  Pain ita.
> >  Probably means lots of changes throughout HTML.c, possibly elsewhere.
> 
>       I did that long ago in the fotemods.  Did you find a problem with
> it, or just not notice it?

The change of function LYUnEscapeEntities from less than 500 lines to
more than 1000 lines did not go unnoticed.  I admit I am intimidated
by its sheer size and number of levels.  So I have successfully
managed to avoid looking at it in detail, so far.  Of course I can't
really complain, I think my changes in SGML.c for chartrans don't look
much better to someone else...  The function seems to work fine for what
it does, as far as I have tested it (although not the UTF-8 output) on
<http://www.tezcat.com/~kweide/lynx-chartrans/test/ALT-test.iso8859-2.L2html>.

But the "problem" is that it solves only part of the problem.  It
does translation of entities and numerical character references, but
translation of raw bytes in a charset different from ISO-8859-1 is
still not done correctly.  IOW you have generalized LYUnEscapeEntities,
but LYExpandString also needs to be generalized.  Those two functions
are nearly always used together in HTML.c, like for example

            if (current_char_set)
                LYExpandString(&temp);
            /*
             *  Convert any HTML entities or decimal escaping. - FM
             */
            LYUnEscapeEntities(temp, TRUE, FALSE);

(and the 'if (current_char_set)' criterion isn't really valid any
more).  So my idea is to fold them into one function which would do
all the required translations of a string, entities and NCRs and raw
bytes; and (ideally) for all possible combinations of 'from' and 'to'
charsets.  I started writing a function for that (well, the
LYExpandString-corresponding part), but haven't finished or tested it.
It is not trivial since there can be a lot of different cases for the
kind of 'from' and 'to' encodings.
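
To make the idea concrete, here is a minimal sketch of one such combined
pass.  It is not code from Lynx: the names translate_sketch,
raw_to_unicode and put_utf8 are hypothetical, the 'from' charset is
assumed to be ISO-8859-2 with only four table entries filled in, the
'to' charset is assumed to be UTF-8, and only decimal NCRs are handled
(no named entities).  It covers just one of the many 'from'/'to'
combinations mentioned above.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical table lookup: one raw ISO-8859-2 byte -> Unicode.
 * Only four entries are filled in for illustration; all other
 * bytes are passed through unchanged. */
static unsigned long raw_to_unicode(unsigned char c)
{
    switch (c) {
    case 0xA1: return 0x0104UL;   /* A with ogonek */
    case 0xB1: return 0x0105UL;   /* a with ogonek */
    case 0xA3: return 0x0141UL;   /* L with stroke */
    case 0xB3: return 0x0142UL;   /* l with stroke */
    default:   return c;
    }
}

/* Write code point u as UTF-8 into out; return the byte count. */
static size_t put_utf8(unsigned long u, char *out)
{
    if (u < 0x80UL) {
        out[0] = (char)u;
        return 1;
    }
    if (u < 0x800UL) {
        out[0] = (char)(0xC0 | (u >> 6));
        out[1] = (char)(0x80 | (u & 0x3F));
        return 2;
    }
    if (u < 0x10000UL) {
        out[0] = (char)(0xE0 | (u >> 12));
        out[1] = (char)(0x80 | ((u >> 6) & 0x3F));
        out[2] = (char)(0x80 | (u & 0x3F));
        return 3;
    }
    out[0] = (char)(0xF0 | (u >> 18));
    out[1] = (char)(0x80 | ((u >> 12) & 0x3F));
    out[2] = (char)(0x80 | ((u >> 6) & 0x3F));
    out[3] = (char)(0x80 | (u & 0x3F));
    return 4;
}

/* One pass over the string: decimal NCRs and raw bytes are both
 * turned into Unicode first, then written in the output charset.
 * Returns a malloc'ed string (caller frees), or NULL on failure. */
char *translate_sketch(const char *in)
{
    /* Every input byte yields at most 4 output bytes. */
    char *out = malloc(4 * strlen(in) + 1);
    char *q = out;
    const char *p = in;

    if (out == NULL)
        return NULL;
    while (*p != '\0') {
        if (p[0] == '&' && p[1] == '#' && isdigit((unsigned char)p[2])) {
            char *end;
            unsigned long u = strtoul(p + 2, &end, 10);

            if (*end == ';' && u > 0 && u <= 0x10FFFFUL) {
                q += put_utf8(u, q);
                p = end + 1;
                continue;
            }
            /* Malformed or out-of-range NCR: fall through and
             * copy it byte by byte. */
        }
        q += put_utf8(raw_to_unicode((unsigned char)*p), q);
        p++;
    }
    *q = '\0';
    return out;
}
```

With these assumptions, the raw byte 0xB3 and the reference &#322; both
come out as the same two UTF-8 bytes for l with stroke, which is the
point of doing both translations in one place.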

Can you remind me why we are doing all this from HTML.c, instead of in
SGML.c?  I keep coming up with reasons and then forgetting or
discarding them.  I think hidden (and other?) form fields are part of
it; they should go untranslated.  (Which brings up another area that
still should be dealt with better, labelling and/or translation of
form submissions.)

TEXTAREA is a strange case; I think character entities there should be
parsed already in SGML.c.  One can mess up the display and editing in
interesting ways by including &#13; or &#10; in it.  (Of course it is
unlikely that someone would do that by mistake.)
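
A small sketch of why that matters, assuming a hypothetical helper
decode_ascii_ncrs that expands decimal NCRs below 128 in place (this is
not how Lynx actually does it): once &#10; has been expanded, the
TEXTAREA content contains a line break that can no longer be told apart
from one typed literally in the source.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper: expand decimal NCRs for code points 1..127
 * in place.  The result is never longer than the input, so writing
 * into the same buffer is safe. */
void decode_ascii_ncrs(char *s)
{
    char *r = s;   /* read position */
    char *w = s;   /* write position */

    while (*r != '\0') {
        if (r[0] == '&' && r[1] == '#' && isdigit((unsigned char)r[2])) {
            char *end;
            unsigned long u = strtoul(r + 2, &end, 10);

            if (*end == ';' && u > 0 && u < 128) {
                *w++ = (char)u;
                r = end + 1;
                continue;
            }
        }
        *w++ = *r++;
    }
    *w = '\0';
}

/* Count the lines an editor would see: one more than the number
 * of newline characters. */
int count_lines(const char *s)
{
    int n = 1;

    for (; *s != '\0'; s++)
        if (*s == '\n')
            n++;
    return n;
}
```

For the content "one&#10;two", count_lines gives 1 before decoding and
2 afterwards.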


       Klaus
