lynx-dev Re: msg00798.html (was: 0x2276 handling)


From: Leonid Pauzner
Subject: lynx-dev Re: msg00798.html (was: 0x2276 handling)
Date: Thu, 30 Apr 1998 22:11:04 +0400 (MSD)

> but is a dev5 build according to Wayne.  The problem is in the handling
> of attribute values via the (excessively hairy and unmaintainable :)
> functions in v2.8's LYCharUtils.c and its UCfoo.c mods, that I did
> not use (with lengthy explanations to lynx-dev of why) in the code set
> that I had released as v2.7.2.  The homologous functions in SGML.c and
> HTPlain.c handle other conversions.  They are not coordinated in the
> v2.8 code with each other and the attribute handlers in LYCharUtils.c
> (Although I had coordinated them in the v2.7.2 release, the v2.8 release
> "superseded" v2.7.2 without having dealt with these and other problems
> in the devel code set.).  You see different problems in v2.8 depending
> on the markup, and in turn whether you are using SGML.c, HTPlain.c or
> LYCharUtils.c functions to set up the chartrans conversions. To see

1) As I understand it, HTPlain.c intentionally does not convert any escapes
or named/numeric entities, only 8-bit text (and something for CJK, maybe).

2) Yes, there is a great mess in LYCharUtils.c (namely LYUCFullyTranslate...).
URL hex escaping should be split out from attribute value translation
(as was done in 2.7.2). Even more: instead of coordinating SGML.c,
HTPlain.c and LYCharUtils.c, they should simply call the same function
for chartrans (with the exception of line wrapping background in attributes,
maybe).
BTW, just for this message I prepared a variant of sgml.html with the entities
moved inside alt= attributes: I got _exactly the same_ result except
-0x200D    ‍       HTMLspecial       # ZERO WIDTH JOINER
-0x200E    ‎       HTMLspecial       # LEFT-TO-RIGHT MARK
-0x200F    ‏       HTMLspecial       # RIGHT-TO-LEFT MARK
+0x200D                HTMLspecial       # ZERO WIDTH JOINER
+0x200E                HTMLspecial       # LEFT-TO-RIGHT MARK
+0x200F                HTMLspecial       # RIGHT-TO-LEFT MARK
There is no problem here.
Anyway, URL escaping should be tested/rewritten someday.
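
To illustrate the "same function for chartrans" idea in point 2, here is a
minimal sketch in C. It is not Lynx code: chartrans_one, translate_text,
translate_attr_value and the two-entry fallback table are hypothetical names
and data invented for the example (only the U+2276 -> "<>" mapping echoes
the suggestion quoted below). The point is only that body text and attribute
values go through one lookup, so they cannot disagree, and that unknown
values get a defined substitution instead of an error.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct uni_map { unsigned long cp; const char *repl; };

/* Hypothetical 7-bit fallback table; NOT the real def7_uni.tbl data. */
static const struct uni_map fallback_table[] = {
    { 0x00E9UL, "e"  },
    { 0x2276UL, "<>" },
};

/* The single translation routine: ASCII passes through, known code
 * points get their replacement, anything else gets a placeholder. */
const char *chartrans_one(unsigned long cp, char ascii_buf[2])
{
    size_t i;

    assert(ascii_buf != NULL);
    if (cp < 0x80) {
        ascii_buf[0] = (char) cp;
        ascii_buf[1] = '\0';
        return ascii_buf;
    }
    for (i = 0; i < sizeof fallback_table / sizeof fallback_table[0]; i++) {
        if (fallback_table[i].cp == cp)
            return fallback_table[i].repl;
    }
    return "?";
}

/* Shared worker: translate n code points into dst.
 * Returns the number of chars written, or 0 if dst is too small. */
static size_t translate_run(const unsigned long *cps, size_t n,
                            char *dst, size_t dstlen)
{
    size_t used = 0, i;
    char tmp[2];

    if (dstlen == 0)
        return 0;
    for (i = 0; i < n; i++) {
        const char *s = chartrans_one(cps[i], tmp);
        size_t len = strlen(s);

        if (used + len + 1 > dstlen)
            return 0;
        memcpy(dst + used, s, len);
        used += len;
    }
    dst[used] = '\0';
    return used;
}

/* Hypothetical entry point for body text. */
size_t translate_text(const unsigned long *cps, size_t n,
                      char *dst, size_t dstlen)
{
    return translate_run(cps, n, dst, dstlen);
}

/* Hypothetical entry point for attribute values; URL hex escaping
 * would be a separate step and is deliberately not done here. */
size_t translate_attr_value(const unsigned long *cps, size_t n,
                            char *dst, size_t dstlen)
{
    return translate_run(cps, n, dst, dstlen);
}
```

With this shape, any difference between text and attribute handling has to
be written explicitly in one of the two wrappers rather than arising from
separately maintained copies of the conversion logic.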


Unfortunately, 2 months ago, when I was cleaning up chartrans a little,
you were "not available" on the list. I started by moving the old
entities stuff to unicode_entities (formerly `extra_entities'),
but found a lot of places, such as the reverse translation from an
isolatin1 entry to an entity name, whose purpose I did not understand
(I was not sure about CJK). So the changes were minimal.

Can you explain why we use `name = HTMLGetEntityName(value);'
for some combinations of HTPassEightBitRaw/HTPassEightBitNum and
HTPassHighCtrlRaw/HTPassHighCtrlNum
instead of using `LYlowest_eightbit' and `LYHaveCJKcharacterSet' directly?
The chartrans stuff cannot be rewritten without understanding many such
questions, IMHO.


> the problem we've been discussing, you should have used Alex Matulich's
> test page (the URL was posted by Doug), and what his script returned
> before he modified his stuff to treat ';' instead of just '&' as the
> name=value separator (as in the HTML 4.0 recommendations, which he
> obviously has now read and understood :).
>
>
> >Yes, 0x2276 is not known for def7_uni.tbl currently, we may easily add
> >U+2276:<>
> >or something like this, if necessary.
> >
> >From the other hand, there are still few strange characters like 0x200A
> >which are _known_ by def7_uni.tbl but report error handling
> >instead of promised substitution. This is a bug.
>
>         It was inappropriate to have defined any SGML named character
> references to Unicode values without also setting up default chartrans
> conversions for them (looks like there are more than just "lg").
def7_uni was expanded on a "help yourself" basis.
> Depending on which of the (uncoordinated in v2.8) functions of SGML.c,
> HTPlain.c or LYCharUtils.c is invoked (based on the markup and MIME
> type), this has created a situation in v2.8 for which strings/Unicode
> values are being passed as "known" to functions which in fact don't
> know them as SGML character references, and particularly for that
> mess in the v2.8's LYCharUtils.c, have no rational error recovery
> associated with them.
Correct. I will check whether this really happens or is only possible.
>
>         Also, note this problem brought out for v2.8 by Alex's test
> page:  Had the "lg" in fact been handled according to SGML principles
> as a character in the URL with a value greater than decimal 127, and
> the markup actually intended that (e.g., for an i18n path), on
> submitting it to the http server the v2.8 code is still using Klaus'
> obsolete conversion function, instead of converting it to utf-8 and
> then hex escaping each byte of the resultant multibyte character, as
> is done in such cases by the code I had released as v2.7.2.  So even
> if the chartrans stuff in v2.8 is fixed up, such URLs would still fail
> to retrieve the resource for Lynx users (the server or its script would
> have no way to back translate correctly).  I had posted lengthy messages
> about this before the v2.8 release, but... (What a "pickle" this is!
> I retired just in the nick of time. :)
>
>                                         Fote
> --
> Foteos Macrides (address@hidden during April, '98)
                                       ^^^^^^^^^^^^^^^^^^
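
For reference, the conversion described in the quoted text above (convert a
character above decimal 127 to UTF-8, then hex-escape each byte of the
result) can be sketched roughly as follows. This is only an illustration
written for this message, not the v2.7.2 code; utf8_encode and
escape_codepoint are hypothetical names, and the caller is assumed to invoke
them only for values above 127.

```c
#include <assert.h>
#include <stddef.h>

/* Encode one Unicode code point as UTF-8.
 * Returns the number of bytes written (1-4), or 0 if cp is invalid. */
static size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    if (cp < 0x80UL) {
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800UL) {
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000UL) {
        if (cp >= 0xD800UL && cp <= 0xDFFFUL)
            return 0;           /* surrogates are not characters */
        out[0] = (unsigned char) (0xE0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFFUL) {
        out[0] = (unsigned char) (0xF0 | (cp >> 18));
        out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char) (0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Write the %XX-escaped UTF-8 form of cp into dst (NUL-terminated).
 * Returns the number of chars written, or 0 if cp is invalid or
 * dst is too small. */
size_t escape_codepoint(unsigned long cp, char *dst, size_t dstlen)
{
    static const char hex[] = "0123456789ABCDEF";
    unsigned char bytes[4];
    size_t n, i;

    assert(dst != NULL);
    n = utf8_encode(cp, bytes);
    if (n == 0 || dstlen < 3 * n + 1)
        return 0;
    for (i = 0; i < n; i++) {
        dst[3 * i]     = '%';
        dst[3 * i + 1] = hex[bytes[i] >> 4];
        dst[3 * i + 2] = hex[bytes[i] & 0x0F];
    }
    dst[3 * n] = '\0';
    return 3 * n;
}
```

For the 0x2276 case from this thread, escape_codepoint(0x2276, buf, sizeof buf)
fills buf with "%E2%89%B6", which a server or script can decode back without
knowing anything about the client's display character set.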

