
Re: LYNX-DEV pre-announcing a new Lynx SGML.c parser


From: Klaus Weide
Subject: Re: LYNX-DEV pre-announcing a new Lynx SGML.c parser
Date: Tue, 22 Apr 1997 00:16:10 -0500 (CDT)

On Mon, 21 Apr 1997, Foteos Macrides wrote:
[kw:]
> > Fote, I would appreciate your help here :).  It would help if you at
> >least did not make changes to HTML.c that depend on new hacks introduced
> >in SGML.c and the HTMLDTD.  (I am not saying that you *did* make such 
> >changes recently; this is just a just-in-case request, I still haven't
> >checked whether the recent me->inUnderline changes fall in this
> >category.  Your clarification sounded a bit like it, but I am not sure
> >so will have to check the code.)
> 
>       I don't know the date of the FOTEMODS code to which you are
> referring.  The last time I changed HTMLDTD.c with corresponding
> mods in HTML.c was on the 18th, when I added tags for soft
> hyphenation, and support for the WBR Netscapism.  Those are simple
> mods (within the context of the current code, which already implements
> truly soft hyphenation), and the tags are, and should be, SGML_EMPTY.

I was still referring to the changes (approximately?) at the same time
when you made A, I, and a bunch of other tags SGML_EMPTY, including the
me->inUnderline changes.  Anyway, I went through those changes now, and
they seem to be "harmless" for my purposes.

> > (Also, I know and accept that you don't want to be considered an
> >"active developer" at this point. [...]

>       I can just do bug fixes, if I encounter any or people report
> them, for the lynx2-7-1+FOTEMODS as it presently stands, until you've
> assessed your parsing strategy.  

I am not asking you to stop doing more experimental things (of course!);
I assume that is also more fun for you (or may give you some of the
fun back :) ).  I was just saying that it would help [me :)] if you didn't
make HTML.c depend more on specific things in SGML.c and HTMLDTD.c. 
As it turns out, you didn't do anything like that, so my request was
pointless anyway.  (One such added dependency was when, in the initial
version of the "FORMS hack", HTML_end_element() tested for the specific
element rather than generally for SGML_EMPTY.  But that was changed
quickly anyway.)
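
To illustrate the kind of dependency meant here, a minimal sketch in C
(not actual Lynx code: the TagInfo struct, the tiny tag_table, and the
skip_end_* helpers are hypothetical stand-ins, and the two-value enum
only borrows the SGML_EMPTY / SGML_MIXED names from the DTD).  It
contrasts an end-element test tied to one specific element with one that
asks the DTD table for the content model:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the content models recorded in the DTD. */
typedef enum { SGML_EMPTY, SGML_MIXED } ContentModel;

/* Hypothetical, cut-down tag description. */
typedef struct {
    const char  *name;
    ContentModel contents;
} TagInfo;

/* Hypothetical miniature DTD table, for illustration only. */
static const TagInfo tag_table[] = {
    { "BR",    SGML_EMPTY },
    { "INPUT", SGML_EMPTY },
    { "WBR",   SGML_EMPTY },
    { "P",     SGML_MIXED },
    { "UL",    SGML_MIXED },
};

/* Linear lookup by name; returns NULL for an unknown tag. */
static const TagInfo *lookup_tag(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof tag_table / sizeof tag_table[0]; i++) {
        if (strcmp(tag_table[i].name, name) == 0)
            return &tag_table[i];
    }
    return NULL;
}

/* Fragile: the end-element code knows about one particular element,
 * so every newly added empty tag needs another special case here. */
static int skip_end_specific(const TagInfo *t)
{
    assert(t != NULL);
    return strcmp(t->name, "INPUT") == 0;
}

/* Generic: the decision depends only on the declared content model,
 * so the handler does not need to change when the DTD table does. */
static int skip_end_generic(const TagInfo *t)
{
    assert(t != NULL);
    return t->contents == SGML_EMPTY;
}
```

With this table, skip_end_specific() misses WBR while skip_end_generic()
catches it, which is the reason for preferring the general test.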

Some other existing dependencies of HTML.c on SGML.c hacks seem to be
that there is no handling code for </P>, and SELECT's don't seem to work
if OPTION is SGML_MIXED and there are explicit </OPTION>'s.  (It should
not be EMPTY, according to the DTDs... And yes, there are probably very
good reasons why HTML.c does it the way it does.)  

> It's not obvious to me how well
> it can deal with the "tag and attribute soup" handling of non-HTML
> on the Web as it's become, at least based on the strategy as you've
> described it, but you may as well enjoy yourself trying it, and decide
                                    ^^^^^
> for yourself whether it's a promising approach.

Well, yes.  My motivation is not foremost that I believe this is a better
approach than yours (or any other).  I was curious, so I had to try it.
Also trying to reduce things to something I can understand better...
If it turns out to be useful, that's an extra benefit.

> > [...] I am not sure whether there is any screwed-up HTML out there where
> >my approach *already* gives better results, or whether it finally can be
> >made sophisticated enough to generally improve treatment of bad HTML (over
> >that already done by Fote's latest hacks).  Maybe a combination of 
> >approaches will finally give best results.
> 
>       Well, for example, that recently posted bad HTML with both
> explicitly and functionally interdigitated "container" elements is
> rendered and displayed by lynx2-7-1+FOTEMODS exactly as Christian
> intended.  How is it handled with your mods?  

It is shown as a list (as Christian intended) until the end, but
highlighting is not shown at all (as Christian probably did not intend),
since UL cannot be contained in B...  Well too bad, but isn't there
too much frivolous highlighting anyway?  :)

More seriously, it is rather clear that your approach gives superior
results if the intention is to treat "highlighting tags" totally
separately from paragraph (and other) structure elements - because
that is exactly what it does.  It will not recover from "One Opening
[highlighting] Tag Too Many" throughout the rest of the file (where my
mods can "recover" from that rather abruptly.)  But then those errors 
may be immediately obvious in Netscape etc., so people would tend to 
not make them...
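
The difference between the two behaviours can be sketched roughly as
follows.  This is only an illustration under stated assumptions, not
either actual implementation: the ElementStack and HighlightState types,
the helper names, and the idea of tracking highlighting with a simple
counter are all hypothetical.  The first model enforces containment
(opening a block element such as UL closes any open highlighting
element), the second keeps highlighting state apart from the structure:

```c
#include <assert.h>
#include <string.h>

#define MAX_DEPTH 32

/* Hypothetical stack of currently open elements. */
typedef struct {
    const char *stack[MAX_DEPTH];
    int         depth;
} ElementStack;

static int is_highlight(const char *name)
{
    return strcmp(name, "B") == 0 || strcmp(name, "I") == 0;
}

static int is_block(const char *name)
{
    return strcmp(name, "UL") == 0 || strcmp(name, "P") == 0;
}

/* Containment model: a block element cannot sit inside B or I, so any
 * highlighting elements on top of the stack are closed first. */
static void open_strict(ElementStack *s, const char *name)
{
    if (is_block(name)) {
        while (s->depth > 0 && is_highlight(s->stack[s->depth - 1]))
            s->depth--;
    }
    assert(s->depth < MAX_DEPTH);
    s->stack[s->depth++] = name;
}

/* Bold is active only while a B element is still on the stack. */
static int bold_active_strict(const ElementStack *s)
{
    int i;

    for (i = 0; i < s->depth; i++) {
        if (strcmp(s->stack[i], "B") == 0)
            return 1;
    }
    return 0;
}

/* Separate model: highlighting is a counter that ignores structure. */
typedef struct {
    int bold_count;
} HighlightState;

static void bold_start(HighlightState *h) { h->bold_count++; }

static void bold_end(HighlightState *h)
{
    if (h->bold_count > 0)
        h->bold_count--;
}

static int bold_active_separate(const HighlightState *h)
{
    return h->bold_count > 0;
}
```

In the containment model, <B> followed by <UL> leaves bold switched off
inside the list.  In the separate model, two bold_start() calls followed
by a single bold_end() leave bold switched on, which corresponds to the
"One Opening Tag Too Many" case.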

> How about that awful businesswire page? 

It handled that one OK, with very similar (maybe identical) appearance
to what your code gives.

Here is another page of *really* awful "HTML" that you may want to
test against: http://www.ludd.luth.se/~max/ and the computers.html
links in that.  (Recently came up in a crash report.) My code may 
actually give better results there, although it is really hard to tell
what the author intended there...

> But even if, per chance, such non-HTML should
> work better with my hacks, I must admit I have very mixed feelings
> about making Lynx that much of a tag and attribute soup non-HTML
> handler.  

So you should understand my motivation perfectly :)

Also what you did for that last batch of tags may not be (easily)
applicable to some other tags (as you have explained).  I was also
thinking of C. Maden's ideas (a while ago now) in connection with 
XML.  And of stylesheets, although I really don't know whether they
are helped by any of this. [Still haven't looked at newer code from Rob.
No, not because it's awfully written Rob, I just haven't done it.]

Anyway, my stuff might be useful or not, and finally a combination
with your mods may give something even better.  Or not.  I just
feel I shouldn't be the only one looking at awful webpages to test
it, so I try to convince others to do that...

> So, again, feel free to ignore my mods in any devel code
> intended for an eventual "formal" release.

I am not thinking much about an eventual "formal" release, and maybe
I won't have anything to do with it except for some entries in a 
CHANGES file.  Just playing around.

  Klaus



