lynx-dev

Re: lynx-dev SortaSGML parsing and missing *elements*


From: Klaus Weide
Subject: Re: lynx-dev SortaSGML parsing and missing *elements*
Date: Wed, 1 Sep 1999 07:15:48 -0500 (CDT)

On Wed, 1 Sep 1999 address@hidden wrote:

> I was playing with Lynx yesterday and noticed that even if I have
> SortaSGML parsing turned on, the following HTML doesn't generate
> any calling of the HTML_BODY code in start_element.
> 
> -----
> <title>Minimal</title>
> Hello
> -----
> 
> I was under the impression that SortaSGML was written to handle
> this kind of thing.  Does it not insert "missing" tags onto the
> stack in order to get to a consistent state?  

There are many "kinds of things" that a real SGML parser does and
that Sorta doesn't do.

The stacks (both in SGML.c and HTML.c) don't really stack tags (despite
what the variables may be named) but elements.  Closing "tags" can be
supplied by our "SGML parser" in order to push complete elements.
Consistency to that degree is there.  Consistency with a full SGML
content model (which doesn't even exist as data anywhere in the code), no.

Besides, Sorta isn't really anything that was "written" new.  It was
not a radical departure from what was there before. In a way, it was
reverting some changes that TagSoup had made (treating more elements
as SGML_EMPTY), and adding some recovery smarts in SGML.c, so that HTML.c
might not need to be tweaked so much.  It doesn't add any really new
approach over what was there in, say, lynx 2-6.

> If not, are there any plans for it to do so?

At one point I was thinking of generating elements that should be there
but need not be given (where both the start tag AND the end tag can be
omitted).  But this would require additional lists for when-to-generate-what,
and checking every time a normal character passes through whether
this is the time to generate the missing element.  Or something like
that.  I don't see the benefits, given that parsing is only "sorta"
in so many other ways.

> Having a full tag stack is necessary for several things, like
> CSS, DHTML, etc.

If you need something approaching real SGML parsing to that degree,
somebody has to write it.  Sorta isn't it.

> Should we not be thinking about integrating
> the latest version of libwww into Lynx?

This sounds like a different topic altogether.

Or does the latest libwww do what you want?  I don't know, but
last I checked, which was a while ago, it didn't either; their
SGML.c didn't seem *that* different in principle from Lynx's.
All that may have changed since then of course.

    Klaus

