lynx-dev

Re: LYNX-DEV Re: new Lynx SGML.c parser


From: Klaus Weide
Subject: Re: LYNX-DEV Re: new Lynx SGML.c parser
Date: Fri, 25 Apr 1997 16:54:24 -0500 (CDT)

On Fri, 25 Apr 1997, Christopher R. Maden wrote:

> [Klaus Weide]
> > Well I was thinking of you when I started this "new parser"[*]
> > project.  I remember you made the claim that a structured parser
> > with error recovery heuristics could improve handling of invalid
> > markup (or similar wording; I hope I didn't get your meaning too
> > wrong).
> 
> Hmm - did I say that?  Well, it depends on the class of error.
> Keeping a tree structure can help you realize when tags are
> mis-matched and tell you which ones probably need to be closed.  This
> will not match Netscape's behavior, which is to keep a couple of
> stacks; start-tags for elements of a class (like the font-changing
> class) will push formatting on the stack, and any end tag for that
> class will pop the stack.

That is basically what Fote's changes to the parsing since 2.7.1 do:
in terms of your description, there is one class for highlighting,
one class for FORMs, and one class for everything else.

> OTOH, real SGML parsing can be a limitation for the stuff on the Web;
> our DynaBase Web Management system had real problems with <table
> border> as SGML.  (If you want to know why this sucks from an SGML
> point of view, ask me off-line.)  We had to add special cases for some
> HTML crap introduced by certain vendors.

You mention tables, but what about the different stacks?  Is real SGML
parsing a "limitation for the stuff on the Web" there?

> > So there is now some way to test that claim...  This of course is
> > not doing real SGML parsing, just trying to resemble it a bit
> > better.  (Not that I really understand all the things a real SGML
> > parser is supposed to do...)
> 
> No one does. d-:  That's one of the reasons for XML - it *should* be
> possible to understand everything an XML parser is supposed to do.
[...]
 
> Sure.  What I had in mind may not work, but I was thinking of storing
> the whole parsed document in memory (or putting part of it on disk as
> virtual memory).  I was just thinking of our own made-up pointer
> structure, but I think that the Document Object Model would be a good
> way to do it.  This was talked about a lot at WWW6; the XML folks and
> the WAI folks are very excited about it.  It provides a standard
> interface to a parsed document; together with XML, it gives a lot of
> power.  This would not replace Lynx's HTML parser, but would provide a
> new internal MIME type for handling XML.

I still don't get where the "MIME type" part comes in.  (But maybe that's
because I don't know anything about DOM?  You may want to spread some more
knowledge.)  But if this is about building a structure in memory that
would be used by just one application (Lynx), what's wrong with pointers
and linked lists etc. in C?  What does it have to do with MIME (internal
or not)?

  Klaus

