Re: LYNX-DEV error recovery for form parsing
From: Klaus Weide
Subject: Re: LYNX-DEV error recovery for form parsing
Date: Sat, 5 Apr 1997 22:40:59 -0600 (CST)
On Sat, 5 Apr 1997, Foteos Macrides wrote:
> Hynek Med <address@hidden> wrote:
> >
> >Laura,
> >
> >this is funny. A while ago I sent to lynx-dev a similar patch (though it
> >in fact didn't work as I intended it to do, as Klaus has noted :-).. Our
> >idea is the same, just not to assume </FORM> and rather ignore the
> >offending ending tag.
> >
> >I wonder what do others think about the idea behind our patches.. It
> >certainly helps for most of the pages with bad markup and it doesn't have
> >any side effects on pages with good HTML..
>
> None of the currently active developers has addressed this,
An indication of skepticism combined with don't-understand-enough-of-this
(speaking just for myself of course). Well, I am trying to understand it
better.
> except for a worrisome non sequitur that HTML element handling might
> be done homologously to the optional SGML comment parsing, and Laura
> is still hacking solely in SGML.c without understanding the consequences
> for HTML.c, GridText.c, and LYfoo.c modules, so (against my better
> judgment 8-) I'll address it from my "vacation spot".
>
> When I was an active developer, this was an FAQ which I
> frequently answered, and Subir has an explanation in the "Why
> does Lynx do this" pages at "lynx links". Perhaps yet another
> explanation, but geared explicitly toward "code modifiers" rather
> than toward "general readers", would be helpful.
It was helpful.
> The current Lynx API uses ***TWO** stack-based parsers, one in
> SGML.c, and another in HTML.c. The one in HTML.c stacks "container"
> HTML elements (ones not declared SGML_EMPTY in HTMLDTD.c), and depends
> on the SGML.c parser to enforce valid (*strictly* embedded and *never*
> interdigitated) nesting of them. That is why the SGML.c functions
> substitute the "expected" end tags for "container" HTML elements before
> invoking HTML.c functions. If you break that, as in your patch, in
> Laura's original patch, and in her more recent "BETTER SOLUTION"
> patch, you throw the HTML.c stack out of whack.
Although I think this was not the case with Hynek's patch - if it had
worked the way he intended.
An example he gave was
<B><A HREF="something"></B>something</A>
Regular Lynx SGML.c processing would treat that as (== pass it down to HTML.c
as if it were)
<B><A HREF="something"></A>something</B>
giving a link that cannot be selected.
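
To make that substitution concrete, here is a toy model in Python. It is not
Lynx's actual C code: the function name and the event tuples are invented for
illustration, and the rule "any end tag closes whatever is on top of the
stack" is a simplification that merely reproduces the example above.

```python
def substitute_expected(tokens):
    """Toy model: every end tag is replaced by the end tag of the element
    currently on top of the stack (the "expected" end tag).

    tokens is a list of ("start", name), ("end", name) or ("text", s).
    Returns the events that would be passed downstream.
    """
    stack, out = [], []
    for kind, value in tokens:
        if kind == "start":
            stack.append(value)
            out.append(("start", value))
        elif kind == "end":
            if stack:
                # Ignore the name in the source; close the innermost element.
                out.append(("end", stack.pop()))
            # An end tag with nothing open is simply dropped in this model.
        else:
            out.append((kind, value))
    return out


# <B><A HREF="something"></B>something</A>
doc = [("start", "B"), ("start", "A"), ("end", "B"),
       ("text", "something"), ("end", "A")]

for event in substitute_expected(doc):
    print(event)
```

In this model the A element is closed before the text arrives, so the link
ends up empty, which is the "cannot be selected" effect.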
With Hynek's patch instead:
<B><A HREF="something">something</A>
The </B> is ignored (the SGML.c parser's stack is not changed when
</B> is encountered), and when the </A> is detected B is still on the
stack (possibly until the end of the document). But at least this
doesn't create out-of-order calls to HTML_start_element/HTML_end_element.
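
As a rough sketch of that behaviour (a toy Python model, not the patch
itself; the function name and event tuples are hypothetical), an end tag is
honoured only when it matches the top of the stack and is otherwise skipped
without touching the stack:

```python
def ignore_mismatched(tokens):
    """Toy model: drop any end tag that does not match the top of the stack.

    tokens is a list of ("start", name), ("end", name) or ("text", s).
    Returns (events passed downstream, elements still open at the end).
    """
    stack, out = [], []
    for kind, value in tokens:
        if kind == "start":
            stack.append(value)
            out.append(("start", value))
        elif kind == "end":
            if stack and stack[-1] == value:
                stack.pop()
                out.append(("end", value))
            # Otherwise: offending end tag ignored, stack left unchanged.
        else:
            out.append((kind, value))
    return out, stack


# <B><A HREF="something"></B>something</A>
doc = [("start", "B"), ("start", "A"), ("end", "B"),
       ("text", "something"), ("end", "A")]

events, still_open = ignore_mismatched(doc)
print(events)
print(still_open)
```

The events come out as start B, start A, text, end A, and B is left open on
the stack, as described above.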
With Laura's "BETTER SOLUTION" patch (the first one was specific to FORM,
but I think the principle was the same):
<B><A HREF="something"></B>something</A>
I.e. the markup is passed through as is, generating calls to
HTML_start_element/HTML_end_element in invalid order. (Changing the order
of stack elements, by using anything other than push and pop operations,
of course makes the whole idea of having a stack structure pointless.)
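
"Invalid order" here can be checked mechanically. The following is a small
Python sketch (the function name and event tuples are invented for
illustration and have nothing to do with the real HTML.c interface) that
tests whether a sequence of start/end calls could have been produced by
plain push and pop operations:

```python
def calls_are_nested(events):
    """Return True if every ("end", name) closes the most recently opened,
    still-open element. Elements left open at the end are tolerated."""
    stack = []
    for kind, name in events:
        if kind == "start":
            stack.append(name)
        elif kind == "end":
            if not stack or stack[-1] != name:
                return False  # end call does not match the innermost element
            stack.pop()
        # ("text", ...) events do not affect nesting.
    return True


# Markup passed through unchanged: <B><A ...></B>something</A>
passed_through = [("start", "B"), ("start", "A"), ("end", "B"),
                  ("text", "something"), ("end", "A")]

# Offending </B> ignored: <B><A ...>something</A>
end_tag_ignored = [("start", "B"), ("start", "A"),
                   ("text", "something"), ("end", "A")]

print(calls_are_nested(passed_through))
print(calls_are_nested(end_tag_ignored))
```

The first sequence fails the check at the end call for B, while the second
passes with B still open.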
From Laura's description:
"Strategy of fix: If and end tag </xxx> is found that doesn't match the top
element of the stack, search down the stack until you find a match. If
there's no match, ignore the end tag;"...
Isn't this *first* part reasonable? (Just ignoring end tags that
cannot possibly be right.) It doesn't mess up the stacks (or so it seems
to me).
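
A sketch of only that first part, in Python (hypothetical names, not taken
from any of the patches): search the stack for the end tag's element and
ignore the tag when nothing matches. What to do when the match lies deeper
in the stack is the contested part, so the sketch only labels that case and
does not act on it.

```python
def classify_end_tag(stack, name):
    """Classify an end tag </name> against the open-element stack.

    stack lists open elements, innermost last. Returns:
      "ignore"   - no open element matches; the tag cannot possibly be right
      "pop"      - matches the top of the stack; the ordinary case
      "mismatch" - matches an element further down; handling left open here
    """
    if name not in stack:
        return "ignore"
    if stack[-1] == name:
        return "pop"
    return "mismatch"


open_elements = ["B", "A"]  # <B><A ...> seen so far

print(classify_end_tag(open_elements, "FORM"))
print(classify_end_tag(open_elements, "A"))
print(classify_end_tag(open_elements, "B"))
```

In the "ignore" case the stack is never touched, which is why this part on
its own looks harmless.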
> In the course of
> the past three years, I've added lots of "hacks" to get around the
> constraints of stack-based parsing and try to cope with much of the
> bad HTML which the "anything that basically works and sells is fine"
> vendor(s) has(have) made so commonplace on the Web, so if you break
> the enforced valid nesting in SGML.c of HTML elements declared as
> "containers" in HTMLDTD.c and just test the result "empirically" with
> this or that URL that returns bad HTML, rather than understanding and
> considering the consequences for the "downstream" functions, you might
> think you've improved the situation. But believe me, please, that's
> NOT a good thing to do.
[...]
> When Rob started developing the configurable color/styles
> enhancements, and the potential for using external style sheets
> (very important, IMHO, for the long-term viability of Lynx) he
> also ran into the problem of stack-based parsing being heavily
> dependent on valid HTML, plus conflicts with my hacks to get around
> the constraints. He then turned to a hash table design, with the
> prospects of eliminating stack-based parsing in Lynx altogether.
> That, rather than further "workaround" hacks to the present
> stack-based parsing, is a better long-term objective for Lynx
> development (sez I from my "vacation spot" 8-).
I'd like to see it...
> Be that as it may, appended is a patch set for v2.7.1 which
> achieves what you and Laura are attempting, and without throwing
> the HTML.c stack out of whack. It is also available (as a
> formhack.patch text file and in a formhack.zip) in:
>
> http://www.slcc.edu/lynx/fote/patches/
> or: ftp://www.slcc.edu/pub/lynx/fote/patches
>
Another step in making Lynx's parsing more like that of the above-mentioned
vendor's(s') products, unfortunately.
Klaus