lynx-dev

Re: LYNX-DEV error recovery for form parsing


From: Laura Eaves
Subject: Re: LYNX-DEV error recovery for form parsing
Date: Mon, 7 Apr 1997 08:06:09 -0400 (EDT)

> From address@hidden Sat Apr  5 17:36:17 1997
>...
>       The current Lynx API uses ***TWO** stack-based parsers, one in
> SGML.c, and another in HTML.c.  The one in HTML.c stacks "container"
> HTML elements (ones not declared SGML_EMPTY in HTMLDTD.c), and depends
> on the SGML.c parser to enforce valid (*strictly* embedded and *never*
> interdigitated) nesting of them.  That is why the SGML.c functions
> substitute the "expected" end tags for "container" HTML elements before
> invoking HTML.c functions.  If you break that, as in your patch, in
> Laura's original patch, and in her more recent "BETTER SOLUTION"
> patch, you throw the HTML.c stack out of whack.  In the course of
> the past three years, I've added lots of "hacks" to get around the
> constraints of stack-based parsing and try to cope with much of the
> bad HTML which the "anything that basically works and sells is fine"
> vendor(s) has(have) made so commonplace on the Web, so if you break
> the enforced valid nesting in SGML.c of HTML elements declared as
> "containers" in HTMLDTD.c and just test the result "empirically" with
> this or that URL that returns bad HTML, rather than understanding and
> considering the consequences for the "downstream" functions, you might
> think you've improved the situation.  But believe me, please, that's
> NOT a good thing to do.
>...
>       Be that as it may, appended is a patch set for v2.7.1 which
> achieves what you and Laura are attempting, and without throwing
> the HTML.c stack out of whack. ...
>...
>                               Fote

I incorporated Fote's patch into my copy of 2-7-1 and it seems to work on all
the broken URLs.

Thank you for taking the time to make this fix (which I think is a very
important fix affecting a lot of pages) and for the info about the 2 parsers...
I took a look at HTML.c and GridText.c and have a better picture of the
interdependence.  (Amazing how the SGML hack seemed to work so well on so
many pages...:-)
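
The interdependence can be pictured with a toy sketch in C.  It is not the
SGML.c code: toy_start(), toy_end() and open_stack are made-up names, and the
only behavior modeled is the one described in the quoted message, namely that
an end tag which does not match the innermost open container is replaced by
the "expected" end tag before anything downstream sees it.

```c
#include <stdio.h>
#include <string.h>

#define MAX_DEPTH 32

/* Hypothetical toy state; none of these names come from Lynx. */
static const char *open_stack[MAX_DEPTH];
static int depth = 0;

/* Record that a container element has been opened. */
static void toy_start(const char *name)
{
    if (depth < MAX_DEPTH)
        open_stack[depth++] = name;
}

/*
 * Handle an end tag.  Returns the name of the end tag that would be
 * passed downstream, or NULL if nothing is open and the tag is dropped.
 * A mismatching end tag is replaced by the expected one (the innermost
 * open container), so downstream only ever sees strict nesting.
 */
static const char *toy_end(const char *found)
{
    const char *expected;

    if (depth == 0)
        return NULL;                    /* stray end tag: ignored */
    expected = open_stack[--depth];
    if (strcmp(expected, found) != 0)
        fprintf(stderr, "found </%s>, expected </%s>; </%s> assumed\n",
                found, expected, expected);
    return expected;
}
```

With FORM and then TABLE open, toy_end("FORM") passes the TABLE end tag
downstream, and only a second call with "FORM" closes the form itself.  A
downstream stack that trusts this ordering gets out of step as soon as the
substitution is skipped.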

Anyway, looking at HTML_end_element(), I did notice some minor oversights
and made a couple of 1-line changes that I think could be added to your
patch.  Diff included below for your perusal.

I still haven't given up on the possibility of tweaking the parser(s)
somehow (in a "safe" way, of course) to improve recovery from some
of the more common syntax errors.  If I have time, I may try a couple of ideas.
(Parsing is a specialty of mine from back when I worked on C++ compilers...)
But I have no intention of reworking anything, especially if someone else
is looking at it.  My main purpose in doing this at all was to fix the forms
bug...

Thanks again.
--le

PS: Majordomo has fixed the email problem, so I am now getting mail from
    lynx-dev...  There is no need to copy me on replies.

*** fm/HTML.c   Mon Apr  7 07:08:20 1997
--- src/HTML.c  Mon Apr  7 07:30:51 1997
***************
*** 4286,4292 ****
      char *temp = NULL, *cp = NULL;
  
  #ifdef CAREFUL                        /* parser assumed to produce good nesting */
!     if (element_number != me->sp[0].tag_number) {
          fprintf(stderr, 
                "HTMLText: end of element %s when expecting end of %s\n",
                HTML_dtd.tags[element_number].name,
--- 4286,4292 ----
      char *temp = NULL, *cp = NULL;
  
  #ifdef CAREFUL                        /* parser assumed to produce good nesting */
!     if (element_number != me->sp[0].tag_number && element_number != HTML_FORM) {
          fprintf(stderr, 
                "HTMLText: end of element %s when expecting end of %s\n",
                HTML_dtd.tags[element_number].name,
***************
*** 4320,4326 ****
                fprintf(stderr,
    "Stack underflow error!  Tried to pop off more styles than exist in stack\n");
        }
!     }
      
      /*
       *  Check for unclosed TEXTAREA. - FM
--- 4320,4326 ----
                fprintf(stderr,
    "Stack underflow error!  Tried to pop off more styles than exist in stack\n");
        }
!     } else if ( TRACE ) fprintf(stderr,"HTML:end_element: end form\n");
      
      /*
       *  Check for unclosed TEXTAREA. - FM