lynx-dev

LYNX-DEV error recovery for form parsing


From: Laura Eaves
Subject: LYNX-DEV error recovery for form parsing
Date: Thu, 3 Apr 1997 10:40:36 -0500 (EST)

A while back I sent mail about the many web pages I've run into that trip up
lynx's parsing of bad html in forms: lynx closes a form early if the
<FORM> </FORM> pair isn't matched on the parsing stack.  This is apparently
a known problem, but I've run into it so often that I decided to take a crack at
fixing it.  (I've run into AT LEAST 6 websites with this problem.  I always send
mail to the webmaster describing the problem.  So far, only 2 have actually
fixed their web pages.)

Anyway, below is a relatively small change to WWW/Library/Implementation/SGML.c
from 
        Lynx Version 2.7ac-0.38 (1997)
It seems to fix the problem.

I tested the fix on various URLs, including the following:

http://www.netmarket.com/sa/pages/comments/sid=oZ7uEHgV6b&text=on
http://www.lycos.com/customsearch.html
http://www.voicenet.com/~leaves/file.html
        this was the qvc page that they have since fixed
http://www.egghead.com/  select "Shopping Cart"
        the fixed lynx complains about bad html, but the form submit still works
        Only one difference: the appearance of the page is slightly different
        than when I use lynx 2.7.

(There was one other online dictionary that had this problem, but I can't find
the URL...)

Anyway, here is the diff.
If you feel it is safe, feel free to add it to the official source.
I think it is important, as there are a lot of bad pages out there that have
this problem.

Thanks.
--le

*** old/SGML.c  Thu Apr  3 07:31:36 1997
--- WWW/Library/Implementation/SGML.c   Thu Apr  3 07:29:05 1997
***************
*** 521,526 ****
--- 521,527 ----
                            old_tag->name);
        return;
      }
+ again:
  #ifdef WIND_DOWN_STACK
      while (context->element_stack) { /* Loop is error path only */
  #else
***************
*** 528,552 ****
  #endif /* WIND_DOWN_STACK */
        HTElement * N = context->element_stack;
        HTTag * t = N->tag;
        
        if (old_tag != t) {             /* Mismatch: syntax error */
            if (context->element_stack->next) { /* This is not the last level */
!               if (TRACE) fprintf(stderr,
                "SGML: Found </%s> when expecting </%s>. </%s> assumed.\n",
                    old_tag->name, t->name, t->name);
            } else {                    /* last level */
                if (TRACE) fprintf(stderr,
                    "SGML: Found </%s> when expecting </%s>. </%s> Ignored.\n",
                    old_tag->name, t->name, old_tag->name);
                return;                 /* Ignore */
            }
-           
        }
        
        context->element_stack = N->next;               /* Remove from stack */
!       FREE(N);
!       (*context->actions->end_element)(context->target,
                 t - context->dtd->tags, (char **)&context->include);
  #ifdef WIND_DOWN_STACK
        if (old_tag == t)
            return;  /* Correct sequence */
--- 529,572 ----
  #endif /* WIND_DOWN_STACK */
        HTElement * N = context->element_stack;
        HTTag * t = N->tag;
+       int end_form_error = 0;
        
        if (old_tag != t) {             /* Mismatch: syntax error */
+           if ( toupper(t->name[0]) == 'F'
+           &&   toupper(t->name[1]) == 'O'
+           &&   toupper(t->name[2]) == 'R'
+           &&   toupper(t->name[3]) == 'M'
+           &&   toupper(t->name[4]) == '\0' )
+               end_form_error = 1;
            if (context->element_stack->next) { /* This is not the last level */
!               if (TRACE) {
!                   if ( end_form_error )
!                       fprintf(stderr,
!               "SGML: Found </%s> when expecting </%s>. </%s> hoisted.\n",
!                   old_tag->name, t->name, t->name);
!                   else
!                       fprintf(stderr,
                "SGML: Found </%s> when expecting </%s>. </%s> assumed.\n",
                    old_tag->name, t->name, t->name);
+               }
            } else {                    /* last level */
                if (TRACE) fprintf(stderr,
                    "SGML: Found </%s> when expecting </%s>. </%s> Ignored.\n",
                    old_tag->name, t->name, old_tag->name);
                return;                 /* Ignore */
            }
        }
        
        context->element_stack = N->next;               /* Remove from stack */
!       if ( !end_form_error ) {
!           FREE(N);
!           (*context->actions->end_element)(context->target,
                 t - context->dtd->tags, (char **)&context->include);
+       } else {
+           N->next = N->next->next;
+           context->element_stack->next = N;
+           goto again;
+       }
  #ifdef WIND_DOWN_STACK
        if (old_tag == t)
            return;  /* Correct sequence */