Re: LYNX-DEV error recovery for form parsing


From: Klaus Weide
Subject: Re: LYNX-DEV error recovery for form parsing
Date: Mon, 7 Apr 1997 21:47:52 -0500 (CDT)

On Mon, 7 Apr 1997, Foteos Macrides wrote:

> Klaus Weide <address@hidden> wrote:
> >On Sat, 5 Apr 1997, Foteos Macrides wrote:
> >>[...]
> >>    The current Lynx API uses *TWO* stack-based parsers, one in
> >> SGML.c, and another in HTML.c.  The one in HTML.c stacks "container"
> >> HTML elements (ones not declared SGML_EMPTY in HTMLDTD.c), and depends
> >> on the SGML.c parser to enforce valid (*strictly* embedded and *never*
> >> interdigitated) nesting of them.  That is why the SGML.c functions
> >> substitute the "expected" end tags for "container" HTML elements before
> >> invoking HTML.c functions.  If you break that, as in your patch, in
> >> Laura's original patch, and in her more recent "BETTER SOLUTION"
> >> patch, you throw the HTML.c stack out of whack.  
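
To make the dependency described above concrete, here is a minimal sketch
in C.  It is based only on the description quoted above, not on the actual
SGML.c or HTML.c sources: the two arrays and the names start_element(),
end_element_sketch() and html_end() are hypothetical stand-ins.  The
parser side always hands the renderer side the "expected" end tag (the
top of its own stack), so the two stacks cannot drift apart.

```c
#include <string.h>

#define MAX_DEPTH 16

/* Hypothetical stand-ins for the two element stacks (SGML.c and HTML.c). */
static const char *sgml_stack[MAX_DEPTH];
static const char *html_stack[MAX_DEPTH];
static int sgml_depth = 0;
static int html_depth = 0;

/* Renderer side: relies on every end tag matching the top of its stack.
 * Returns 1 on success, 0 if the stacks have got out of step. */
static int html_end(const char *name)
{
    if (html_depth == 0 || strcmp(html_stack[html_depth - 1], name) != 0)
        return 0;
    html_depth--;
    return 1;
}

/* Opening a container element pushes it on both stacks. */
static void start_element(const char *name)
{
    if (sgml_depth < MAX_DEPTH && html_depth < MAX_DEPTH) {
        sgml_stack[sgml_depth++] = name;
        html_stack[html_depth++] = name;
    }
}

/* Parser side: whatever end tag appeared in the document, report the
 * expected one (the innermost open element) to the renderer side. */
static int end_element_sketch(const char *seen)
{
    const char *expected;

    if (sgml_depth == 0)
        return 0;                       /* nothing open, nothing to close */
    expected = sgml_stack[--sgml_depth];
    if (strcmp(seen, expected) != 0)
        seen = expected;                /* substitute the expected tag */
    return html_end(seen);
}
```

With B and I opened in that order, a stray </B> closes I first and both
depths stay equal.  Passing the literal "B" straight to html_end() instead
would fail the comparison, which is the "out of whack" case.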

[...]
>       Guard against the trap of thinking that you can know reliably
> what was intended by bad HTML, and making mods based on particular
> cases encountered [...]

Just had to let that stand...

> [...].  But if it were:
> 
>       <TEXTAREA>...</TEXTARA>
> 
> the rest of the document would be treated as the TEXTAREA content.

Minor(?) point (not based on testing, but trying to understand the
SGML.c code):  it seems that for this example, the misspelled </TEXTARA>
*will* leave the TEXTAREA element open, since SGMLFindTag() fails to find
an element by that name and the call to end_element() is bypassed -
in accordance with RFC 1866 section 

4.2.1. Undeclared Markup Error Handling

   .....                    markup in the form of a start-tag or end-
   tag, whose generic identifier is not declared is mapped to nothing
   during tokenization. ....
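
A minimal sketch in C of that behaviour, untested against Lynx and not
taken from SGML.c: find_tag() is only an illustrative stand-in for
SGMLFindTag(), and the three-entry table, the stack and the function
names are all assumptions made for the example.  An end tag whose name
is not declared is dropped before any element is closed, so the
TEXTAREA stays open.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical, tiny stand-in for the declared-element table. */
static const char *const declared_tags[] = { "FORM", "P", "TEXTAREA" };

#define MAX_DEPTH 16
static const char *open_stack[MAX_DEPTH];   /* currently open elements */
static int open_depth = 0;

/* Case-insensitive name comparison; returns 1 when the names match. */
static int same_name(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (toupper((unsigned char)*a) != toupper((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;    /* both strings must end at the same point */
}

/* Illustrative lookup (not the real SGMLFindTag): NULL if undeclared. */
static const char *find_tag(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof declared_tags / sizeof declared_tags[0]; i++)
        if (same_name(declared_tags[i], name))
            return declared_tags[i];
    return NULL;
}

/* Open a declared element; undeclared start tags are ignored too. */
static void start_tag(const char *name)
{
    const char *tag = find_tag(name);

    if (tag != NULL && open_depth < MAX_DEPTH)
        open_stack[open_depth++] = tag;
}

/* Returns 1 if an element was closed, 0 if the end tag was ignored.
 * Simplification: a declared tag that is not innermost is also ignored. */
static int end_tag(const char *name)
{
    const char *tag = find_tag(name);

    if (tag == NULL)
        return 0;       /* undeclared: mapped to nothing, nothing closed */
    if (open_depth > 0 && open_stack[open_depth - 1] == tag) {
        open_depth--;
        return 1;
    }
    return 0;
}
```

After start_tag("TEXTAREA"), a call to end_tag("TEXTARA") returns 0 and
leaves open_depth at 1; only a correctly spelled end tag closes the
element.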

[...]
> >Another step in making Lynx's parsing more like that of the abovementioned 
> >vendor's(s') products, unfortunately.
> 
>       This is directed to people who posted "We want reverse-engineering
> for Netscape with freedom to stay ignorant about valid HTML and URL
> syntax." retorts to Klaus' last sentence.
> 
>       Klaus is the only currently active developer who is not
> only a skilled programmer extensively knowledgeable about the Lynx
> code, but *ALSO* someone who is knowledgeable and keeps informed about
> the relevant RFCs and IDs out of an inherent interest in quality
> development of the Web as a whole.  

Let's hope that sentence is *not* true, especially the "only" part.

> Also, despite his initial
> statement of non-intention, he has largely accepted the mantle
> of de-facto coordinator. 

Whatever I do, nobody should rely on it, and I may stop doing it
tomorrow.  Others now have access to the same set of development code
files, thanks to Scott's offer to host all this on sol.slcc.edu.

Anyway, one just has to look at the size of the wishlist/TODO page
at <URL:http://sol.slcc.edu/lynx/html/todo-list.html>, in order to
understand that there's much more to do than what one person, or
even a handful of hackers, can do.  I certainly don't feel obliged,
or responsible, for reducing that list.

>       Everyone involved in "active" Lynx development (actually
> working on the code) is doing so as a "spare time hobby".  People
> pursue hobbies when they are fun, and stop if they become more
> a chore than fun.

True.
 
>       I got heavily involved with Lynx back when HTML 3.0 was
> viable, because it was fun implementing its advanced features,
> with the combined challenge of adapting them to a character cell
> client, ahead of the GUI pack.  It became progressively less fun,
> and more just a chore, as the name of the game progressively
> became reverse-engineering for Lynx users who want freedom to
> remain ignorant.
> 
>       Think twice about creating the same situation for Klaus.

I tend to ignore wishes / requests / demands that I don't feel like
dealing with.  Since access to the code is free, this doesn't prevent
anyone else from implementing whatever *they* feel necessary or useful.
What I can do is only a small part of all that could be done,
anyway (see todo-list URL etc.).


A remark closer to the one that started this, on whether it is
"unfortunate" to make Lynx's parsing more like Netscape (et al.)'s:
People should keep in mind that there are real costs involved in
"enhancing" Lynx's "robustness"[*] by way of more and more
special-case hacks which override the originally structured design.
Fote mentioned the increased difficulty in implementing new things
which require a structured parsing of HTML (e.g. stylesheets).
Another part of the price to pay is that the code becomes more and
more obscure, and therefore more error-prone, which raises the initial
barrier for new would-be hackers/developers.  It's a self-limiting
process: finally Lynx development will come to a halt when there is
nobody left understanding the code. :-)

Now if you [anybody] have an idea how to make Lynx more
Netscape (et al.) compatible (a.k.a. Un-HTML compatible) in a new,
structured way that doesn't bear those costs, and without sacrificing
correct treatment of valid markup, let's see it.

  Klaus

[*] essay about use of the word "robust" when comparing Lynx with other
    programs omitted, for brevity and mercy with the reader...

