Re: lynx-dev Re: Another TagSoup problem

From: Henry Nelson
Subject: Re: lynx-dev Re: Another TagSoup problem
Date: Tue, 3 Feb 2004 22:44:35 +0900
User-agent: Mutt/1.5.4i-ja.1

On Tue, Feb 03, 2004 at 03:55:49AM -0800, Ilya Zakharevich wrote:
> Quite the contrary.  So called "strict" (actually "make a mess without
> any feedback to the user about the reason") parsing is the default.

Only because you have not edited lynx.cfg to get the style of parsing
you want.  If you read the distribution lynx.cfg, you will find the
reasons why the parsing choices were made.  However, if you really want
to know the reason why a page shows up as a "mess", I recommend that you
use "\" to review the source.  I might add that it is not Lynx that is
"mak[ing] a mess" in the samples of html you have presented.

> > The user chooses which one Lynx should employ.
> Do you mean that lynx will not render anything until the user makes
> her choice?

No.  I mean edit lynx.cfg to read "TAGSOUP:TRUE".

If you want to use it during a session, and not all the time, then use
the toggle or the O)ptions page.

> And do you mean that lynx docs provide any grokable
> feedback about what are the consequences of making this choice?

Quoting from lynx.cfg:
# If TAGSOUP is set, Lynx uses the "Tag Soup DTD" rather than "SortaSGML".
# The two approaches differ by the style of error detection and recovery.
# Tag Soup DTD allows for improperly nested tags; SortaSGML is stricter.

That is "grokable" to me.  Lynx.cfg is huge enough without adding lengthy
dissertations about all the control mechanisms available.

> > I would expect TagSoup mode to at least make an attempt at rendering
> > sloppy html, but SortaSGML mode should show it for the junk it is.
> OK, how are you going to show that "something is a junk"?  How the
> user is going to distinguish the "junk on screen" due to deficiencies
> of lynx and "junk on screen" due to deficiencies of HTML?

"\" is your friend.

If that is too much trouble for you, then run the page in question through
a validator.
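For a quick local check before reaching for a full validator, here is a minimal Python sketch. It flags the kind of error the lynx.cfg comment mentions: improperly nested tags, meaning end tags closed out of order. The `check_nesting` helper and `NestingChecker` class are hypothetical illustrations, not part of Lynx, and this is not how either of its parsing modes actually works. The sketch uses only the standard library's html.parser and catches misnesting and stray end tags, nothing more. A real validator reports far more.

```python
from html.parser import HTMLParser

# Elements that never take an end tag in HTML, so they are never pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


class NestingChecker(HTMLParser):
    """Hypothetical helper: records end tags that do not close the
    innermost open element (improperly nested tags)."""

    def __init__(self):
        super().__init__()
        self.stack = []     # currently open (non-void) elements
        self.problems = []  # human-readable error messages

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()                      # properly nested: close it
        elif tag in self.stack:
            # Closing an element that is not the innermost one: misnested.
            self.problems.append(
                f"</{tag}> closes <{tag}> but <{self.stack[-1]}> is still open")
            # Simple recovery: discard everything opened after <tag>.
            while self.stack.pop() != tag:
                pass
        else:
            self.problems.append(f"</{tag}> has no matching start tag")


def check_nesting(html):
    """Return a list of nesting problems found in the given HTML string."""
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    return checker.problems


print(check_nesting("<b><i>text</b></i>"))
```

On the sample `<b><i>text</b></i>` it reports the out-of-order `</b>` and then the orphaned `</i>`. Well-nested markup such as `<p><b>ok</b></p>` yields an empty list.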

If you don't care about either the html of the page or about the ability
of Lynx to give you a choice of parsing styles, then I think you need to
change your browser of choice.

If you come up with a document with valid html, and Lynx cannot render it
intelligibly, then please do fix the problem in Lynx.  If you come up with
a document with invalid (within reason) html, and Lynx cannot render it
intelligibly when in relaxed TagSoup mode, then suggest a fix.  Not all
users of Lynx may agree with you, however, that a fix is required.

The stricter SortaSGML mode makes the right choices most of the time.
Please don't muck with that style of parsing.  It's one of the main reasons
I continue to use Lynx.  I suspect I am not the only one.  Error recovery
should go into the relaxed TagSoup mode.  That's what it's for.  Thanks.


