lynx-dev

Re: LYNX-DEV lynx2.7.1


From: David Woolley
Subject: Re: LYNX-DEV lynx2.7.1
Date: Sat, 06 Sep 1997 15:54:29 -0400

> 
> Is there anyone who could take time from their busy schedule
> to pass along a list (if any such list exists) of supported tags?

That's in some ways easy, and in some ways quite difficult.  Lynx supports
something approaching the union of all the tags supported by current
browsers; however, the level of support varies from merely recognizing a
tag to acting on it as a naive user might expect.

E.g. <FRAME> is recognized, but generates a link, not a frame.  <TD>
is recognized, but only to the extent that </TD> generates a space;
similarly, </TR> forces a new line.  Some are probably only handled to
the extent needed to ensure reasonable error recovery from common HTML
faults (most HTML is broken!).
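
To make the <TD>/<TR> behaviour concrete, here is a minimal sketch in
Python.  It is not Lynx code (Lynx is written in C); the class and
function names are hypothetical, and it only imitates the two rules
described above: </TD> produces a space and </TR> produces a new line.

```python
from html.parser import HTMLParser


class TableFlattener(HTMLParser):
    """Hypothetical sketch: keep the text, ignore table layout."""

    def __init__(self):
        super().__init__()
        self.out = []

    def handle_data(self, data):
        # Cell text is passed through unchanged.
        self.out.append(data)

    def handle_endtag(self, tag):
        # HTMLParser reports tag names in lower case.
        if tag == "td":
            self.out.append(" ")   # </TD> generates a space
        elif tag == "tr":
            self.out.append("\n")  # </TR> forces a new line


def flatten(html):
    parser = TableFlattener()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Calling flatten() on a two-row table gives one line of text per row, with
the cells separated by spaces and no attempt at column alignment.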

> Planned future tag support? Whatever? A pointer to an online source
> would be sufficient.

The file WWW/Library/Implementation/HTMLDTD.c in the source distribution
(see www.lynx.browser.org for the location) contains all the tags that
are known to Lynx, but I suspect that only a couple of people, and maybe
only if you combine their knowledge, can tell you exactly which produce
a significant effect on the output.

I suspect you may find that Lynx provides a master check list, rather
than just another subset!  There are 112 tags known to Lynx 2.7 and
many more attributes.

If you want to brave the source code, src/HTML.c contains the actual
handling rules for tags; its comments may give you an idea of how each
tag is dealt with.

One thing you need to realise is that a browser is more than the tags
it supports.  HTML is a structured language.  Many of the commercial
browsers ignore that structure and process tags out of context.  This has
been called "tag soup", and the result can be interesting if you start
nesting structures: they sometimes come out of the nest too soon.

Lynx starts off on the basis that HTML is structured, then compromises
this in two ways.  First, it must not fault on the broken HTML that people
create and get away with on the commercial browsers.  (Automated tools do
it too: HTML mail programs in particular, but even FrontPage 97 seems to
like leaving in <B></B> etc.)  Second, it should produce the expected
results for the common errors, even if that compromises the correct
behaviour.
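
As an illustration of "structured, with compromises", the sketch below
keeps a stack of open elements and recovers from mis-nested end tags
instead of faulting.  This is only an assumption about how such recovery
can be done, not a description of the code in src/HTML.c; NestingTracker
and check_nesting are made-up names.

```python
from html.parser import HTMLParser


class NestingTracker(HTMLParser):
    """Hypothetical sketch: track element nesting and tolerate bad closes."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open elements, innermost last
        self.notes = []   # record of every recovery action taken

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            # Nothing to close: ignore the tag rather than fault.
            self.notes.append(f"stray </{tag}> ignored")
            return
        # Close everything opened inside the element being closed.
        while self.stack:
            top = self.stack.pop()
            if top == tag:
                break
            self.notes.append(f"<{top}> closed implicitly by </{tag}>")


def check_nesting(html):
    tracker = NestingTracker()
    tracker.feed(html)
    tracker.close()
    return tracker.notes
```

For the overlapping input <b><i>text</b></i>, the sketch closes <i> when
it sees </b> and then ignores the leftover </i>.  A "tag soup" processor
would simply toggle bold and italic without noticing the overlap.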

(Even a recently proposed HTML version of the Lynx documentation contained
the illegal tag sequence:  <DL COMPACT><P>.)

Lynx also has a degree of foreign language support which you may not find
in any commercial product, and again there have to be hacks to compensate
for broken HTML, e.g. many pages generated with FrontPage are in the
Windows Code Page 1252 character set, but claim to be in ISO 8859/1, the
default character set.  If you see lynx displaying the entity &#1; literally,
it is because it can't know what it means, since it isn't an ISO 8859/1
or Unicode character, rather than because it doesn't know all the
entity codes.
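
The character set mismatch can be shown with a small Python sketch.  The
code numbers 153 and 233 are my own examples, not taken from any
particular page, and the function names are hypothetical.  Code 153 is a
control code in ISO 8859/1 and Unicode, so a numeric reference to it has
no displayable meaning, whereas in Code Page 1252 the same byte is the
trade mark sign.

```python
import unicodedata


def is_control_code(n):
    """True if code point n is a control character in Unicode.

    The first 256 Unicode code points coincide with ISO 8859/1, so the
    same answer holds for that character set.
    """
    return unicodedata.category(chr(n)) == "Cc"


def cp1252_guess(n):
    """Decode byte value n (0-255) as Windows Code Page 1252.

    This is what the page's author probably saw in the authoring tool.
    Bytes that CP1252 leaves undefined come back as U+FFFD.
    """
    return bytes([n]).decode("cp1252", errors="replace")
```

Here is_control_code(153) is true while cp1252_guess(153) is the trade
mark sign; for 233 (e-acute) the two character sets agree, so nothing
needs to be guessed.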

;
; To UNSUBSCRIBE:  Send a mail message to address@hidden
;                  with "unsubscribe lynx-dev" (without the
;                  quotation marks) on a line by itself.
;


