lynx-dev

Re: [Lynx-dev] A patch for lynx.


From: Zephaniah E. Hull
Subject: Re: [Lynx-dev] A patch for lynx.
Date: Fri, 20 Apr 2007 16:45:18 -0400
User-agent: Mutt/1.5.13 (2006-08-11)

On Fri, Apr 20, 2007 at 07:25:42AM +0100, David Woolley wrote:
> Zephaniah E. Hull wrote:
> 
> >To verify here, is the / actually legal in any version of HTML or XHTML
> >that does not consider this a closing statement?
> 
> It has a meaning in SGML, I can't remember if the relevant
> option is turned off for HTML, but, in general, it is only
> generic tools that recognize it properly as real life HTML
> parsers aren't SGML compliant and it is this non-compliance that
> the Appendix C rules rely on.
> 
> If I remember correctly, correct usage would be:
> 
> <title/This is the title/
> 
> I think it may be an option that is turned off in the SGML
> specification file for HTML.  The ">" would be treated as
> a normal character.

Actually, we accept it fine in HTML.

Further, due to the same code we have some fairly inconsistent handling
of / in tags at the moment.

'<tag/>' generates '<tag></tag>', '<tag/foo/' is accepted, '<tag foo />'
is accepted, but does not generate a closing.
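
To make the three slash forms above concrete, here is a minimal sketch in
Python. It is not Lynx's parser code. The function name classify(), the
labels, and the regular expressions are all hypothetical, chosen only to
tell the forms apart so that each one can be given a consistent rule.

```python
import re

# Hypothetical helper, NOT taken from Lynx: it only labels which slash
# form a start tag uses, following the three cases listed in the message.
_NET = re.compile(r"<([A-Za-z][A-Za-z0-9]*)/([^/<>]*)/")
_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9]*)([^<>]*?)(/?)>")


def classify(markup):
    """Return a label for the slash form used by a single start tag."""
    # SGML null end tag shorthand: <tag/content/
    if _NET.match(markup):
        return "net"
    m = _TAG.fullmatch(markup)
    if m is None:
        return "unknown"
    _name, attrs, slash = m.groups()
    if not slash:
        return "plain"                    # <tag> or <tag foo>
    if attrs == "":
        return "self-closing"             # <tag/>
    return "slash-after-attributes"       # <tag foo /> or <tag />


if __name__ == "__main__":
    for sample in ("<tag/>", "<tag/foo/", "<tag foo />", "<tag>"):
        print(sample, "->", classify(sample))
```

With the forms separated like this, a consistent policy would be a single
decision per label (for example, whether "slash-after-attributes" should
also generate a closing tag) rather than a side effect of shared code.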

> My view is that the only legitimate reason for handling XML
> syntax is that you are implementing XHTML properly.  That means
> - offering to accept application/xhtml+xml;
> - only applying the rules if the contents is actually labelled
>   as such;

Which lynx does not currently do for any other case, including the
SGML/HTML stuff.

> - using the XML default character set and requiring XML processing
>   instructions to override it;
> - aborting the document on the first well formedness error (incidentally
>   much of the XML type syntax in web pages would result in this
>   behaviour if served in non-compatibility mode!).

I'd argue that the specification itself is being stupid here.

A web browser does not exist to do exact validation of a given standard
or set of standards.

A web browser exists to give the best experience to the user.  In this
case that usually means following the appropriate standard, but not
always.

In the end, the best approach, IMHO, is the one that makes the most
pages render as close to accurately as possible given the restrictions
of the user interface and usage of lynx.

Clearly, others working inside the parser in question have agreed with
this approach, allowing for things like tag soup mode, SGML tags, and
compatibility with netscape parsing _bugs_.

_All_ browsers aimed at actual people doing web browsing work this
way, and it is flatly _stupid_ to abort on a page that your parser can
still make something of.  It's fine to say that you're _allowed_ to
abort if the page is not well-formed, but it's absolutely stupid to
actually do so if your parser can deal with the fact that it's not well
formed.
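
The difference between the two policies can be sketched with Python's
standard library. This is an illustration only, not anything Lynx does:
xml.etree.ElementTree stands in for a parser that aborts on the first
well-formedness error, and html.parser.HTMLParser stands in for a
tolerant one. SAMPLE, strict_text() and lenient_text() are names made up
for this example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# <br> is never closed, so this is not well-formed XML.
SAMPLE = "<p>one<br>two</p>"


def strict_text(markup):
    """Abort-on-error policy: one well-formedness error loses the page."""
    try:
        return "".join(ET.fromstring(markup).itertext())
    except ET.ParseError:
        return None


class _TextCollector(HTMLParser):
    """Collects character data and ignores tag structure entirely."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def lenient_text(markup):
    """Recovering policy: keep whatever text the parser can still find."""
    collector = _TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks)


if __name__ == "__main__":
    print("strict: ", strict_text(SAMPLE))
    print("lenient:", lenient_text(SAMPLE))
```

On the sample, the strict path yields nothing at all while the lenient
path still recovers the text, which is the user-visible difference being
argued about here.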

> 
> As Lynx doesn't support scripting or style sheets, the required use
> of CDATA sections for inline ones may not be such an issue and
> the document object model changes (e.g. missing TBODY is really
> missing) may not matter.
> 
> Microsoft have said that this was too difficult to do in IE7 and
> have decided not to do it, rather than do it and get it wrong.  (One of 
> the arguments against the Appendix C concession is that it will become 
> impossible for browsers to comply with the reject not-well formed 
> content rule because so much XHTML is only being used on IE, and
> people are therefore producing a lot of it with well-formedness errors.)
> 
> If other browsers are making a last gasp effort to accept even more 
> invalid content before they start accepting XHTML and therefore 
> rejecting on well-formedness errors, they are being very cynical.  If it 
> is possible to do error recovery in Lynx that doesn't follow this path,
> it will be much better.
> 
> As a first thing, one should do a much more careful bugwise 
> compatibility reverse engineering job to see whether, for
> example, a missing </script> is handled correctly even when
> <script src=.... /> is not used.
> 
> I'll also repeat that the only excuse that many people give for
> serving XHTML as text/html is that the XHTML syntax rules are
> tighter (they aren't - HTML just has more convenience rules).
> As such, if they then use it wrongly, they should receive complaints.

Sadly, while some websites are fairly responsive (one case of the
<script src=... /> has been fixed), the vast majority are just going to
ignore complaints, or respond with 'it works with IE and Firefox'.

This means that, at least for the cases where they work without error,
we can either choose not to support those pages or we can try to fix it.

Zephaniah E. Hull.

-- 
          1024D/E65A7801 Zephaniah E. Hull <address@hidden>
           92ED 94E4 B1E6 3624 226D  5727 4453 008B E65A 7801
            CCs of replies from mailing lists are requested.

   "<green>From</yellow>"
   "Wow. The green word From is no longer yellow. That's deep, man."
 -- Marcus Meissner & Lars Balker Rasmussen in the Scary Devil Monastery


