Re: [Lynx-dev] A patch for lynx.


From: David Woolley
Subject: Re: [Lynx-dev] A patch for lynx.
Date: Thu, 19 Apr 2007 22:15:31 +0100
User-agent: Thunderbird 1.5.0.10 (X11/20070221)

Zephaniah E. Hull wrote:

> of '<tag some_attributes />', which means fairly exactly
> '<tag some_attributes></tag>'.

No it doesn't.  In XHTML 1.0 written in *Non*-HTML compatible
mode it has this meaning.  In HTML it has had a de facto meaning
of simply <tag some_attributes>.
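
To make the two readings concrete, here is a minimal sketch in Python. It is not Lynx code: the function names and the regular expression are invented for illustration, and it assumes attribute values contain no '<' or '>' characters.

```python
import re

# Matches a start tag written with the XML shorthand, e.g. <br /> or
# <a name="x" />.  Illustration only: a regex is not a real SGML/XML parser.
SELF_CLOSING = re.compile(r"<([A-Za-z][A-Za-z0-9]*)((?:\s[^<>]*?)?)\s*/>")


def as_xml(markup):
    """XHTML-served-as-XML reading: <tag attrs /> is <tag attrs></tag>."""
    return SELF_CLOSING.sub(
        lambda m: f"<{m.group(1)}{m.group(2).rstrip()}></{m.group(1)}>",
        markup,
    )


def as_tag_soup_html(markup):
    """De facto HTML reading: the '/' is dropped, leaving <tag attrs>."""
    return SELF_CLOSING.sub(
        lambda m: f"<{m.group(1)}{m.group(2).rstrip()}>",
        markup,
    )


print(as_xml('<a name="chapter1" />'))            # <a name="chapter1"></a>
print(as_tag_soup_html('<a name="chapter1" />'))  # <a name="chapter1">
```

For an element such as <br> the two readings are harmless either way; for <a> the HTML reading leaves the element open.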

Appendix C of the XHTML specification contains some guidelines that
will result in XHTML being more or less compatible with real life
HTML browsers.  They are controversial - most people using XHTML 1.0
should be using HTML 4.01 - and the rules are self-contradictory.
However, they rely, in part, on HTML browsers not handling / correctly
(in HTML it has a rather different meaning) and instead ignoring
the / as an illegal character.


> Some valid examples of this are '<a name="chapter1" />', and much worse

This is a violation of the compatibility rules.  The /> notation is
only allowed when the content model is "empty".  That's a pretty serious
fault, because one of the commonest claimed reasons for using XHTML is
that the browser is supposed to throw it out if the tags don't balance
properly, i.e. they claim to be using it in order to ensure that they
are using valid code, but by violating Appendix C and then serving the
document as HTML, they *are* violating the specification.  (Only an XML
browser fed the document as XML will abort on badly formed documents.
The main reasons for finding so much XHTML served as HTML are that IE
doesn't support true XHTML and that XML is fashionable.)
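
A rough sketch of how one might flag such violations, again in Python and not taken from Lynx. The set below lists the elements that HTML 4.01 declares EMPTY; the function name and the regex-based scan are assumptions made for the example.

```python
import re

# Elements declared EMPTY in the HTML 4.01 DTDs (including the
# Transitional and Frameset ones).
EMPTY_ELEMENTS = frozenset({
    "area", "base", "basefont", "br", "col", "frame", "hr",
    "img", "input", "isindex", "link", "meta", "param",
})

# Start tags using the /> shorthand.  Illustration only: assumes attribute
# values contain no '<' or '>' characters.
SHORTHAND = re.compile(r"<([A-Za-z][A-Za-z0-9]*)(?:\s[^<>]*?)?\s*/>")


def misused_shorthand(markup):
    """Return names of elements written with /> whose content model is
    not EMPTY, in document order."""
    return [
        m.group(1).lower()
        for m in SHORTHAND.finditer(markup)
        if m.group(1).lower() not in EMPTY_ELEMENTS
    ]


sample = ('<br /><a name="chapter1" />'
          '<script type="text/javascript" src="dhtml.js" />')
print(misused_shorthand(sample))  # ['a', 'script']
```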

> for lynx '<script type="text/javascript" src="dhtml.js" />', the latter
> is especially bad because lynx simply won't render anything past it.
> (And it in the header is absolute death for rendering the page.)

You may well find that what is really happening here is that the
mainstream browsers are detecting the src parameter and using that
to realise that a missing </script> is an error.  If they are
deliberately interpreting this as XHTML syntax, they are
being more cynical than I thought.  The compatibility rules were
designed to work with HTML browsers as they were; HTML browsers
were not intended to change.


> So, with that in mind, the attached patch causes lynx to parse
> '<tag attributes />' as '<tag attributes></tag>', this properly renders

For your <a> example, I would suggest parsing it as the XHTML
specification assumes it will be parsed, and then making sure
that the <a> gets closed by the error recovery when the next closing
tag, or block level opening tag, is found.
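
The recovery rule could be sketched roughly as follows. This is Python over a made-up token stream, not Lynx's SGML code: the token shape, the function name and the (deliberately incomplete) block-level list are all assumptions for illustration.

```python
# A partial list of HTML block-level elements, enough for the example.
BLOCK_LEVEL = frozenset({
    "address", "blockquote", "div", "dl", "form", "h1", "h2", "h3",
    "h4", "h5", "h6", "hr", "ol", "p", "pre", "table", "ul",
})


def close_dangling_anchors(tokens):
    """Insert an implied </a> before the next closing tag or block-level
    opening tag that follows an unclosed <a>.

    tokens is an iterable of ("start", name), ("end", name) or
    ("text", data) tuples; a new list of the same shape is returned.
    """
    out = []
    anchor_open = False
    for kind, value in tokens:
        if kind == "end" and value == "a":
            anchor_open = False            # explicit close, nothing to fix
        elif anchor_open and (
            kind == "end" or (kind == "start" and value in BLOCK_LEVEL)
        ):
            out.append(("end", "a"))       # implied close (error recovery)
            anchor_open = False
        if kind == "start" and value == "a":
            anchor_open = True
        out.append((kind, value))
    return out


# <p><a name="chapter1" />Chapter 1</p>, with the '/' ignored
tokens = [("start", "p"), ("start", "a"), ("text", "Chapter 1"), ("end", "p")]
print(close_dangling_anchors(tokens))
```

Here the implied ("end", "a") lands just before ("end", "p"), so the anchor does not leak into the rest of the document.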

If you are seeing script abused in this way in compatibility mode, you
should point out that both the XHTML and XML specifications say you
mustn't use the shorthand notation in material intended for HTML
browser compatibility, and ask the author why they are using XHTML 1.0
if they are not ensuring that the code is valid.

As a fix, I would suggest detecting the src parameter and either
forcing an empty content model for that instance, or turning off the
CDATA handling of the contents, so that the error gets recovered on
the next tag.
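
As a sketch of that decision only (Python, with invented names; Lynx's real parser is written in C and structured differently), the choice of content handling for a <script> start tag might look like this:

```python
from enum import Enum


class ContentMode(Enum):
    EMPTY = "empty"    # no content expected; element is closed at once
    CDATA = "cdata"    # contents swallowed verbatim up to </script>
    PARSED = "parsed"  # contents parsed as ordinary markup


def script_content_mode(attrs, force_empty=True):
    """Pick how to treat what follows a <script ...> start tag.

    attrs is a dict of the tag's attributes.  With a src attribute the
    script body lives elsewhere, so either treat this instance as EMPTY,
    or fall back to normal parsing so that the next tag triggers error
    recovery.  Without src, keep the usual CDATA handling.
    """
    if "src" in attrs:
        return ContentMode.EMPTY if force_empty else ContentMode.PARSED
    return ContentMode.CDATA


print(script_content_mode({"type": "text/javascript", "src": "dhtml.js"}))
print(script_content_mode({"type": "text/javascript"}))
```

Either branch avoids the failure described above, where everything after an unclosed <script> is swallowed as script content.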




