lynx-dev

From: Greg Marr
Subject: Re: lynx-dev Conversion of special character codes within anchor tags
Date: Thu, 24 Sep 1998 14:09:01 -0400

>In the process I have noticed that Lynx 2.8 performs special character code
>translation within A HREF= tags.

As it should.

>For example: I could have a tag which
>pointed to http://www.some.site/sample.cgi?para=1&curren=GBP (sending the
>two key/value pairs of para=1 and curren=GBP). Most browsers (including lynx
>2.6) would send this as it appears above when the link was selected, but
           ^incorrectly
>lynx 2.8 substitutes the universal currency symbol for &curren.

As it should.

>The point is this: special character entities exist so that characters which
>do not fit within the 7-bit ASCII character set can be transmitted across
>networks that may strip the eighth bit. 

That is not the only reason they exist.  Take the cases of &amp; &lt; &gt;
&quot;.  These entities exist because the bare characters have significance in
HTML: the start of an entity reference, the start of a tag, the end of a tag,
and an attribute value delimiter.  (The last of these is mainly needed inside
quote-delimited attribute values, so that a literal quotation mark does not end
the attribute.)
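As a small illustration, here are the four characters and the entities that replace them. This sketch uses Python's standard `html.escape` as a stand-in for whatever tool generates the page; that choice is an assumption, not something from this thread:

```python
import html

# The four characters that have special meaning in HTML markup,
# paired with the role each one plays.
specials = [
    ("&", "start of an entity reference"),
    ("<", "start of a tag"),
    (">", "end of a tag"),
    ('"', "attribute value delimiter"),
]

for ch, role in specials:
    # quote=True also escapes the double quote (and the single quote).
    print(ch, "->", html.escape(ch, quote=True), "-", role)

escaped = html.escape('&<>"', quote=True)
print(escaped)  # &amp;&lt;&gt;&quot;
```

Note that `&` is always escaped first, so the entities produced for the other three characters are not escaped a second time.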

>There is absolutely no point,
>therefore, in enabling their translation within a URL, because the
>characters into which they are translated should not be sent across the
>internet in that form - they exist because of that very point!

When characters that aren't 7-bit clean appear in URLs, they have to be
URL-escaped as %XX sequences.  HTML character entities are processed by the HTML parser before
the browser ever sends the URL anywhere across the net.  URL escaping and HTML
character entities are totally different animals.
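To show that the two mechanisms really are separate, the sketch below applies each one to the same string. It uses Python's `urllib.parse.quote` for URL escaping and `html.escape` for HTML escaping, and it assumes UTF-8 for the non-ASCII character (a 1998 page might well have used Latin-1, which would give `%A3` instead):

```python
import html
from urllib.parse import quote

text = "price=£5 & up"

# URL escaping: every byte outside the unreserved set becomes %XX.
# Assumes UTF-8, so the pound sign becomes the two bytes %C2%A3.
url_escaped = quote(text)

# HTML escaping: only the markup-significant characters change;
# non-ASCII characters pass through untouched.
html_escaped = html.escape(text)

print(url_escaped)   # price%3D%C2%A35%20%26%20up
print(html_escaped)  # price=£5 &amp; up
```

The URL layer deals with what may travel over the wire; the HTML layer deals only with what the HTML parser might misread.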

>You could say: simply use a different delimiter instead of ampersand to
>separate field names.

The semicolon is also used as a delimiter, just for this reason.  I don't
remember the RFC, but this has been discussed here before, and someone found
the RFC that mentions this.  You could search the archives to find it.
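The semicolon convention can be sketched as a server-side parser that accepts either separator. `split_query` is a hypothetical helper written for this illustration, not part of any particular CGI library. With `;` as the separator, no `&` appears in the query at all, so nothing in the HTML can be mistaken for the start of an entity reference:

```python
import re
from typing import List, Tuple
from urllib.parse import unquote_plus


def split_query(query: str) -> List[Tuple[str, str]]:
    """Hypothetical helper: split a query string on either '&' or ';'."""
    pairs = []
    for field in re.split(r"[&;]", query):
        if not field:
            continue  # tolerate empty fields such as a trailing separator
        name, _, value = field.partition("=")
        # Undo URL escaping (%XX sequences and '+' for space) in each part.
        pairs.append((unquote_plus(name), unquote_plus(value)))
    return pairs


print(split_query("para=1;curren=GBP"))  # [('para', '1'), ('curren', 'GBP')]
print(split_query("para=1&curren=GBP"))  # [('para', '1'), ('curren', 'GBP')]
```

A script that parses this way can publish its links with `;`, while still accepting `&` from forms or older links.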

>One obvious solution is to use field names which do not correspond to
>special character entities. 

The proper solution is to escape the & as &amp;.  A conforming HTML processor
will change this to & before it uses the URL.
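The decoding step can be illustrated with Python's standard `html.unescape`, used here only as a stand-in for a browser's HTML parser (an assumption; real parsers differ in details). The bare `&curren` is consumed as an entity, while the `&amp;` form decodes back to exactly the URL the author intended:

```python
import html

bare = "http://www.some.site/sample.cgi?para=1&curren=GBP"
escaped = "http://www.some.site/sample.cgi?para=1&amp;curren=GBP"

# html.unescape stands in for the HTML parser's entity-decoding step.
# It recognises legacy names such as "&curren" even without a trailing
# semicolon, matching the Lynx 2.8 behaviour described in this thread.
decoded_bare = html.unescape(bare)
decoded_escaped = html.unescape(escaped)

print(decoded_bare)     # http://www.some.site/sample.cgi?para=1¤=GBP
print(decoded_escaped)  # http://www.some.site/sample.cgi?para=1&curren=GBP
```

Only the second result is the URL the CGI script expects to receive.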

>It seems a ridiculous burden to place on web site developers that every time
>HTML is revised, we have to check every field name and every URL for 
>incompatibilities.

Which is why you should escape the & when putting the URL inside an HTML page.

>It has been put to me in a previous discussion that, because HTML standards
>do not specify that URLs in anchor tags should be treated differently to
>other text on the page, this means that they should be treated the same.

Precisely.

>I would say that it simply means that the situation needs clarification, 

The situation is clear enough already, and doesn't need clarification.  Any
literal ampersands in an HTML file need to be escaped as &amp; to eliminate
the possibility that they could be starting a character reference.
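A page can be checked for ampersands that were left unescaped. The sketch below is a hypothetical lint rule, not the full HTML tokenizer logic: it flags any `&` that does not begin a complete, semicolon-terminated reference (`&name;`, `&#123;` or `&#x1F;`). It is deliberately conservative and does not verify that a name like `&foo;` is a real entity:

```python
import re
from typing import List

# An '&' NOT followed by a complete, semicolon-terminated reference.
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)")


def find_bare_ampersands(markup: str) -> List[int]:
    """Hypothetical helper: offsets of ampersands that should be &amp;."""
    return [m.start() for m in BARE_AMP.finditer(markup)]


bad = '<a href="sample.cgi?para=1&curren=GBP">'
good = '<a href="sample.cgi?para=1&amp;curren=GBP">'

print(find_bare_ampersands(bad))   # one offset: the bare '&'
print(find_bare_ampersands(good))  # []
```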

>If all browsers agreed, it would be possible, however pointless, to take
>account of this behaviour by sending the URL as
>http://www.some.site/sample.cgi?para=1&amp;curren=GBP.

This is correct.  When a URL with an ampersand appears in HTML, the ampersand
has to be escaped as &amp;.  The Perl CGI.pm module does this automatically.
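The two layers can be combined when a script generates its own links. The sketch below is a Python analogue under stated assumptions, not CGI.pm's actual interface: `make_link` is a hypothetical helper, `urlencode` builds the query (URL escaping), and `html.escape` then makes the result safe to place in an `href` attribute (HTML escaping):

```python
import html
from urllib.parse import urlencode


def make_link(base: str, params: dict, text: str) -> str:
    """Hypothetical helper: build an <a> element pointing at base?params."""
    # Layer 1: URL-escape names and values and join the pairs with '&'.
    url = base + "?" + urlencode(params)
    # Layer 2: HTML-escape the whole URL, so '&' becomes '&amp;' in markup.
    return '<a href="{}">{}</a>'.format(html.escape(url, quote=True),
                                        html.escape(text))


link = make_link("http://www.some.site/sample.cgi",
                 {"para": "1", "curren": "GBP"}, "Prices in GBP")
print(link)
# <a href="http://www.some.site/sample.cgi?para=1&amp;curren=GBP">Prices in GBP</a>
```

A conforming browser undoes only the second layer when it parses the page, so it requests `para=1&curren=GBP` from the server.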

>However, very few
>browsers agree with lynx's behaviour on this point, so sending the URL in
>this form would create more problems than it solved.

Which browsers do not handle this properly?  They should be fixed.

>What I don't understand is why version 2.8 would take the retrograde step of
>introducing an incompatibility that did not exist in earlier versions of the
>browser. 

It is not a retrograde step that introduces an incompatibility; it is adherence
to the proper behavior as described by the standards.

--
Greg Marr
address@hidden
"We thought you were dead." 
"I was, but I'm better now." - Sheridan, "The Summoning"
