lynx-dev

lynx-dev Conversion of special character codes within anchor tags


From: Bruno Prior
Subject: lynx-dev Conversion of special character codes within anchor tags
Date: Thu, 24 Sep 1998 18:37:10 +0100

I'm new to this list, so forgive me if this topic has been covered before,
but I couldn't find anything in the archives.

I'm developing a web site and checking, as I presume subscribers to this
list would approve, that it works with as many browsers as possible,
including Lynx 2.8 (running under Linux 2.0.35) and Lynx 2.6 (Linux 2.0.27).

In the process I have noticed that Lynx 2.8 performs special character code
translation within A HREF= tags. For example: I could have a tag which
pointed to http://www.some.site/sample.cgi?para=1&curren=GBP (sending the
two key/value pairs of para=1 and curren=GBP). Most browsers (including lynx
2.6) would send this as it appears above when the link was selected, but
lynx 2.8 substitutes the universal currency symbol (¤) for &curren.
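
To make the effect concrete, here is a minimal Python sketch. It assumes
that Python's html.unescape, which also decodes a semicolon-less &curren,
is a fair stand-in for what lynx 2.8 appears to do; it is not lynx's own
code.

```python
import html
from urllib.parse import urlsplit, parse_qs

# The link exactly as it is written in the A HREF= attribute.
href = "http://www.some.site/sample.cgi?para=1&curren=GBP"

# Lenient entity decoding (assumed stand-in for lynx 2.8's behaviour):
# "&curren" is replaced by the currency sign even without a semicolon.
decoded = html.unescape(href)

# Key/value pairs the script sees without and with the decoding step.
intended = parse_qs(urlsplit(href).query)
received = parse_qs(urlsplit(decoded).query)

print(intended)
print(received)
```

With the decoding applied, the ampersand delimiter disappears, so the
script no longer receives a separate curren field at all.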

The point is this: special character entities exist so that characters which
do not fit within the 7-bit ASCII character set can be transmitted across
networks that may strip the eighth bit. There is absolutely no point,
therefore, in enabling their translation within a URL, because the
characters into which they are translated should not be sent across the
internet in that form - they exist because of that very point!

You could say: simply use a different delimiter instead of ampersand to
separate field names. However, ampersand is the delimiter used when such a
URL is constructed by a browser submitting a GET type form. It is therefore
the most obvious delimiter, and the simplest to use if (as is my case) the
link is constructed by a CGI-script attaching the QUERY_STRING environment
variable from a form submission. One could even envisage a situation where a
script could be called by either submitting a form or selecting a link
(functionally identical actions), in which case the script would have to
cope with different delimiters without knowing which method was used to call
it. And other possible delimiters (such as + or ;) also present problems of
their own.
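
As a rough illustration of what coping with different delimiters would
mean on the script side, here is a sketch in Python. The function name
parse_query is made up for this example, and it assumes the script is
willing to treat both & and ; as separators.

```python
import re
from urllib.parse import unquote_plus

def parse_query(query_string):
    """Split a query string on either '&' or ';' and decode each pair."""
    pairs = {}
    for chunk in re.split(r"[&;]", query_string):
        if not chunk:
            continue  # ignore empty pieces such as a trailing delimiter
        name, _, value = chunk.partition("=")
        pairs[unquote_plus(name)] = unquote_plus(value)
    return pairs

print(parse_query("para=1&curren=GBP"))
print(parse_query("para=1;curren=GBP"))
```

The catch is that a value containing a literal ; must then be
percent-encoded (%3B), which a browser submitting a GET form will not
necessarily do for you.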

One obvious solution is to use field names which do not correspond to
special character entities. However, this is a movable feast. Each revision
of the HTML standard has added more character entities, so that formerly
safe field names suddenly become unsafe. Exactly this occurred when I was
checking my site against lynx. My field names were chosen some time ago,
when &sum and &lang were not recognized entities. I have not been
following the additions of recent HTML flavours, but presumably they have
been added somewhere along the line, because lynx translates them into their
respective characters. Even specifying an earlier HTML version in the
DOCTYPE definition at the top of the page does not make any difference. It
seems a ridiculous burden to place on web site developers that every time
HTML is revised, we have to check every field name and every URL for
incompatibilities.
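
To give an idea of what such a check involves, here is a sketch that
tests field names against an entity table. It assumes Python's
html.entities.html5 table, which is not the table lynx uses, and the
function name collides_with_entity is made up for this example.

```python
from html.entities import html5

def collides_with_entity(field_name):
    """Return True if '&' + field_name could be read as a character entity."""
    # Exact match against an entity that is normally written with a semicolon.
    if field_name + ";" in html5:
        return True
    # Older entities are also listed without a semicolon, so a field name
    # that merely starts with one of them (e.g. "currency") is at risk too.
    return any(field_name[:n] in html5 for n in range(2, len(field_name) + 1))

for name in ("curren", "sum", "lang", "currency", "price", "qty"):
    print(name, collides_with_entity(name))
```

Against this table even para is flagged; it is harmless in my example
only because it follows the ? rather than an &.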

It has been put to me in a previous discussion that, because HTML standards
do not specify that URLs in anchor tags should be treated differently to
other text on the page, this means that they should be treated the same. I
would say that it simply means that the situation needs clarification, and
that, given the illogicality of including non-ASCII characters in URLs, the
clarification should accept the status-quo of not translating the special
character entities.

If all browsers agreed, it would be possible, however pointless, to take
account of this behaviour by sending the URL as
http://www.some.site/sample.cgi?para=1&amp;curren=GBP. However, very few
browsers agree with lynx's behaviour on this point, so sending the URL in
this form would create more problems than it solved.
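
For reference, writing the link with the ampersand escaped as &amp;
could be sketched in Python like this. The link text is made up for the
example; whether a given browser turns &amp; back into & before sending
the request is exactly the point in dispute.

```python
import html

url = "http://www.some.site/sample.cgi?para=1&curren=GBP"

# Escape the characters that are special in HTML, including the ampersand.
escaped = html.escape(url, quote=True)

# Hypothetical anchor tag as a CGI script might emit it.
anchor = '<A HREF="{}">Prices in GBP</A>'.format(escaped)
print(anchor)
```

A browser that decodes entities in the attribute recovers the original
URL; one that does not would send the literal &amp; to the server.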

I have had this debate once before with regard to MSIE3.02, and I would hope
that users of a proper browser would not want to follow in the footsteps of that
crock of *#?!. And even mighty Microsoft admitted that this behaviour was
wrong by changing to conventional behaviour with IE4. There are enough sites
out there which take no account whatsoever of text-mode browsers like lynx.
It seems a bad idea to introduce incompatibilities which make it harder for
even those sites which would like to accommodate lynx users.

What I don't understand is why version 2.8 would take the retrograde step of
introducing an incompatibility that did not exist in earlier versions of the
browser. This is not the only example. It appears from another script of
mine that lynx 2.8 does not send the HTTP_REFERER environment variable by
default, whereas lynx 2.6 does. I would hope that there is some way to
enable sending of this variable from version 2.8 browsers, but I can't find
it. Does anyone know how?
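
In the meantime a script can at least tolerate the variable being
absent. A minimal sketch in Python, assuming a CGI environment; the
function name referring_page and the "(unknown)" fallback are made up
for this example.

```python
import os

def referring_page(environ=os.environ, default="(unknown)"):
    """Return the referring URL reported by the browser, or a fallback."""
    # HTTP_REFERER is unset (or empty) when the browser sends no Referer.
    return environ.get("HTTP_REFERER") or default

print(referring_page({"HTTP_REFERER": "http://www.some.site/"}))
print(referring_page({}))
```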

Cheers,


Bruno Prior         address@hidden
