lynx-dev
Re: lynx-dev HTML4.0 and default charset


From: Klaus Weide
Subject: Re: lynx-dev HTML4.0 and default charset
Date: Sun, 7 Mar 1999 07:01:55 -0600 (CST)

On Fri, 5 Mar 1999, David Woolley wrote:
[kw:]
> > Btw., your formulation "character set in which to render" shows signs of
> > infection by popular-browser-think. :)  One doesn't render "in" a charset,
> > I'd prefer to say one renders (possibly "in" a _font_) "assuming" a charset.
> 
> Would you be happier with "from which to render"; I certainly had no intention
> of implying anything about the local character set.

Yes.

I know you mean the right thing.  But your formulation reminded me of
terminology I always found highly confusing in connection with Netscape's
browser.  ("Choose a document character set" (or similar).  Huh?  How can
*I* choose a d.c.s., it's the author or the server who does that, it's too
late to "set" a d.c.s. when the bytes arrive at my end; all I can change
is an *assumption*.)

> Any mention of fonts
> is dangerous because a common abuse is things like <font
> face=wingdings>J</font> (a smiley face).  This is one I have seen recently,
> but the symbol font is more common.

Agreed.

> > > If the status line says HTTP/1.1, and there is no charset, a HTTP 1.1
> > > browser cannot legitimately assume that it is dealing with, say, Ukrainian,
> > 
> > You seem to assume that the response version string is end-to-end, but
> > it is supposed to be hop-by-hop information only.
> 
> From what I remember, CERN passes through the protocol unless it has
> a cache hit.  CERN is HTTP 1.0.  We're now on squid so I can't double
> check, but I think CERN may even cache the protocol.

Yes, CERN httpd does that, and so does squid (at least previous versions,
not sure about "caching" the protocol version or about recent versions).
But it was regarded as a bug by the http-wg.  Apache's proxy component used
to also do this, but has been changed (note that it's still HTTP 1.0).
[All this is from memory, without checking now.]

Note that all of these only pass through the protocol version in *responses*,
but not in *requests*.
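
To illustrate the hop-by-hop reading of the version string, here is a
minimal Python sketch of a proxy rewriting the status line it relays so
that it carries the proxy's own protocol version rather than the
upstream server's.  The function name `relay_status_line` and the
default version are assumptions made up for this example; this is not
how CERN httpd, squid or Apache actually implement it.

```python
def relay_status_line(upstream_line: str, proxy_version: str = "HTTP/1.0") -> str:
    """Return the status line a proxy would send to its own client.

    The protocol version describes the hop between the proxy and the
    client, so the upstream version is replaced, not passed through.
    The status code and reason phrase are kept unchanged.
    """
    version, _, rest = upstream_line.partition(" ")
    if not version.startswith("HTTP/") or not rest:
        raise ValueError(f"not an HTTP status line: {upstream_line!r}")
    return f"{proxy_version} {rest}"


if __name__ == "__main__":
    # An HTTP/1.0 proxy relaying a response from an HTTP/1.1 origin server.
    print(relay_status_line("HTTP/1.1 200 OK"))
```

With this behaviour, a client that sees "HTTP/1.1" in a response learns
something about the proxy next to it, and nothing about the origin server.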

> It would seem there is a new ambiguity here because the proxy is more
> or less acting as client and server in this case, 

Not just "more or less" - acting as client *and* server is in the definition
of a HTTP proxy.

> and should arguably
> be doing character set identification (as it seems do some Russian ones).

One could argue that it should, but without authoritative information (which
could in the general case only come from the origin server's HTTP headers)
it doesn't *know* the charset any better than the end client would, and
making up a charset parameter out of thin air would not be a reasonable
requirement.

> Generally, though, this, like a lot of things in web standards, is damage
> limitation, and I think it unlikely that correct character sets will be
> common for many years.

Agreed, twice.

But whether common or not - it should be the first priority that lynx
works correctly with sites that *do* correctly identify the charset.
And I think it does that, to a high degree.  After that, it's nice to
make it work in other situations too, but that should never prevent
correct functioning in the "correct" case.
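
The priority described above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not lynx's actual
code: the names `resolve_charset` and `decode_body` and the default
assumption of iso-8859-1 are hypothetical.  A charset declared in the
Content-Type header always wins; the locally configured assumption is
used only when nothing usable was declared, and the result records which
of the two cases applied.

```python
from email.message import Message


def resolve_charset(content_type, assumed="iso-8859-1"):
    """Return (charset, source), where source is 'declared' or 'assumed'."""
    declared = None
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        # Returns the lowercased charset parameter, or None if absent.
        declared = msg.get_content_charset()
    if declared:
        return declared, "declared"
    return assumed, "assumed"


def decode_body(body: bytes, content_type, assumed="iso-8859-1"):
    """Decode body, preferring the declared charset over the assumption."""
    charset, source = resolve_charset(content_type, assumed)
    try:
        text = body.decode(charset, errors="replace")
    except LookupError:
        # The declared label is unknown to this decoder: fall back.
        charset, source = assumed, "assumed"
        text = body.decode(charset, errors="replace")
    return text, charset, source


if __name__ == "__main__":
    print(resolve_charset("text/html; charset=KOI8-R"))
    print(resolve_charset("text/html", assumed="koi8-r"))
```

Changing the `assumed` argument corresponds to the only thing a user can
change at the receiving end; it never overrides a charset that the
server actually declared.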

After all, the "incorrect" sites may eventually get fixed to the
"correct" behaviour - more likely for those sites where there is a
problem sufficiently big that it warrants investigation.

   Klaus
