lynx-dev

lynx-dev URLs with raw 8-bit chars (was: lynx: have bug)


From: Klaus Weide
Subject: lynx-dev URLs with raw 8-bit chars (was: lynx: have bug)
Date: Sun, 21 Mar 1999 20:37:55 -0600 (CST)

On Sun, 21 Mar 1999, Leonid Pauzner wrote:
> 21-Mar-99 12:38 Klaus Weide wrote:
> > On Sun, 21 Mar 1999, Leonid Pauzner wrote:
> >>
> >> UTF-8 URL-encoding was proposed in several recent drafts
> >> (not handy, but I remember a note that certain protocols
> >> or servers may expect blind %xx encoding, not utf-8),
> >> so we may need a configurable option between (1) and (2) for compatibility.
> >> Also I doubt lynx does (2) in all cases; I saw it only for HTML files -
> 
> I mean when the translation to utf-8 exists and the document charset is not iso-8859-1.
> > It may not do it if in raw or transparent mode, or if Display character set
> > == document charset (or assumed charset?), or if CJK, or some other
> > combination of factors.  It shouldn't have anything to do with HTML or not
> > though.

Just did some testing on this, with this (HTTP-served) page:

   <TITLE>Testing 8-bit chars in HREF</TITLE>
   <META HTTP-EQUIV="content-type" CONTENT="text/html;charset=koi8-r">

   The link: <A HREF="/cgi-bin/showenv/XXXXXXX">to "XXXXXXX"</A>.

(Replace XXXXXXX with some 8-bit chars; vary charset above or leave
out, and vary lynx's settings.  showenv shows XXXXXXX URL-decoded
in PATH_INFO and PATH_TRANSLATED, can be used to check whether hex-
encoding is right.)
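If you capture the hex-encoded path that reaches the server, a small helper can tell which of the two encodings Lynx chose. The following is a minimal Python sketch, not part of showenv; `classify_path_info` is a hypothetical name. It is only a heuristic: some raw 8-bit byte sequences happen to be valid UTF-8 as well.

```python
from urllib.parse import unquote_to_bytes

def classify_path_info(encoded: str) -> str:
    """Heuristically tell which encoding was used for an 8-bit path.

    Returns "ascii" if the decoded bytes are all 7-bit, "raw" if they
    contain 8-bit bytes that are not valid UTF-8, and "utf-8" if they
    are valid UTF-8 containing non-ASCII characters.
    """
    data = unquote_to_bytes(encoded)  # undo the %xx hex-encoding
    if data.isascii():
        return "ascii"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "raw"  # e.g. a lone KOI8-R byte such as 0xC1
    return "utf-8"
```

For example, `/cgi-bin/showenv/%C1` classifies as "raw" and `/cgi-bin/showenv/%D0%B0` as "utf-8".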

The behavior seems to be consistently this, for normal 8-bit charsets
('translation to utf-8 exists' applies; I didn't test UTF-8, CJK, or
Transparent):

  If Display character set == the document's effective charset,
  then raw 8-bit bytes get hex-encoded directly as byte values.
  If Display character set != the document's effective charset,
  then the UTF-8 representation gets hex-encoded.  'Effective charset'
  means the charset derived from the explicit label, -assume_charset,
  etc. as usual, i.e. what '=' shows.
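A minimal Python sketch of the rule above, assuming the path bytes, the effective charset and the Display character set are already known. `encode_href_path` is a hypothetical name, not Lynx's `LYUCFullyTranslateString`, and comparing charset names via Python's `codecs.lookup` is a simplification of how Lynx matches its own charset tables.

```python
import codecs
from urllib.parse import quote

def encode_href_path(raw: bytes, doc_charset: str, display_charset: str) -> str:
    """Hex-encode an HREF path's 8-bit bytes following the observed rule.

    raw             -- path bytes exactly as they appear in the document
    doc_charset     -- the document's effective charset (what '=' shows)
    display_charset -- the Display character set
    """
    # Normalize aliases such as "KOI8_R" vs "koi8-r" before comparing.
    same = codecs.lookup(doc_charset).name == codecs.lookup(display_charset).name
    if same:
        # Same charset: hex-encode the raw byte values directly.
        data = raw
    else:
        # Different charsets: decode with the document charset and
        # hex-encode the UTF-8 representation instead.
        data = raw.decode(doc_charset).encode("utf-8")
    # quote() accepts bytes; keep '/' so the path structure survives.
    return quote(data, safe="/")
```

With a KOI8-R document containing the byte 0xC1 (Cyrillic 'а'), a KOI8-R display yields `%C1`, while any other display charset yields the UTF-8 form `%D0%B0`.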

That's also what I had intended to happen in LYUCFullyTranslateString,
IIRC...

This means that the user can usually toggle between the two interpretations
with -raw / '@'.   It's not completely logical that the interpretation
of URLs should depend on this.  OTOH there's the ease of switching, and
it's more likely that encoding the raw value is the right thing (or even
possible) when the user's environment is consistent with the server's.

    Klaus
