Re: lynx-dev lynx: have bug (fwd)
From: Leonid Pauzner
Subject: Re: lynx-dev lynx: have bug (fwd)
Date: Sun, 21 Mar 1999 18:34:27 +0300 (MSK)
21-Mar-99 06:56 Klaus Weide wrote:
> On Sun, 21 Mar 1999 address@hidden wrote:
>> Forwarded message:
>> > From: address@hidden
>> > Date: Sun, 21 Mar 1999 11:07:58 +0200 (EET)
>> > Message-Id: <address@hidden>
>> > To: address@hidden
>> > X-URL: http://www.slcc.edu/lynx/release2-8-1/
>> >
>> > With lynx 2.8.1 on slackware 3.6, we've seen that the postings to CGI's work
>> > wrong. It posts Turkish letters wrong, however older versions can post them
>> > correctly.
Please add more information:
try comparing -trace logs from the old version and from version 2.8.1
near the point where the problem is suspected (sending the HTTP request, and just before it).
Because trace logs are long, please send us only the relevant fragment.
>> >
>> > In the bug fixes page, there were no such fix (there were fixes only for build
>> > errors, load errors, core dumps etc.)
>> >
>> > Is this bug been recognized ?
>> > Is there a patch for it ?
>> >
>> > -Turan Yuksel (address@hidden)
> Please give some more information: an example (of a page with the FORM)
> with URL; and the settings from Options (.lynxrc) and lynx.cfg that relate to
> charsets; and which characters are wrong.
One definite "problem" I have personally run into is UTF-8 URL encoding:
when HREF= contains *raw 8-bit text*, the remote server (script)
may (1) expect those bytes %xx-encoded as they are,
but lynx now (2) translates URLs from the document charset to UTF-8
and then sends each byte %xx-encoded (an obvious check:
the number of %xx-encoded bytes increases).
UTF-8 URL encoding was proposed in several recent drafts
(I don't have them at hand, but I remember a note that certain protocols
or servers may expect blind %xx encoding, not UTF-8,
so we may need a configurable option to choose between (1) and (2) for compatibility.
Also, I doubt lynx does (2) in all cases; I saw it only for HTML documents.
A proper solution here may be for page authors not to put raw 8-bit bytes
in HREF=url at all, but only %xx-encoded ones).
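To make the difference between (1) and (2) concrete, here is a small Python
sketch. It is not lynx code: the function names and the sample word are made up
for illustration, and ISO-8859-9 as the document charset is an assumption based
on the Turkish report above.

```python
from urllib.parse import quote

def blind_escape(text, doc_charset):
    """(1) %xx-encode the bytes exactly as they are in the document charset."""
    return quote(text.encode(doc_charset))

def utf8_escape(text):
    """(2) Translate the text to UTF-8 first, then %xx-encode every byte."""
    return quote(text.encode("utf-8"))

# Hypothetical sample: a Turkish word starting with U+015F (s with cedilla).
sample = "\u015fiir"

blind = blind_escape(sample, "iso-8859-9")  # one byte, 0xFE, for the first letter
utf8 = utf8_escape(sample)                  # two bytes, 0xC5 0x9F, for the same letter

print(blind, blind.count("%"))  # %FEiir 1
print(utf8, utf8.count("%"))    # %C5%9Fiir 2
```

The larger count of %xx escapes in the second form is the "obvious check"
mentioned above; a script written for (1) would see two unexpected bytes
where it expected one.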
At least the HTML I18N spec (RFC 2070) describes the problem:
RFC 2070 HTML Internationalization January 1997
5.2. Form submission
The HTML 2.0 form submission mechanism, based on the "application/x-
www-form-urlencoded" media type, is ill-equipped with regard to
internationalization. In fact, since URLs are restricted to ASCII
characters, the mechanism is akward even for ISO-8859-1 text.
Section 2.2 of [RFC1738] specifies that octets may be encoded using
the "%HH" notation, but text submitted from a form is composed of
characters, not octets. Lacking a specification of a character
encoding scheme, the "%HH" notation has no well-defined meaning.
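The point that "%HH" has no well-defined meaning without a charset can be seen
from the receiving side. The following Python sketch (the helper name is
hypothetical, and the three candidate charsets are just examples) decodes the
same escaped octet under different assumptions:

```python
from urllib.parse import unquote_to_bytes

def interpretations(escaped, charsets):
    """Decode the same %HH data under several charsets; None if not decodable."""
    octets = unquote_to_bytes(escaped)
    result = {}
    for name in charsets:
        try:
            result[name] = octets.decode(name)
        except UnicodeDecodeError:
            result[name] = None  # the octets are not valid in this charset
    return result

seen = interpretations("%FE", ["iso-8859-9", "iso-8859-1", "utf-8"])
for name, text in seen.items():
    print(name, repr(text))
```

The single octet 0xFE is U+015F under ISO-8859-9, U+00FE under ISO-8859-1, and
not valid UTF-8 at all, so a script that guesses the wrong charset gets a wrong
letter or an error.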
> It may not be a bug, but you have to set up lynx correctly.
> Try it with -raw (or the equivalent '@' key toggle), or with
> -assume_charset=iso-8859-9 (you possibly also want
> -assume_local_charset=iso-8859-9).
> Klaus
More from RFC 2070:
The best solution is to use the "multipart/form-data" media type
described in [RFC1867] with the POST method of form submission. This
mechanism encapsulates the value part of each name-value pair in a
body-part of a multipart MIME body that is sent as the HTTP entity;
each body part can be labeled with an appropriate Content-Type,
including if necessary a charset parameter that specifies the
character encoding scheme. The changes to the DTD necessary to
support this method of form submission have been incorporated in the
DTD included in this specification.
A less satisfactory solution is to add a MIME charset parameter to
the "application/x-www-form-urlencoded" media type specifier sent
along with a POST method form submission, with the understanding that
the URL encoding of [RFC1738] is applied on top of the specified
character encoding, as a kind of implicit Content-Transfer-Encoding.
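The "less satisfactory" variant in the paragraph just quoted can be sketched in
Python as follows. This only illustrates the idea, it is not what lynx sends;
the helper name, the field name and the choice of ISO-8859-9 are assumptions.

```python
from urllib.parse import urlencode

def build_post(fields, charset):
    """Encode form fields in `charset`, then URL-encode, and label the body."""
    # URL encoding is applied on top of the chosen character encoding.
    body = urlencode(fields, encoding=charset).encode("ascii")
    headers = {
        "Content-Type": "application/x-www-form-urlencoded; charset=" + charset,
        "Content-Length": str(len(body)),
    }
    return headers, body

headers, body = build_post({"q": "\u015fiir"}, "iso-8859-9")
print(headers["Content-Type"])
print(body)
```

With the charset stated in the Content-Type, the receiving script no longer has
to guess how to interpret the %HH octets.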
One problem with both solutions above is that current browsers do not
generally allow for bookmarks to specify the POST method; this should
be improved. Conversely, the GET method could be used with the form
data transmitted in the body instead of in the URL. Nothing in the
protocol seems to prevent it, but no implementations appear to exist
at present.
How the user agent determines the encoding of the text entered by the
user is outside the scope of this specification.
NOTE -- Designers of forms and their handling scripts should be
aware of an important caveat: when the default value of a field
(the VALUE attribute) is returned upon form submission (i.e. the
user did not modify this value), it cannot be guaranteed to be
transmitted as a sequence of octets identical to that in the
source document -- only as a possibly different but valid encoding
of the same sequence of text elements. This may be true even if
the encoding of the document containing the form and that used for
submission are the same.
Differences can occur when a sequence of characters can be
represented by various sequences of octets, and also when a
composite sequence (a base character plus one or more combining
diacritics) can be represented by either a different but
equivalent composite sequence or by a fully precomposed character.
For instance, the UCS-2 sequence 00EA+0323 (LATIN SMALL LETTER E
WITH CIRCUMFLEX ACCENT + COMBINING DOT BELOW) may be transformed
into 1EC7 (LATIN SMALL LETTER E WITH CIRCUMFLEX ACCENT AND DOT
BELOW), into 0065+0302+0323 (LATIN SMALL LETTER E + COMBINING
CIRCUMFLEX ACCENT + COMBINING DOT BELOW), as well as into other
equivalent composite sequences.
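The RFC's example of equivalent sequences can be checked with a short Python
sketch. It uses Unicode normalization (NFC) from the standard unicodedata
module as a stand-in for "the same sequence of text elements"; the RFC itself
does not prescribe this method, and the variable names are only illustrative.

```python
import unicodedata

# The three spellings named in the RFC note, keyed by their code points.
forms = {
    "00EA+0323": "\u00ea\u0323",
    "1EC7": "\u1ec7",
    "0065+0302+0323": "e\u0302\u0323",
}

# Composing each spelling (NFC) shows whether they represent the same text.
normalized = {key: unicodedata.normalize("NFC", s) for key, s in forms.items()}

# The octets sent on the wire (here as UTF-8) nevertheless differ per spelling.
octet_lengths = {key: len(s.encode("utf-8")) for key, s in forms.items()}

for key in forms:
    print(key, normalized[key] == "\u1ec7", octet_lengths[key])
```

A script that compares the submitted VALUE octet by octet against the original
would therefore fail for two of the three spellings, although all three are
the same text after normalization.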