lynx-dev

LYNX-DEV Character set support


From: Michael Sokolov
Subject: LYNX-DEV Character set support
Date: Mon, 12 May 1997 15:33:07 -0400 (EDT)

   Dear Colleagues,
   
   I have noticed a problem with international character set support in
Lynx v2.7.1. I'm Russian, and I have tried to use Lynx to browse Russian WWW
sites. I use DOS and run Lynx on my ISP's BSDI box, with my PC acting as a
terminal. As a result, I prefer the IBM PC character set. Most Russian WWW
pages come in several versions for different Cyrillic encodings, with the
choices offered on the welcome page. Pages in the IBM PC character set often
carry no marking that identifies them as such, so browsers often assume they
are in ISO-8859-1. Since I know that these pages are already in the right
character set for my terminal, I thought I could just put Lynx in raw mode.
However, this didn't work as expected, and that is where I ran into a
problem with Lynx.
   The IBM PC character set is somewhat special in that the codes from 80
through 9F (unless otherwise noted, all two-digit numbers are hex bytes),
i.e., the so-called high control codes, are used for printable characters.
It so happens that in the Russian version of this character set, all the
uppercase Russian letters except Ё are in this range. The problem with Lynx
is that it drops these characters from the rendered HTML output.
   After a cursory look at the source code, I found some places where a fix
would help. The main one is in src/LYCharUtils.c. In the
HTMLSetCharacterHandling() function, there is a long if statement with many
"else if" clauses that set special options for particular character sets.
There is a clause for "KOI8-R" that sets HTPassHighCtrlRaw to TRUE; KOI8-R
is another character set that uses the "high control" codes for printable
characters. However, there is no clause for the IBM PC character set. I
thought that adding one, simply by copying the one for KOI8-R, might solve
the problem.
   As I noted earlier, pages in the IBM PC character set often carry no
marking that identifies them as such. However, some of them do, and for the
benefit of those pages another fix might be useful. In three source files,
namely WWW/Library/Implementation/HTFile.c,
WWW/Library/Implementation/HTMIME.c, and src/LYCharUtils.c, there is code
that checks whether the source and terminal character sets are the same and
enables raw mode if they are. This code is also a long if statement with
many "else if" clauses for different character sets, and again there is a
clause for KOI8-R but none for the IBM PC character set. Adding one by
copying the KOI8-R clause and changing the name might be a useful fix.
   Please check whether my suggestions are correct, and if they are, please
add them to the current development version.
   
   Sincerely,
   Michael Sokolov
   Phone: 216-646-1864
   ARPA Internet SMTP mail: address@hidden