lynx-dev

Re: lynx-dev lynx and other character sets


From: Klaus Weide
Subject: Re: lynx-dev lynx and other character sets
Date: Sat, 26 Jun 1999 20:20:25 -0500 (CDT)

On Sat, 26 Jun 1999 address@hidden wrote:

> > Tom, since xterm with the latest patches now supports UTF-8, have you 
> > tested it at all with lynx's 'UNICODE (UTF-8)' display character set? 
> 
> no - I've not gone in that direction (am not sure what I could use for
> testing, and had thought Lynx maps into an 8-bit display character set).

Use any page that has non-CJK, non-7-bit characters, such as the ones in the
test/ directory, and set the display character set to 'UNICODE (UTF-8)'; that's all.

Lynx should then map the 8-bit input charset to UTF-8 sequences.
If the document's input charset is also UTF-8, it should go to the screen
somewhat transparently.
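For anyone unsure what "mapping to UTF-8 sequences" involves, here is a minimal sketch in Python. It is not Lynx's actual C code, and the function names are hypothetical. It assumes the input charset is ISO-8859-1, where every byte value equals its Unicode code point. Other 8-bit charsets (KOI8-R, for example) would first need a byte-to-code-point table.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (hand-rolled to show the bit layout)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a valid scalar value: U+{cp:04X}")
    if cp < 0x80:                      # 1 byte:  0xxxxxxx
        return bytes([cp])
    if cp < 0x800:                     # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:                   # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),   # 4 bytes: 11110xxx + three continuation bytes
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def latin1_to_utf8(data: bytes) -> bytes:
    """Assumes ISO-8859-1 input: each byte value is already its code point,
    so every byte is simply re-encoded as a UTF-8 sequence."""
    return b"".join(utf8_encode(b) for b in data)


print(latin1_to_utf8(b"caf\xe9"))  # b'caf\xc3\xa9'
```

Bytes below 0x80 pass through unchanged. Each byte from 0x80 to 0xFF becomes a two-byte sequence, which is what a UTF-8 terminal such as the patched xterm expects.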

Best results are expected with slang, and with -DSLANG_MBCS_HACK defined.
Ncurses' optimizations are too clever.

Btw., here is another nice UTF-8 test page:
         <http://www.cogsci.ed.ac.uk/~richard/unicode-sample.html>

Since it has long lines with many UTF-8 characters, it should demonstrate
the effect of SLANG_MBCS_HACK.

       ----

When the display character set is NOT 'UNICODE (UTF-8)' (and not CJK or
transparent either), I notice something strange for all the scripts
Lynx doesn't understand (Armenian, Devanagari, Bengali, ...):
those characters are not shown at all, and there is no indication
that anything is missing.  An earlier version would show
something like

      Armenian
             U531 U532 U533 U534 U535 U536 U537 U538 U539 ...

instead.  Leonid, was this a conscious decision?  It seems like a bug
to me.
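The kind of fallback the earlier version showed can be sketched roughly as follows. This is a hypothetical illustration, not Lynx code. `render_for_display` and `latin1_representable` are made-up names, and the display charset is assumed to be ISO-8859-1. The "U<hex>" placeholder format is copied from the sample output above.

```python
def latin1_representable(ch: str) -> bool:
    """Assumed display charset: ISO-8859-1, i.e. code points below U+0100."""
    return ord(ch) < 0x100


def render_for_display(text: str, representable=latin1_representable) -> str:
    """Hypothetical helper: replace each character the display charset cannot
    show with a visible 'U<hex>' placeholder instead of silently dropping it."""
    out = []
    for ch in text:
        if representable(ch):
            out.append(ch)
        else:
            out.append(f"U{ord(ch):X}")  # e.g. U+0531 ARMENIAN CAPITAL LETTER AYB -> "U531"
    return "".join(out)


print(render_for_display("\u0531 \u0532 \u0533"))  # U531 U532 U533
```

A visible marker at least tells the reader that text was lost, which dropping the character silently does not.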

      ----

Another observation: in the situation of the previous section,
force Raw Mode on.  This has to be done from the 'O'ptions screen,
since '@' is now disabled for documents with an explicit charset.  The
missing characters (or some of them) are now shown in some kind of
'raw' way.  This is also the case in an earlier Lynx version I keep
around for reference ("2.7.1ac-0.91"), but in a different way.  I think
I found this somewhat useful a long time ago for certain kinds of
broken "utf-8" documents; that's why it was there, and apparently it
has survived.

Leonid, I mention this because (as I seem to remember) you asked some
months ago whether there was a case where 'Raw Mode' makes a difference
for documents with an explicit charset.  This is one.  (Maybe the only
one, or the only surviving one.)

If you want to pursue this further, I can try to dig up the page(s)
where I found this useful.


    Klaus

