lynx-dev

LYNX-DEV ANNOUNCE: lynx2.6 chartrans


From: Klaus Weide
Subject: LYNX-DEV ANNOUNCE: lynx2.6 chartrans
Date: Thu, 28 Nov 1996 20:45:51 -0600 (CST)

  To (hopefully) make Lynx more usable in environments with multiple
charsets and code pages, I have extended the character translation
mechanism of Lynx.  Below is part of a short README.  Please test, and
let me know whether it is useful.  This is not yet a full system with
all necessary translation tables, but it should now be easier to add
new charsets etc.

  The code is available from

        URL: http://www.tezcat.com/~kweide/lynx-chartrans/

To compile, you need to get lynx-patch-2.6ct-0.1.pch.gz, *and* either
lynx-newfiles-2.6ct-0.1.zip or lynx-newfiles-2.6ct-0.1.tar.gz.
See README.chartrans, which is included in the lynx-newfiles-* files
and is also readable at the above address.

Note that this does not deal with CJK character sets, only with good
old 8-bit charsets and Unicode/UCS2.  I tried to leave the previous
processing for CJK charsets intact, but have no way to test whether
I succeeded.

The patches were made relative to Lynx2.6 + Composite Patches from
Hiram (last CHANGES date 11-24-96).  I normally use Linux+Slang, so
that may be where it works best; but I verified that the stuff also
compiles for a sun4 target.

Output of raw UTF8 (which of course needs a terminal that understands
it) seems to work better with Slang, but not perfectly.  This is a
problem beyond Lynx: a curses replacement which understands multibyte
characters properly would be needed to avoid putting characters in the
wrong screen position.  (Does anyone know of such a beast?)
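
To make the positioning problem concrete, here is a small Python sketch
(not Lynx code).  It compares the number of bytes in a UTF8 string with
the number of screen cells it should occupy.  A screen library that
advances the cursor once per byte will drift to the right by the
difference.  The function name display_cells is hypothetical, and the
width rule is only an approximation based on Python's unicodedata
module.

```python
import unicodedata


def display_cells(text: str) -> int:
    """Approximate number of terminal cells needed to show `text`."""
    cells = 0
    for ch in text:
        if unicodedata.combining(ch):
            # Combining marks are drawn on top of the previous character.
            continue
        # East Asian wide/fullwidth characters usually take two cells.
        wide = unicodedata.east_asian_width(ch) in ("W", "F")
        cells += 2 if wide else 1
    return cells


def byte_drift(text: str) -> int:
    """Columns a byte-counting screen library would be off by."""
    return len(text.encode("utf-8")) - display_cells(text)


if __name__ == "__main__":
    for sample in ("plain", "na\u00efve", "\u041f\u0440\u0438\u0432\u0435\u0442"):
        raw = sample.encode("utf-8")
        print(len(raw), display_cells(sample), byte_drift(sample))
```

For pure ASCII the drift is zero, which is why the problem only shows
up once multibyte sequences reach the screen.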

The code is currently a bit ugly, and I am sure there are many
glitches.  You can help me find them.


Excerpt from README.chartrans:

Lynx CHARTRANS

New features:
 - Can (attempt to) translate from any document charset to any display
   character set, *IF* the document charset is known by a translation 
   table (compiled in at installation).

 - Old method for specifying translations of Latin1 characters and
   SGML entities still supported. (IBMPC-charsets.announce is still
   relevant.)

 - New method to define character sets: used for input charset as well
   as display character set, translation tables compiled in from 
   separate files (one per charset).

 - Unicode (UTF8) support: can (attempt to) decode and translate UTF8 to
   display character set, or pass UTF8 through to the display (if the
   terminal or console understands UTF8).  [only tested with Slang so
   far, does not always position everything correctly on screen]

 - Support for CHARSET attribute on A tag [but not yet on LINK], as in
   HTML i18n draft.  A link can suggest the target's charset in this way.

 - EXPERIMENTAL, currently enabled only for Linux console: 
   can (attempt to) automatically switch terminal mode and load new
   code pages on change of display character set.

 - some minor changes: invalid characters are sometimes displayed in a
   hex notation Uxxxx (helps debugging, and I regard it as at least no
   worse than showing the wrong char without warning).  KOI8 -> other
   charsets will just strip the high bit from Cyrillic chars (gives
   somewhat readable ASCII, KOI was constructed that way...)
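
The translation, hex fallback, and KOI8 behaviour described in the list
above can be sketched in Python as follows.  This is an illustration
only, not the Lynx implementation: Python's built-in codecs stand in
for the compiled-in translation tables, the function names are
hypothetical, and the exact form of the Uxxxx notation (four uppercase
hex digits here) is an assumption.

```python
def to_display(data: bytes, doc_charset: str, display_charset: str) -> str:
    """Translate document bytes for a display charset.

    Characters the display charset cannot represent are shown as
    U followed by the hex code point instead of a wrong character.
    """
    out = []
    for ch in data.decode(doc_charset, errors="replace"):
        try:
            ch.encode(display_charset)
            out.append(ch)
        except UnicodeEncodeError:
            out.append("U%04X" % ord(ch))  # assumed format of Uxxxx
    return "".join(out)


def koi8_strip_high_bit(data: bytes) -> str:
    """Clear bit 7 of every byte, turning KOI8 Cyrillic into ASCII."""
    return bytes(b & 0x7F for b in data).decode("ascii")


if __name__ == "__main__":
    # The euro sign exists in ISO-8859-15 but not in ISO-8859-1.
    doc = "caf\u00e9 \u20ac".encode("iso8859_15")
    print(to_display(doc, "iso8859_15", "latin_1"))

    # "Privet" in Cyrillic, encoded as KOI8-R.
    koi = "\u041f\u0440\u0438\u0432\u0435\u0442".encode("koi8_r")
    print(koi8_strip_high_bit(koi))
```

Note that stripping the high bit inverts the letter case (lowercase
Cyrillic lands on uppercase Latin and vice versa), because of how the
KOI8 code positions are arranged.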

Additions/changes to user interface:

 - Many new Display Character Sets are available on the O)ptions screen
   (the arrow keys, HOME, and END can now also be used for cycling
   through the list).

 - new command line flags:
   -assume_charset=...          assume this as charset for documents that
                                don't specify a charset parameter in HTTP
                                headers
   -assume_unknown_charset=...  assume this as charset in case a charset
                                parameter is not recognized
   -assume_local_charset=...    assume this as charset of local file: docs

 - The "Raw" toggle (from -raw flag, '@' key, or Options screen)
   o  should work as before for CJK charsets,
   o  otherwise toggles the assumption "Default remote charset is same 
      as Display Character Set" on or off.
   (Try the "Transparent" Display Character Set for more "rawness".)
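
One possible reading of how the three -assume_* flags combine is
sketched below in Python.  This is an interpretation of the
descriptions above, not code taken from Lynx.  The function name,
parameter names, and the set of known charsets are hypothetical.  The
sketch also ignores the "Raw" toggle and does not cover local files
that declare their own charset, since the excerpt does not say how
those cases are handled.

```python
from typing import Optional, Set


def assumed_charset(
    declared: Optional[str],
    is_local_file: bool,
    known: Set[str],
    assume_charset: str,
    assume_unknown_charset: str,
    assume_local_charset: str,
) -> str:
    """Pick the charset to assume for a document.

    `declared` is the charset parameter from the HTTP headers, or None
    if the document did not specify one.
    """
    if declared is not None:
        name = declared.strip().lower()
        if name in known:
            return name
        # A charset was given, but no translation table knows it.
        return assume_unknown_charset
    if is_local_file:
        return assume_local_charset
    return assume_charset


if __name__ == "__main__":
    tables = {"iso-8859-1", "koi8-r", "utf-8"}
    flags = ("iso-8859-1", "utf-8", "koi8-r")
    print(assumed_charset("KOI8-R", False, tables, *flags))
    print(assumed_charset("x-bogus", False, tables, *flags))
    print(assumed_charset(None, False, tables, *flags))
    print(assumed_charset(None, True, tables, *flags))
```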

[Some notes about compiling etc. snipped, see the URL.]
--

      Klaus

