LYNX-DEV ANNOUNCE: lynx2.6 chartrans
From: Klaus Weide
Subject: LYNX-DEV ANNOUNCE: lynx2.6 chartrans
Date: Thu, 28 Nov 1996 20:45:51 -0600 (CST)
To (hopefully) make Lynx more usable in environments with multiple
charsets and code pages, I have extended its character translation
mechanism. Below is part of a short README.
Please test, and let me know whether it is useful. This is not yet
a complete system with all the necessary translation tables, but
adding new charsets etc. should now be easier.
The code is available from
URL: http://www.tezcat.com/~kweide/lynx-chartrans/
To compile, you need lynx-patch-2.6ct-0.1.pch.gz *and* either
lynx-newfiles-2.6ct-0.1.zip or lynx-newfiles-2.6ct-0.1.tar.gz.
See the README.chartrans, which is included in the lynx-newfiles-* files
and also readable at the above address.
Note that this does not deal with CJK character sets, only with good
old 8-bit charsets and Unicode/UCS2. I tried to leave the previous
processing for CJK charsets intact (but have no way to test whether
I succeeded).
The patches were made relative to Lynx2.6 + Composite Patches from
Hiram (last CHANGES date 11-24-96). I normally use Linux+Slang, so
that may be where it works best; but I verified that the stuff also
compiles for a sun4 target.
Output of raw UTF8 (which of course needs a terminal that understands it)
seems to work better with Slang, but not perfectly. This is a problem
beyond Lynx: a curses replacement that handles multibyte characters
properly would be needed to avoid putting characters in the wrong screen
position. (Does anyone know of such a beast?)
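
To illustrate the positioning problem, here is a minimal sketch (not code
from the patch; the function name is made up for this example). A screen
library that assumes one byte per screen cell will count more cells than
the terminal actually uses once a string contains multibyte UTF8
sequences. The sketch assumes well-formed UTF8 and one column per
character.

```c
#include <stddef.h>
#include <string.h>

/*
 * Count the characters in a NUL-terminated UTF8 string.
 * Hypothetical helper: every byte that is not a continuation byte
 * (bit pattern 10xxxxxx) starts a new character. Assumes the input
 * is well-formed UTF8.
 */
size_t utf8_char_count(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For a string such as "na" "\xc3\xaf" "ve", strlen() counts the bytes while
utf8_char_count() counts the characters; the difference is how far a
byte-counting screen library would be off by the end of the string.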
The code is currently a bit ugly, and I am sure there are many
glitches. You can help me find them.
Excerpt from README.chartrans:
Lynx CHARTRANS
New features:
- Can (attempt to) translate from any document charset to any display
character set, *IF* the document charset is known by a translation
table (compiled in at installation).
- Old method for specifying translations of Latin1 characters and
SGML entities still supported. (IBMPC-charsets.announce is still
relevant.)
- New method to define character sets: used for input charset as well
as display character set, translation tables compiled in from
separate files (one per charset).
- Unicode (UTF8) support: can (attempt to) decode and translate UTF8 to
display character set, or pass UTF8 through to the display (if terminal
or console understands UTF8). [only tested with Slang so far, does
not always position everything correctly on screen]
- Support for CHARSET attribute on A tag [but not yet on LINK], as in
HTML i18n draft. A link can suggest the target's charset in this way.
- EXPERIMENTAL, currently enabled only for Linux console:
can (attempt to) automatically switch terminal mode and load new
code pages on change of display character set.
- some minor changes: sometimes invalid characters are displayed in a hex
notation Uxxxx (helps debugging, and I regard it as at least no
worse than showing the wrong char without warning). KOI8 -> other charsets
will just strip the high bit from cyrillic chars (gives somewhat readable
ASCII, KOI was constructed that way...)
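
A rough sketch of three of the ideas above, for illustration only. None of
this is taken from the patch: the function names are hypothetical, the
decoder covers only sequences of up to 3 bytes (the UCS2 range) and does
not reject overlong forms, and lowercase hex digits in the Uxxxx notation
are an assumption.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Decode one UTF8 sequence of 1 to 3 bytes into a code point.
 * Returns the number of bytes consumed, or 0 if the sequence is
 * malformed, truncated, or longer than 3 bytes.
 */
size_t utf8_decode(const unsigned char *s, size_t len, unsigned int *code)
{
    if (len >= 1 && s[0] < 0x80) {
        *code = s[0];
        return 1;
    }
    if (len >= 2 && (s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        *code = ((s[0] & 0x1Fu) << 6) | (s[1] & 0x3Fu);
        return 2;
    }
    if (len >= 3 && (s[0] & 0xF0) == 0xE0 &&
        (s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80) {
        *code = ((s[0] & 0x0Fu) << 12) | ((s[1] & 0x3Fu) << 6)
              | (s[2] & 0x3Fu);
        return 3;
    }
    return 0;
}

/*
 * Write a code point that has no translation as "Uxxxx".
 * Returns the value of snprintf(), i.e. the length written.
 */
int format_hex_fallback(unsigned int code, char *out, size_t outsize)
{
    return snprintf(out, outsize, "U%04x", code & 0xFFFFu);
}

/*
 * Strip the high bit from every byte, in place. Applied to KOI8
 * text, this turns cyrillic letters into similar-sounding ASCII
 * letters.
 */
void koi8_strip_high_bit(unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        buf[i] &= 0x7F;
}
```

For example, the KOI8-R bytes 0xCD 0xC9 0xD2 come out as "MIR" after
stripping, and the UTF8 bytes 0xE2 0x80 0x94 decode to 0x2014, which the
fallback would show as U2014.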
Additions/changes to user interface:
- many new Display Character Sets are available on O)ptions screen.
(also can now use arrow keys, HOME, END for cycling through the list).
- new command line flags:
    -assume_charset=...          assume this as charset for documents that
                                 don't specify a charset parameter in HTTP
                                 headers
    -assume_unknown_charset=...  in case a charset parameter is not recognized
    -assume_local_charset=...    assume this as charset of local file: docs
- The "Raw" toggle (from -raw flag, '@' key, or Options screen)
    o should work as before for CJK charsets,
    o otherwise toggles the assumption "Default remote charset is same
      as Display Character Set" on or off.
(Try the "Transparent" Display Character Set for more "rawness".)
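
One way to read how the three -assume_* flags fit together is sketched
below. This is only an illustration, not the logic in the patch: the
struct, the function names and the list of known charsets are all made up,
the "Raw" toggle is left out, and treating a local file as a document
without a charset parameter is a simplifying assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical holder for the values given on the command line. */
struct charset_opts {
    const char *assume_charset;         /* -assume_charset=...         */
    const char *assume_unknown_charset; /* -assume_unknown_charset=... */
    const char *assume_local_charset;   /* -assume_local_charset=...   */
};

/* Hypothetical stand-in for the compiled-in translation tables. */
static const char *const known_charsets[] = {
    "iso-8859-1", "koi8-r", "utf-8"
};

/* Exact-match lookup; a real lookup would likely ignore case. */
static int charset_is_known(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof known_charsets / sizeof known_charsets[0]; i++) {
        if (strcmp(name, known_charsets[i]) == 0)
            return 1;
    }
    return 0;
}

/*
 * Pick the charset to assume for a document. header_charset is the
 * charset parameter from the HTTP headers, or NULL if there was none.
 */
const char *pick_charset(const struct charset_opts *o,
                         const char *header_charset, int is_local_file)
{
    if (header_charset == NULL)
        return is_local_file ? o->assume_local_charset : o->assume_charset;
    if (!charset_is_known(header_charset))
        return o->assume_unknown_charset;
    return header_charset;
}
```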
[Some notes about compiling etc. snipped, see the URL.]
--
Klaus