lynx-dev

Re: LYNX-DEV charset issues (was: hodge-podge, updates)


From: Klaus Weide
Subject: Re: LYNX-DEV charset issues (was: hodge-podge, updates)
Date: Tue, 12 Nov 1996 02:15:33 -0600 (CST)

On Tue, 12 Nov 1996, Drazen Kacar wrote:

> Klaus Weide wrote:
> 
> > We don't have to accept that for a fact just yet.
> > (Except that Lynx will always be limited by what characters terminals
> > and emulators provide.)
> 
> Yes, but users don't usually have a need for 100 (8-bit) code pages. Only
> two ISO 8859-x pages can represent a very wide range of languages.

True.  As long as all those languages are based on the Latin alphabet...
Heh, we could probably fit all the displayable characters from
Latin 1+2 in one combined code page, and distribute that with Lynx!
But that wouldn't help in the general case where we cannot assume that
the user can do anything about his/her code page.  (At least I think
that is the general case...)  It would work for Linux, as long as we
assume the user is using the console in 80x25 mode.

> Unix
> lacks a few things here. For example, if the terminal can switch code
> pages, there is no termcap/terminfo capability to indicate this. There
> are no standardized LC_CTYPE names, no mapping between IANA registered
> charset names and LC_CTYPE files, no curses (or ncurses) functions for
> approximation of one code page with another...

I assume the most common situation will remain, for a while, where Lynx
cannot make any assumption about extended capabilities of the terminal/
emulator, but has to be able to give a reasonable representation under
limited conditions.  

> > In fact, if WE really want it, we could probably have a Lynx patch
> > version with limited but usable, and clean (as far as the major Lynx
> > code) Unicode support, in a week.  Really.  
> > We would have to define what we mean with limited support, maybe just
> > being able to display pages with charset=UTF-8 (there are some) by
> > translating to Latin-1.
> 
> Not good enough. Lynx can currently approximate Latin 1 characters with any
> local terminal definition, but the reverse is impossible. Unicode support
> should be able to transpose to any of the local code pages.

(I suppose you mean "to and from", between any two encodings it knows.)

Yes, it should.  But how?  The question (I have) is actually not how
to map characters from one encoding to another (I have code for that
in my prototype), but rather where and when.  There has to be a model
of which charset Lynx expects at each stage of processing, and of
how to specify it.  (At least that's what I am thinking.)
In a way, it would be easiest to follow the model from the HTML i18n
draft and have just *one* charset during the SGML/HTML processing,
and convert everything to/from it before and after that.  UTF-8 (RFC 2044)
could be used internally.  That would be a drastic change that I cannot
and do not wish to make alone.  How would this interfere with the CJK
charset processing (which probably has to be left as it is)?
Should any new methods for translating characters supersede the current
tables in LYCharSets.c, or somehow incorporate them?
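
A minimal sketch of the "one internal charset" model, assuming UTF-8 as
the internal form and Latin-1 at both ends.  It is not taken from the
prototype or from Lynx; the function names (ucs_to_utf8, latin1_to_utf8,
utf8_to_latin1) are hypothetical, only code points up to U+FFFF are
handled, and the input to the output stage is assumed to be well-formed.

```c
#include <stddef.h>

/* Hypothetical helper: encode one code point (<= U+FFFF) as UTF-8.
 * Returns the number of bytes written to out (1 to 3). */
static int ucs_to_utf8(unsigned long c, unsigned char *out)
{
    if (c < 0x80) {
        out[0] = (unsigned char) c;
        return 1;
    }
    if (c < 0x800) {
        out[0] = (unsigned char) (0xC0 | (c >> 6));
        out[1] = (unsigned char) (0x80 | (c & 0x3F));
        return 2;
    }
    out[0] = (unsigned char) (0xE0 | (c >> 12));
    out[1] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
    out[2] = (unsigned char) (0x80 | (c & 0x3F));
    return 3;
}

/* Input stage: Latin-1 bytes to internal UTF-8.  Latin-1 maps one-to-one
 * onto U+0000..U+00FF, so no table is needed.  The caller must provide
 * room for 2 * n bytes in out. */
static size_t latin1_to_utf8(const unsigned char *in, size_t n,
                             unsigned char *out)
{
    size_t i, len = 0;

    for (i = 0; i < n; i++)
        len += (size_t) ucs_to_utf8(in[i], out + len);
    return len;
}

/* Output stage: internal UTF-8 to Latin-1, writing '?' for anything the
 * terminal charset cannot represent.  Lead bytes of sequences longer
 * than 3 bytes and stray continuation bytes each produce one '?'. */
static size_t utf8_to_latin1(const unsigned char *in, size_t n,
                             unsigned char *out)
{
    size_t i = 0, len = 0;

    while (i < n) {
        unsigned char b = in[i++];
        unsigned long c;
        int extra;

        if (b < 0x80) {
            c = b;
            extra = 0;
        } else if ((b & 0xE0) == 0xC0) {
            c = b & 0x1F;
            extra = 1;
        } else if ((b & 0xF0) == 0xE0) {
            c = b & 0x0F;
            extra = 2;
        } else {
            out[len++] = '?';
            continue;
        }
        /* Fold in the continuation bytes, 6 payload bits each. */
        while (extra-- > 0 && i < n && (in[i] & 0xC0) == 0x80)
            c = (c << 6) | (unsigned long) (in[i++] & 0x3F);
        out[len++] = (c <= 0xFF) ? (unsigned char) c : (unsigned char) '?';
    }
    return len;
}
```

In this sketch everything between the two stages would see only UTF-8;
supporting another terminal charset would mean replacing the output
stage, not touching the SGML/HTML code in between.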

> > Is it reasonable overhead to call a translation function for each
> > (maybe just non-ASCII) character?
> 
> Can it be a table look-up? Function call will be incredibly slow.

That was my first (maybe just instinctive:)) reaction.  But the code
already (for HTML text) is going through several character-by-character
function calls, so there is not that much _additional_ overhead.
A straightforward table lookup is not possible, since I don't think
we want to keep several 64 KB tables around in memory.
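
One way to avoid a full 64 KB table per charset, sketched here under
assumptions: keep only the non-ASCII characters a charset actually has
(at most 128 for an 8-bit code page) in a table sorted by Unicode value,
and search it with bsearch().  This is not the prototype's method; the
names struct umap, latin2_excerpt and ucs_to_local are hypothetical, and
the table is only a small excerpt of ISO 8859-2.

```c
#include <stdlib.h>

/* One entry: Unicode code point and the byte it has in the local charset. */
struct umap {
    unsigned short ucs;
    unsigned char local;
};

/* Excerpt of ISO 8859-2, sorted by code point (required by bsearch). */
static const struct umap latin2_excerpt[] = {
    { 0x0104, 0xA1 }, { 0x0105, 0xB1 },   /* A, a with ogonek */
    { 0x0106, 0xC6 }, { 0x0107, 0xE6 },   /* C, c with acute  */
    { 0x010C, 0xC8 }, { 0x010D, 0xE8 },   /* C, c with caron  */
    { 0x0110, 0xD0 }, { 0x0111, 0xF0 },   /* D, d with stroke */
    { 0x0141, 0xA3 }, { 0x0142, 0xB3 },   /* L, l with stroke */
    { 0x0160, 0xA9 }, { 0x0161, 0xB9 },   /* S, s with caron  */
    { 0x017D, 0xAE }, { 0x017E, 0xBE }    /* Z, z with caron  */
};

#define LATIN2_EXCERPT_LEN \
    (sizeof latin2_excerpt / sizeof latin2_excerpt[0])

/* bsearch comparator: key is an unsigned short code point. */
static int umap_cmp(const void *key, const void *elem)
{
    unsigned short k = *(const unsigned short *) key;
    unsigned short e = ((const struct umap *) elem)->ucs;

    return (k > e) - (k < e);
}

/* Map one code point to the local charset; ASCII passes through
 * unchanged, anything not in the table becomes the fallback. */
static int ucs_to_local(unsigned long c, const struct umap *tab, size_t n,
                        int fallback)
{
    unsigned short key;
    const struct umap *hit;

    if (c < 0x80)
        return (int) c;
    if (c > 0xFFFF)
        return fallback;
    key = (unsigned short) c;
    hit = bsearch(&key, tab, n, sizeof tab[0], umap_cmp);
    return hit ? hit->local : fallback;
}
```

For example, ucs_to_local(0x010D, latin2_excerpt, LATIN2_EXCERPT_LEN, '?')
yields 0xE8.  The cost is one function call plus a handful of
comparisons per non-ASCII character, and the fallback is where an
approximation (say, a plain ASCII letter) could be plugged in instead
of '?'.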

> > make it available.  Tell me YOU want to look at it to make some
> > concrete suggestions, and go over parts of it with me, and I'll put
> > it on a Web page _immediately_ (well after doing whatever is necessary
> > to avoid disk quota problems..).
> 
> If you have problems with quota and need web space, I can provide some.
> Want it? :)

I may come back to it:)

  Klaus
