LYNX-DEV Re: lynx2.6 chartrans


From: Klaus Weide
Subject: LYNX-DEV Re: lynx2.6 chartrans
Date: Fri, 29 Nov 1996 17:02:22 -0600 (CST)

On Fri, 29 Nov 1996, Drazen Kacar wrote [to me]:

> Klaus Weide wrote:
> > On Fri, 29 Nov 1996, Drazen Kacar wrote:
> > 
> > > Klaus Weide wrote:
> > > > 
> > > >   The code is available from
> > > > 
> > > >         URL: http://www.tezcat.com/~kweide/lynx-chartrans/

> [...] On IANA site, there is URL for tables with Microsoft code pages,
> but FTP server returns file not found. I suppose I'll manage, somehow :)

When I ran into that, I noticed that the directory structure on the
site the URL points to had changed.  Just browse through the FTP
directories on unicode.org, and you will probably find what you want.
 
> > For example you
> > mentioned that there are standards with a well-defined one-to-one
> > mapping between iso-8859-2 and cyrillic characters (for some languages).
> 
> There is for Serbian and Macedonian. Could be there is for some others
> too, but I don't know. There might be some problems with ISO 8859-5.
> Recently there was a post on comp.unix.solaris from some Russian guy..
> He said that Sun maybe thinks ISO 8859-5 is enough for Russian, but they
> have different opinion on that. KOI-8 is de facto standard there.
> He posted URL for explanation, but I didn't save the article.
> It should be available via dejanews, article has iso-8859-2 in subject
> (or iso8859-2).

Without knowing much of the background - it seems obvious that KOI-8
may now be "standard" for Russian but is insufficient for many other
languages that use cyrillic script.  ISO-8859-5 has at least some
characters used in languages other than Russian, while KOI-8 uses a lot
of space for all those DOS linedrawing characters.
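
A rough way to check the coverage claim above, sketched with Python's
standard codecs.  Assumptions: "KOI-8" is taken to mean KOI8-R (the
"koi8_r" codec), and the sample words are purely illustrative.

```python
# Assumption: "KOI-8" is read as KOI8-R; other KOI8 variants differ.

def can_encode(text, codec):
    """Return True if every character of text exists in the charset."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

def count_box_drawing(codec):
    """Count upper-half bytes that decode to Unicode box-drawing chars."""
    upper_half = bytes(range(0x80, 0x100)).decode(codec, errors="replace")
    return sum(1 for ch in upper_half if 0x2500 <= ord(ch) <= 0x257F)

# Illustrative sample words, one per language.
samples = {
    "Russian": "Привет",
    "Serbian": "Ђорђе",
    "Ukrainian": "їжак",
}

for language, word in samples.items():
    print(language,
          "iso-8859-5:", can_encode(word, "iso8859_5"),
          "koi8-r:", can_encode(word, "koi8_r"))

print("box-drawing slots:",
      "iso-8859-5 =", count_box_drawing("iso8859_5"),
      "koi8-r =", count_box_drawing("koi8_r"))
```

The Serbian and Ukrainian samples only encode in ISO-8859-5, while the
box-drawing count is non-zero only for KOI8-R.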
 
> > Using such a mapping for displaying cyrillic text in (e.g.) iso-8859-2 
> > would just require the right table.  (The other direction would be
> > slightly more difficult, since currently for ASCII chars the translation
> > is skipped [cheating for speed].)
> 
> Hehe... Did I tell you that some bright guys use ISO 646 code pages?
> Those are 7-bit standards, national characters are put instead of
> []{}|address@hidden characters. Hehehe... Don't be afraid, you don't have to
> support that... :)

I think that's also still popular in some Scandinavian countries.
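
A minimal sketch of the table-driven translation with the ASCII shortcut
mentioned in the quoted text, and of why ISO 646 national variants would
defeat that shortcut.  This is not the chartrans code: the table is a
hypothetical eight-letter subset, all names are made up for illustration,
and the Swedish ISO 646 assignments shown are a subset given as an
assumption.

```python
# Illustrative subset only: a few Serbian Cyrillic letters and their
# Latin counterparts.  A real table would cover the whole alphabet.
CYRILLIC_TO_LATIN = {
    "Д": "D", "К": "K", "а": "a", "е": "e",
    "ж": "ž", "н": "n", "р": "r", "ч": "č",
}

# Byte-indexed table: ISO-8859-5 input byte -> ISO-8859-2 output bytes.
TABLE = {
    cyr.encode("iso8859_5")[0]: lat.encode("iso8859_2")
    for cyr, lat in CYRILLIC_TO_LATIN.items()
}

def translate(data):
    """Translate ISO-8859-5 bytes to ISO-8859-2 bytes via TABLE."""
    out = bytearray()
    for byte in data:
        if byte < 0x80:
            out.append(byte)             # ASCII shortcut: no table lookup
        else:
            out += TABLE.get(byte, b"?") # unknown bytes become "?"
    return bytes(out)

# Assumed subset of the Swedish ISO 646 variant: these positions hold
# national letters instead of ASCII punctuation.
ISO646_SE = {0x5B: "Ä", 0x5C: "Ö", 0x5D: "Å",
             0x7B: "ä", 0x7C: "ö", 0x7D: "å"}

def decode_iso646_se(data):
    """Decode 7-bit data, looking up every byte (no ASCII shortcut)."""
    return "".join(ISO646_SE.get(byte, chr(byte)) for byte in data)

cyrillic = "Дражен Качар".encode("iso8859_5")
print(translate(cyrillic).decode("iso8859_2"))
print(decode_iso646_se(b"sm|rg}s"))
```

With a 7-bit national variant as input, bytes below 0x80 can no longer
be passed through untouched, so the shortcut in translate() would have
to go.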
 
> > Don't be misled if, when you first try my code, your name comes up
> > correctly, Dražen Kačar :).  I added those two latin-2
> > entities just for you :) (actually, for testing), but not a full set of
> > latin-2 entities or latin-1 replacements for latin-2 chars.
> 
> There's one wrong thing with Lynx entities table (before your changes).
> &ETH; and &Dstroke; are synonyms. ETH is Latin 1 (Icelandic, I think) and
> Dstroke is Latin 2 (4 or 5 languages). Upper case glyphs look the same,
> but lower case glyphs don't. And ASCII approximation is different, "dh"
> for &eth; and "dj" for &dstroke;. The full ISO entities tables come
> with XEmacs, IIRC.

So how about sending a patch?
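
To make the distinction concrete, here is a small sketch of what separate
entries could look like.  The entity names are spelled as in the quoted
message, the code points are the Unicode ones, and the table layout is
hypothetical, not the Lynx entities table.

```python
# Entity names as written in the quoted message; layout is illustrative.
ENTITIES = {
    "ETH": "\u00D0",      # LATIN CAPITAL LETTER ETH (Latin 1)
    "eth": "\u00F0",      # LATIN SMALL LETTER ETH (Latin 1)
    "Dstroke": "\u0110",  # LATIN CAPITAL LETTER D WITH STROKE (Latin 2)
    "dstroke": "\u0111",  # LATIN SMALL LETTER D WITH STROKE (Latin 2)
}

# ASCII approximations for the lower case letters, as given in the message.
ASCII_APPROX = {
    ENTITIES["eth"]: "dh",
    ENTITIES["dstroke"]: "dj",
}

def approximate(name):
    """Return the ASCII approximation for a lower case entity name."""
    return ASCII_APPROX[ENTITIES[name]]

# The two letters are distinct characters ...
print(ENTITIES["ETH"] != ENTITIES["Dstroke"])
# ... yet they occupy the same byte position in their 8-bit charsets.
print(ENTITIES["ETH"].encode("iso8859_1"),
      ENTITIES["Dstroke"].encode("iso8859_2"))
print(approximate("eth"), approximate("dstroke"))
```

The shared byte position (0xD0 upper case, 0xF0 lower case) may be one
reason the two ended up as synonyms, but that is a guess.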

> > As long as longs are at least 32-bit, and shorts are 16-bit, the chartrans
> > code _should_ not create any additional problems...
> 
> Did you mean shorts are at least 16 bits, or shorts are 16 bits? I have
> a bad feeling about this. Do you have access to OSF?

No access to OSF.  I _think_ shorts longer than 16 bits wouldn't be a
problem (except for wasting some memory), but I guess we will find out.
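
For anyone who wants to check a platform quickly, a sketch using
Python's ctypes to report the widths of the C integer types.  It assumes
8-bit bytes, and the low16() helper is hypothetical: it only shows the
kind of explicit masking that keeps 16-bit values correct when a short
is wider than 16 bits.  Nothing here is taken from the chartrans code.

```python
import ctypes

def c_type_bits():
    """Report the width in bits of some C integer types on this platform.

    Assumes 8-bit bytes (CHAR_BIT == 8).
    """
    types = {
        "short": ctypes.c_short,
        "int": ctypes.c_int,
        "long": ctypes.c_long,
    }
    return {name: ctypes.sizeof(ctype) * 8 for name, ctype in types.items()}

def low16(value):
    """Keep only the low 16 bits, independent of the storage width."""
    return value & 0xFFFF

bits = c_type_bits()
print(bits)
print("short is at least 16 bits:", bits["short"] >= 16)
print("short is exactly 16 bits: ", bits["short"] == 16)
print("long is at least 32 bits: ", bits["long"] >= 32)
print(hex(low16(0x12345)))
```

Code that relies on a short wrapping around at 16 bits needs "exactly";
code that only stores 16-bit values needs just "at least".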

[ some details about Solaris 2.6 pointer problems snipped ]
 
> There are probably some more little devils, but nothing else comes to my
> mind right now.

I don't think any of those should be a problem for my patches, or even
for the Lynx code as a whole.  But you'll find out..
 
> > > > Output of raw UTF8 (needs of course a terminal which understands it)
> > > > seems to work better, but not perfectly, with Slang.  This is a problem
> > > > beyond Lynx: a curses replacement which understands multibyte characters
> > > > properly would be needed to avoid putting characters in the wrong screen
> > > > position. (Does anyone know of such a beast?)
> > > 
> > > I think native curses on Solaris can. I'll check.
> > 
> > I would also be interested to hear what other terminal environments can
> > understand UTF8.  So far I only know about the Linux console.
> 
> Perhaps Digital UNIX. I'll check... if you can tell me what to check.
> Which man page, or whatever.

Guessing where which vendor will hide such information goes beyond my
abilities.  But `man curses' should be a good starting point...
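
The screen position problem with raw UTF8 mentioned in the quoted text
can be illustrated with a short sketch.  It assumes a screen library
that advances the cursor by one column per byte, and it treats one
character as one column, which ignores combining and double-width
characters.  The function names are made up for illustration.

```python
def columns_by_bytes(text):
    """Columns a byte-counting screen library would assume."""
    return len(text.encode("utf-8"))

def columns_by_chars(text):
    """Columns a UTF-8 terminal actually uses (simplified: 1 per char)."""
    return len(text)

# "Dražen Kačar", written with escapes for the two non-ASCII letters.
name = "Dra\u017een Ka\u010dar"

assumed = columns_by_bytes(name)
actual = columns_by_chars(name)
print("assumed columns:", assumed)
print("actual columns: ", actual)
print("cursor drift:   ", assumed - actual)
```

Every two-byte character shifts the library's idea of the cursor one
column to the right of where the terminal really is.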

   Klaus
