lynx-dev


From: Henry Nelson
Subject: Re: lynx-dev chartrans to CJK-like display (was: stopping when viewing a site)
Date: Thu, 26 Aug 1999 11:13:11 +0900 (JST)

henry:
> >> I have relied on this behavior in the past to create my own default
> >> character set, by simply copying it over src/chrtrans/def7_uni.tbl.
> >> (An example entry is "0x5c U+00a5" to replace "U+00a5:YEN" because
> >> this gives me a true yen sign on a Japanese Windows machine).  It is
> >> a kind of trick to get a display character set that matches the
> >> machine I am running the terminal emulator on.  It is some hybrid

Leonid:
> FTP Directory: ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/
[...]
>  [FILE]  CP1251.TXT . . . . . . . . . . .  [Sep 17  1998]     10k
>  [FILE]  CP1252.TXT . . . . . . . . . . .  [Sep 17  1998]     10k
>  [FILE]  CP1253.TXT . . . . . . . . . . .  [Sep 17  1998]      9k
[...]
>  [FILE]  CP949.TXT. . . . . . . . . . . .  [Sep 17  1998]    790k
>  [FILE]  CP950.TXT. . . . . . . . . . . .  [Sep 17  1998]    511k
> 
> I think you noticed that most (256-character) pages are all around 10 KB,
> while a few others hold hundreds of kilobytes of data (are those the CJK
> pages?).  The size of the table may be a real problem when considering
> translation of that type.

Klaus:
> > I don't know how lynx would deal with a character set that is CJK *and*
> > has translation tables - so far this has just not come up.  Changes
> > will be needed in various places, at least so that lynx doesn't skip
> > table translation immediately when it sees CJK.  Still I think this
> > would be the more logical way to add this kind of limited unicode-to-
> > CJK-display translation.

Klaus probably understands better than I do what I want, but I definitely
do NOT want CJK translation tables to become a part of Lynx.  Klaus used
the term "CJK-with-.tbl-file approach".  This means, as best as I can
explain it, leave all CJK up to the Sato/Asada CJK handlers, and use a .tbl
file for the decimal 160-255 characters that are NOT CJK (basically anything
in the us-ascii extended _range_).  The Sato/Asada CJK handlers do their
job VERY well.  First they decide what character set is being thrown at
them, e.g., in Japanese there is euc-jp, sjis-jp and iso-2022-jp.  This
is the hardest part of the job.  The character sets can be jumbled
together, and in most cases the Sato routines will ferret out what is what.
THEN they do a translation, for the multibyte characters, to the unique
character set the user chooses.  The font tables themselves must be on the
user's machine; otherwise only a weird combination of ASCII characters (and
control codes, if not filtered) will end up on the screen.
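
To make the two steps above concrete (first guess the incoming charset,
then translate to the charset the user chose), here is a minimal sketch in
Python.  It is NOT the Sato/Asada code and is far cruder: it assumes the
whole input is in one charset and does not try to untangle jumbled input.
The function names are made up for illustration; the codec names are the
ones Python's standard library uses.

```python
from typing import Optional

ESC = b"\x1b"

def guess_japanese_encoding(data: bytes) -> Optional[str]:
    """Crude guess at the charset of `data` (assumption: one charset only)."""
    # ISO-2022-JP switches to kanji with an ESC $ escape sequence.
    if ESC + b"$" in data:
        return "iso2022_jp"
    # Otherwise try strict decoding, most restrictive candidate first.
    for name in ("ascii", "euc_jp", "shift_jis"):
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    return None  # could not decide

def transcode(data: bytes, display_charset: str) -> bytes:
    """Re-encode `data` into the display charset the user chose."""
    source = guess_japanese_encoding(data)
    if source is None:
        raise ValueError("could not determine the source charset")
    # Characters the display charset lacks are replaced with "?".
    return data.decode(source).encode(display_charset, errors="replace")
```

Trial decoding in a fixed order is only a heuristic; short inputs can be
valid in more than one charset, which is exactly why the detection step is
the hard part.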

At the very start of this thread you recommended that I use the 8 bit
display charset, I assume UNICODE (UTF-8).  Does that mean you thought
I would see the "correct" characters?  What happens on my system is that
I see kanji, because they are multibytes.  For example, in this mail
probably most of you see the following as (if memory serves) some kind
of " ' " (a single quote): " ?? ".  I see the kanji for "stupid".  (So
I don't change this one because it reminds me how ?? $MS can be.  If
I recall, this is an "illegal", i.e., out of range, entity.)

So, again by example, what I mean by having a table for CJK is to map
an entity like &cent; to a multibyte cent symbol defined in Japanese,
" ?? ", or &pound; to a multibyte pound symbol " ?? ".  (What these look
like to you, I have no idea.  To me, the first is like a "|" imposed on
a "c", and the second is a "-" on an "L".)  Many of the entities do not
have multibyte representations in Japanese, so the 7 bit approximations
are the best that can be done.  This mess happened before unicode was in
existence, and was then perpetuated by commercial enterprises.
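
As a rough illustration of that mapping, the sketch below (Python, not
Lynx code) looks up a named entity, tries to encode it in the display
charset, and falls back to a 7 bit approximation when the charset has no
such character.  The APPROX_7BIT table and the function name are
hypothetical; whether a given codec has a multibyte cell for a particular
entity depends on that codec's own mapping table.

```python
import html.entities

# Hypothetical 7 bit approximations, used only when the display charset
# has no representation for the entity.
APPROX_7BIT = {"cent": "c", "pound": "L", "copy": "(c)"}

def render_entity(name: str, display_charset: str) -> bytes:
    """Return display bytes for a named entity such as 'cent' or 'pound'."""
    codepoint = html.entities.name2codepoint.get(name)
    if codepoint is not None:
        try:
            # Succeeds only if the display charset defines this character.
            return chr(codepoint).encode(display_charset)
        except UnicodeEncodeError:
            pass
    # No representation available: use the 7 bit approximation, or "?".
    return APPROX_7BIT.get(name, "?").encode("ascii")
```

For example, render_entity("pound", "shift_jis") asks the Shift_JIS codec
for a pound sign first and only uses "L" if the codec refuses.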

Hope I haven't totally confused you.  As simply as I can put it:
        allow table translation for (all)* single-byte characters (16x16)
        send all multibyte characters to be handled by CJK routines

        *(all that can be easily determined or reasonably assumed to be)
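
In code, the split I am asking for might look something like the sketch
below.  It is Python for brevity, not a patch against Lynx, and every name
in it is made up.  The one table entry shown is the yen override from my
earlier example ("0x5c U+00a5"); anything outside the single-byte range is
handed to a stand-in for the CJK routines.

```python
# Hypothetical single-byte table: Unicode code point -> display byte(s).
# The entry mirrors "0x5c U+00a5": show U+00A5 (yen) as byte 0x5c.
SINGLE_BYTE_TABLE = {0x00A5: b"\x5c"}

def cjk_handler(ch: str, charset: str) -> bytes:
    """Stand-in for the CJK routines: just encode the character."""
    return ch.encode(charset, errors="replace")

def to_display(text: str, charset: str = "shift_jis") -> bytes:
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x100:
            # Single-byte range (16x16): table translation only.
            if cp in SINGLE_BYTE_TABLE:
                out += SINGLE_BYTE_TABLE[cp]
            elif cp < 0x80:
                out.append(cp)      # plain ASCII passes through unchanged
            else:
                out += b"?"         # no table entry for this character
        else:
            # Everything else is multibyte: leave it to the CJK handler.
            out += cjk_handler(ch, charset)
    return bytes(out)
```

The sketch works on already-decoded text, which sidesteps the hard question
of deciding which bytes in a raw stream are single-byte characters and
which are halves of a multibyte one.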

__Henry
