lynx-dev

Re: lynx-dev Re: 283dev15 for Win32


From: Klaus Weide
Subject: Re: lynx-dev Re: 283dev15 for Win32
Date: Mon, 6 Dec 1999 16:24:50 -0600 (CST)

On Sat, 4 Dec 1999, Henry Nelson wrote:

> > That means we have large sections in the common code without any kind of
> > documentation readable by most developers.  Quite bad.
> 
> No argument there.  All there is in English that I know is [...]
> [...] hopelessly out of date.
> 
> It's not a good situation, but if there is something specific you wish to
> ask Hiroyuki and he is not able to get his meaning across, I'll do the best
> I can to help you communicate with each other.

Thank you for the offer.
Do you know whether he is reading the list?

> It's the difficulty of determining a page's coding; which one of the below
> is it?
>   7-bit ISO 2022        <ESC> & @ <ESC> $ B 0x3441 0x3B7A <ESC> ( J
>   ISO-2022-JP           <ESC> $ B 0x3441 0x3B7A <ESC> ( J
>   EUC                   0xB4C1 0xBBFA
>   Shift-JIS             0x8ABF 0x8E9A
> Also, even if the coding is known or properly identified, what does one
> do with characters that are not supported in the display encoding?

But that is a very old problem; I kinda thought it was more or less solved
a long time ago, with Takuya ASADA's changes.  It's a problem of
interpreting received data; the solution (more or less successful
guessing, if I understand right) shouldn't depend on whether Lynx is
running on Windows or something else.  Yet it seems a lot of the more
recently added code for Japanese is Windows-specific.  It seems I don't
even understand the problem, so no surprise that I don't understand the
solutions.

> Perhaps *the* starting place for English readers is:
>    Linkname: CJK.INF
>         URL: http://www.ora.com/people/authors/lunde/cjk_inf.html
> 
> > What is "7-bit kana"?  Or is that a typo?
> 
> Perhaps what I called it is incorrect.  Is "one-byte" correct?

I don't know what the right or conventional terminology is; but yes, to
me that makes more sense.

> Anyway,
> I meant all those characters referred to as "HALFWIDTH" in the document (I
> think it's called "SHIFTJIS.TXT"):
> #       Name:             Shift-JIS to Unicode
> #       Unicode version:  1.1
> #       Table version:    0.9
> #       Table format:     Format A
> #       Date:             8 March 1994
> #       Authors:          Glenn Adams <address@hidden>
> #                     John H. Jenkins <address@hidden>
> #
> #       Copyright (c) 1991-1994 Unicode, Inc.  All Rights reserved.
> [...]
> 0xA1    0xFF61  # HALFWIDTH IDEOGRAPHIC FULL STOP
> [...]
> 0xDF    0xFF9F  # HALFWIDTH KATAKANA SEMI-VOICED SOUND MARK

Thank you for the explanation.

The existence of those 1-byte codes is something I totally neglected
in my recent changes (for WHEREIS search highlighting glitches, I think
you know what I mean).  That means that the code should be correct for
EUC-JP, but still not for Shift-JIS.  (Since WHEREIS operates on the
end result of Lynx's formatting and conversions, I suppose it should be
correct for Display Character Set == "Japanese (EUC-JP)" and incorrect
for D.C.S. == "Japanese (Shift_JIS)", independent of the original charset
of the document as transmitted, as long as Lynx's conversion was otherwise
correct.)

Those functions in LYStrings.c that take 'HTCJK' into account just assume that
         if (HTCJK != NOCJK)
         then every non-ASCII char is the start of a 2-byte character.
Somebody should have told me earlier that that's wrong...

    Klaus

