
Re: LYNX-DEV Lynx ported to Win32 and charsets...


From: Klaus Weide
Subject: Re: LYNX-DEV Lynx ported to Win32 and charsets...
Date: Fri, 1 Aug 1997 22:16:50 -0500 (CDT)

On Fri, 1 Aug 1997, Even Holen wrote:

> On Thu, Jul 31, 1997 at 06:04:50PM +0000, Christopher R. Maden wrote:
> > [Even Holen]
> > > I've just downloaded the port of Lynx for Win32 and started out to
> > > test it and very soon I've discovered that this version didn't
> > > respect my keyboard settings.
> > > 
> > > I live in Norway and use a default locale of Norwegian (Bokmal)
> > > with a Norwegian keyboard layout. But Lynx seems to think
> > > that I'm using some sort of American keyboard...
> > 
> > What version of Windows are you using?  How did you set up your
> > keyboard?  (Did it just work in Norwegian "out of the box"?)
> 
> I'm using Win NT 4.0 (1381). I have tried four alternatives for the
> keyboard settings:
> 
> Short  : Input Locale       : Keyboard Layout
> EN (1) : English (US)       : US
> NO (1) : Norwegian (Bokmal) : Norwegian
> EN (2) : English (US)       : Norwegian
> NO (2) : Norwegian (Bokmal) : US
> 
> When I type the keys from j towards the Return key in a DOS window,
> I get the following with the different settings:
> EN(1) : jkl;'\JKL:"|
> NO(1) : jkløæ'JKLØÆ* 
> EN(2) : jkløæ'JKLØÆ* 
> NO(2) : jkl;'\JKL:"|
> 
> Typing the same at the prompt asking for a URL (after pressing g in
> lynx):
> EN(1): jkl`'/JKL~"? 
> NO(1): jkl;'\JKL:"| 
> EN(2): jkl;'\JKL:"| 
> NO(2): jkl`'/JKL~"?
> 
> As you can see, both DOS and Lynx seem to ignore the locale setting
> and use only the keyboard layout setting (NO(1) and EN(2) give
> identical results, and so do EN(1) and NO(2)).

Yes, that much seems clear.

(Note that we are not really talking about DOS, but about an NT console
application - I don't know how much difference that makes.)

It's interesting that even two of the EN(1) keys differ.
The EN(1) without Lynx follows the "normal American" keyboard layout
(I assume that that's what I have here), except that your keyboard
has one more key in that row.

> In addition, the keys from n and from o rightwards produce the
> following with Lynx:
> EN(1): nm,./NM<>?op[]OP{} 
> NO(1): nm,.-NM<>_op];OP}:
> 
> The English version is OK, but the Norwegian version should have
> produced:
>  nm,.-NM;:_opå¨OPÅ^
> (where ¨ (diaeresis) and ^ (caret) are dead keys...)
> 
> > I am using Windows 95, and in the Keyboard control panel, the language
> > "English (United States)" is associated with keyboard layout "United
> > States-Dvorak".  While the DOS shell does not pick that up, Lynx is
> > definitely aware that I'm using a Dvorak keyboard.
> 
> Lynx seems to be aware that there is a different keyboard setting, but
> it does not apply it correctly, as shown in my examples.

None of your examples with Lynx have any 8-bit (i.e. non-ASCII) characters.
Maybe the curses library doesn't even let those through unmangled.
Is there any way you can get any of the non-ASCII characters, maybe with
<ALT>+N N N or <ALT>+0 N N N?

Also what happens if you try those characters while within Lynx, but at
a non-curses (non-fullscreen) prompt?  For example, if you do not set
TERM= before lynx is invoked, or set it to something like "dumb", you should
get the
  Enter a terminal type: [vt100] 
prompt, which you could abuse for testing.

> > You can't really switch fonts in DOS (well, you can sort of, but not
> > character sets); you're stuck with the IBM code page for your locale.
> > What did you set your "display (C)haracter set" to in the (o)ptions
> > page?  I'm using "IBM PC character set", and all of the lower-case
> > accented characters are displayed correctly.  &Oslash; and &oslash;,
> > unfortunately, are not part of the US IBM character set.  Lynx
> > approximates &oslash; with a phi, and &Oslash; as a capital O.
> 
> I've tried "IBM PC character set", "IBM PC Codepage 850", "ISO
> Latin 1" and the MS Windows CP... The IBM PC character set is the most
> accurate one. It shows &oslash; (ø), although it's not the nicest
> character I've seen (it sits too high; it should rest on the
> baseline like the 'o').
> 
> I tried displaying the following line in the different charsets:
> æøåÆØÅ õëàäá

Ok, this is a different topic now, not about keyboard input.
I assume you put those characters in a file in some way, and then used
Lynx to view that file.

This is not a valid test if your editor (or whatever you used to
create that file) uses a different character set for the 8-bit
characters than what Lynx assumes.  Lynx assumes by default
"iso-8859-1", and you can change it with -assume_charset or (for local
files) -assume_local_charset, or the equivalent settings in the
lynx.cfg file.

In this case, your test file seems to be in iso-8859-1 encoding.  You
have probably created it with a Windows application which uses
Microsoft's CP 1252 (a superset of iso-8859-1).  If that is the case,
then your test is essentially valid.  But to make sure, try also
the test file at
<http://www.tezcat.com/~kweide/lynx-chartrans/test/iso8859-1.L1html>.
It will get sent with a "Content-Type: text/html;charset=iso-8859-1"
HTTP header, so that Lynx's interpretation of the text won't depend
on any of your local settings.
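Whether a file saved by a CP 1252 Windows program reads correctly as iso-8859-1 can be checked with Python's standard codecs. This is only an illustrative sketch: the helper names are made up for it, and it uses Python's current cp1252 table, which differs from the 1997 one in the 0x80-0x9F range (the euro sign was added later). It compares the bytes of the test line in both encodings and checks that the two code pages agree on all of 0xA0-0xFF.

```python
# Sketch: can a file written by a CP 1252 Windows program be read as
# iso-8859-1 (Lynx's default assumption) without any change?
# Helper names are hypothetical; only Python's built-in codecs are used.

TEST_LINE = "æøåÆØÅ õëàäá"

def same_bytes(text: str) -> bool:
    """True if cp1252 and iso-8859-1 encode text to identical bytes."""
    return text.encode("cp1252") == text.encode("iso-8859-1")

def upper_half_agrees() -> bool:
    """True if bytes 0xA0-0xFF decode to the same characters in both."""
    high = bytes(range(0xA0, 0x100))
    return high.decode("cp1252") == high.decode("iso-8859-1")

if __name__ == "__main__":
    print(same_bytes(TEST_LINE))   # True
    print(upper_half_agrees())     # True
    # Where they differ: cp1252 puts printable characters (here curly
    # quotes) in 0x80-0x9F, which iso-8859-1 reserves for C1 controls.
    print(repr(b"\x93quoted\x94".decode("cp1252")))
```

So for text like the test line, "superset" means the bytes are identical; only characters from the 0x80-0x9F block would be misread.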

Does the outcome of these tests depend in any way on the locale setting?

> That is in html entities (and descriptive):
> aelig, oslash, aring, AElig, Oslash, Aring, o with tilde, e with diaeresis,
>    a with grave (`), a with diaeresis, a with acute (´).
> In IBM PC character set : 
>   æøåÆOÅ oëàäá  where the ø really is the mathematical symbol for the
>                 empty set: a flattened o which is raised and has a slash
>                 (maybe it's a version of the Greek character phi)

Here you see Lynx working as intended - the characters are shown as
far as they are available in codepage 437, and for the three
characters not available, replacements are shown.

> In IBM PC Codepage 850 :
>   æ&cent;åÆ&yen;Å &Sigma;ëàäá  where &Sigma; is the mathematical symbol
>                                used in summations

I assume you mean that the cent, yen, and Sigma characters are shown,
not the strings "&cent;" and "&yen;".

This is all consistent with the theory that the code page in effect is
"IBM PC character set", cp437.  For the three characters in question,
Lynx outputs the bytes which would be correct if the code page in effect
were cp850, but since that is not the case, they appear wrong.
Codepage 865 is very similar to cp437 except for the &Oslash; and &oslash;
characters (and some others not tested by you).  For &Oslash; and &oslash;
(not present in cp437), cp865 agrees with cp850 instead.

So the appearance of &Oslash; and &oslash; you observe indicates that
neither cp865 nor cp850 is being used.

(I am relying on the correctness of the Linux kbd package here, which I use
for loading different code pages and comparing them visually.)
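Python's built-in codecs allow a similar cross-check, assuming their tables match what Lynx and the NT console actually use. This sketch encodes the test line as cp850, i.e. the bytes Lynx would send with "IBM PC Codepage 850" selected, and decodes them as cp437, which is what a console running the US code page would draw. The function name shown_on_console is hypothetical.

```python
# Sketch: what cp850 output looks like on a console that is really using
# cp437. Relies only on Python's codec tables, not on Lynx itself.

TEST_LINE = "æøåÆØÅ õëàäá"

def shown_on_console(text: str, lynx_charset: str, console_cp: str) -> str:
    """Encode text as Lynx would for lynx_charset, then decode the raw
    bytes the way a console using console_cp would display them."""
    return text.encode(lynx_charset).decode(console_cp)

if __name__ == "__main__":
    # ø and Ø land on the bytes where cp437 has ¢ and ¥; õ lands on Σ.
    print(shown_on_console(TEST_LINE, "cp850", "cp437"))  # æ¢åÆ¥Å Σëàäá

    # cp865 puts ø and Ø on the same bytes as cp850 (0x9b and 0x9d).
    for ch in "øØ":
        print(ch, ch.encode("cp850").hex(), ch.encode("cp865").hex())
```

Under these assumptions the first line matches the reported cp850 display exactly, which supports the conclusion that the console is using cp437.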

> In ISO latin 1:
>   Chaos!!! One line per character
>   æ equals µ or micro
>   ø equals ° or the degree sign
>   å equals the greek character sigma  (lowercase)
>   ÆØÅ equals various line characters
>   õ equals the lower part of an integral sign
>   ë equals the lowercase delta
>   à equals the lowercase alpha
>   ä equals the uppercase Sigma
>   á equals the lowercase beta
> 
> In other words rather difficult to read.

In this case, since input is iso-8859-1 and display output set to
iso-8859-1, Lynx essentially doesn't translate.
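The same approach reproduces the "chaos" reported above, again assuming the console is using cp437: each iso-8859-1 byte passes through untranslated and is drawn with the cp437 glyph for that position. The helper cp437_view is hypothetical. Note that Python decodes byte 0xE1 in cp437 as "ß", a glyph that also serves as the Greek beta on those consoles.

```python
# Sketch: iso-8859-1 bytes sent untranslated to a console using cp437.
# Each character is shown with its byte and the glyph cp437 uses for it.

TEST_CHARS = "æøåÆØÅõëàäá"

def cp437_view(text: str) -> list:
    """Return (original char, byte in hex, glyph cp437 shows) per char."""
    rows = []
    for ch in text:
        byte = ch.encode("latin-1")            # what Lynx sends
        rows.append((ch, byte.hex(), byte.decode("cp437")))  # what appears
    return rows

if __name__ == "__main__":
    for original, hex_byte, glyph in cp437_view(TEST_CHARS):
        print(f"{original} -> 0x{hex_byte} -> {glyph}")
```

This prints æ -> 0xe6 -> µ, ø -> 0xf8 -> °, and so on, matching the list reported for the ISO Latin 1 setting, including box-drawing characters for Æ, Ø and Å.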

It may be interesting to check what appears (1) when you just TYPE the
file, and (2) when you use lynx with the -dump flag.  (With the -dump
flag you cannot specify a "Display character set" directly, but Lynx 
should use the setting last saved from the Options Screen.)

> > Does Norwegian DOS use a different codepage?  (I would hope so, for
> > the slashed O.)  Lynx probably needs some minor work to support that
> > code page.  Klaus?
> 
> Norwegian might use either 850 or 865. But both of these work
> properly in plain DOS, without Windows.

But can you exchange data between Windows and DOS applications without
problems?  For example if you create a file with 8-bit characters with
a windows editor, and then view it with a DOS editor or just the TYPE
command, does it show the right characters?  (And similarly the other way
round.)

Again, under the given conditions (Lynx display, as reported by you)
neither 850 nor 865 are used - that would be inconsistent with what
you see for &Oslash; and &oslash;.  I would be surprised if this were
different with other applications running in a DOS-like console window
or screen - but maybe there is a difference between pure DOS programs
and Win32 console applications.

> Doug Kaufman wrote:
> >On Thu, 31 Jul 1997, Klaus Weide wrote:
> >> Wayne had some notes on using codepage 850 (he was recommending it), but I
> >> couldn't find them now at http://www.fdisk.com/.  Anyway, that CP 850
> >> contains all the ISO-8859-1 characters.  And it has &Oslash; and &oslash;
> >> just in the positions where CP 437 ("US IBM character set") has &yen;
> >> and &cent;.  So it seems that this would be the right Display Character
> >> Set to use - at least as far as these two characters are concerned.
> >
> >Codepage 850 has all the ISO-8859-1 characters, but doesn't have them in
> >the correct position.  The codepage for ISO-8859-1 is CP 819.  At least
> >in the United States, it doesn't come with MSDOS.  You can get it at
> >"ftp://ftp.uni-erlangen.de/pub/doc/ISO/charsets/isocp101.zip".
> 
> I haven't tried the 819 codepage. But I do believe that in plain DOS
> it is _not_ compatible with the Norwegian keyboard drivers... :(
> Whether it actually works within Windows I do not know, but I
> would very much like to use it if it does...
> 
> Hope this gives you some more hints to follow when trying to solve the
> problem. And I really hope that all the characters survive through the
> mail system. It ought to do so since I'm using MIME... At least it
> should do so if you display iso-8859-1 properly... :-)

Yes, I think I can see all the characters you mean as they are intended.
Great thing when it works, exchange of correctly labelled data. :)


     Klaus

