Re: lynx-dev Re: lynx should respect LANG

From:	Klaus Weide
Subject:	Re: lynx-dev Re: lynx should respect LANG
Date:	Mon, 5 Jun 2000 14:22:14 -0500 (CDT)

[ attributions lost ]
> > > I don't think the warning of metamail is harmful.
> > 
> > The warning itself isn't harmful - but ignoring it might be. :)
> > For example, if you got a mail in shift_jis (and properly labelled so,
> > so that metamail _can_ warn).  Now I think that even mail programs on
> > Japanese Windows systems don't send mail that way, so this may not be
> > a very realistic example, but what about other (non-Japanese) charsets
> > with Shift_JIS-like characteristics (here: characters in the 128-159
> > range).  According to Henry, that may lead to a "total lockup of
> > emulation" in some circumstances.
> 
On Mon, 5 Jun 2000, Henry Nelson wrote:
> [Wouldn't quote anything I say.]  If I got a mail in sjis while my emulation
> was set to receive/transmit euc, I would indeed be in trouble.  (Wouldn't

Does such mail occur in practice?

> even be able to read any warning from metamail, would I?)  That's why I have
> nkf as a (procmail) filter to turn all mail into iso-2022-jp on un*x.  I'm
> sure there's a better way, but that's all I know how to do.

> > Is it true that all terminal emulators that understand euc-jp also
> > understand iso-2022-jp directly (without the user having to switch modes)?
> > I checked kterm, kon, and krxvt, they all seem to, at least for JISX0208
> 
> The only (telnet-like) terminal emulators I know only do euc-jp or sjis.

If your mail (always in iso-2022-jp after conversion, as you've
described above), when shown directly with cat or similar, shows up
correctly, then your terminal emulator[*] does understand iso-2022-jp.
Is that not the case?

[*] Well, that or something in between the terminal emulation and the
shell's standard output.

> > Also, just out of curiosity, is kterm what you normally use?
> 
> I know "you" doesn't mean me, but JFYI, I only use text, i.e., telnet in.

"Telnet in" from what?  If it's a Windows telnet client, then the terminal
emulation is probably part of it.  In UNIX, telnet clients are separate
things from terminal emulators.

> Occasionally, I'll use the console.  Never used X.

> I don't know about the reason, but mail should be received/sent in one
> encoding no matter what; for that iso-2022-jp seems the "best" standard.

That seems to be the agreed-upon standard for Japanese e-mail, to use
ISO 2022 based encoding for interchange (but not necessarily for other
CJK languages).  But I was wondering how often you encounter exceptions
in practice.
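
To make the byte ranges mentioned earlier concrete, here is a minimal
Python sketch.  It is an illustration only and has nothing to do with the
code of lynx, metamail, or nkf; the sample string and the helper name
c1_bytes are assumptions made up for the example.  It encodes the same two
kanji in three encodings and counts the bytes in the 128-159 range:

```python
# Illustration only: the sample text and the helper name are assumptions.
SAMPLE = "\u6f22\u5b57"  # the two kanji that spell "kanji"


def c1_bytes(data):
    """Return the bytes of data that fall in the 128-159 (0x80-0x9F) range."""
    return [b for b in data if 0x80 <= b <= 0x9F]


for charset in ("shift_jis", "euc_jp", "iso2022_jp"):
    encoded = SAMPLE.encode(charset)
    print(charset, encoded.hex(" "), "bytes in 128-159:", len(c1_bytes(encoded)))
```

For this sample, only the Shift_JIS form contains bytes in that range, and
the iso-2022-jp form is 7-bit throughout.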

> > There are still situations where autodetection fails, and where it helps
> > to tell the program what the input is.  I think you agree that that's
> 
> From a strictly user's point of view, when, and only when, autodetection
> fails do I want to go to trying "to tell the program what the input is."
> After autodetect fails, first, I want Lynx to try to go by what the
> document is labelled as.

The actual logic (in lynx) seems to be that if the document is labelled (by
the author or Web site) as a specific charset, then that takes precedence
over guessing ('autodetection').  That's how I understand the changes that
are now in 2.8.3, although right now I'm not sure this is always the case.

I also think this is how it should be.  Do you disagree?
If Web authors explicitly lie about their documents' charset, lynx shouldn't
be expected to correct for it.  This should apply for Japanese the same way
as it does for other charsets.
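
The precedence described above (an explicit label wins over guessing) can
be sketched roughly as follows.  This is a hedged Python illustration, not
lynx's actual code: the function name choose_charset, its parameters, and
the iso-8859-1 fallback are all hypothetical.

```python
# Sketch only: not lynx's actual code; all names here are hypothetical.
from typing import Optional


def choose_charset(label: Optional[str],
                   detected: Optional[str],
                   assumed: str = "iso-8859-1") -> str:
    """Pick a charset: explicit label first, then autodetection, then a default."""
    if label:        # charset stated by the author or Web site
        return label.lower()
    if detected:     # result of guessing from the document's bytes
        return detected.lower()
    return assumed   # nothing known, so fall back to an assumed charset


print(choose_charset("Shift_JIS", "euc-jp"))  # the label wins over the guess
print(choose_charset(None, "euc-jp"))         # no label, so the guess is used
```

A manual override, as asked for above, would simply be a further argument
checked before the label.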

> Finally, I want to be able to do a manual override.

Do you do that now, with lynx?  I'm curious, how often does that situation
come up?

> What's actually there in the document seems much more reliable than what
> someone may say about what's in there.  (If totally off base, please ignore.)
> 
> > Do you, and other Japanese users, actually toggle '@' when you visit
> > non-Japanese (non-CJK) sites?  Do you know you're supposed to?
> 
> (Only talking about "you"=me.) No. No.  (But do tell me why I'm supposed to.)

To answer the "why":  So you can see the characters correctly.
Well not exactly the characters (which may not be in your Japanese
fonts), but lynx's 7-bit replacements for them, "(c)" for &copy;,
"e" for &eacute;, and so on.

This isn't necessary if the document is actually labelled as a non-Japanese
charset (like, *for example*, iso-8859-1), and it isn't necessary either if
the characters are given as entities (as shown above) or as NCRs (like
&#169;).  But it is necessary for non-Japanese documents that don't have
an explicit charset label and contain (probably) iso-8859-1 characters
in raw byte form.
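
As a rough illustration of what "7-bit replacements" for raw bytes means,
here is a small Python sketch.  It assumes the input really is iso-8859-1,
and the replacement table is a tiny hypothetical subset invented for the
example, not the table lynx actually uses.

```python
# Sketch only: the table is a tiny hypothetical subset, not lynx's real one.
REPLACEMENTS = {
    "\u00a9": "(c)",  # copyright sign, &copy;
    "\u00e9": "e",    # e with acute accent, &eacute;
}


def to_seven_bit(raw, charset="iso-8859-1"):
    """Decode raw bytes in the given charset and substitute 7-bit replacements."""
    text = raw.decode(charset)
    return "".join(ch if ord(ch) < 128 else REPLACEMENTS.get(ch, "?")
                   for ch in text)


print(to_seven_bit(b"\xa9 2000 caf\xe9"))  # prints "(c) 2000 cafe"
```

If the same bytes were instead treated as Japanese-encoded text, the
decoding step would go wrong before any replacement could happen, which is
the case where toggling raw mode matters.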

As a demonstration, make a copy of the test/iso8859-1.html file, and remove
the <META HTTP-EQUIV=...>.  You should find that toggling raw is necessary
for characters in the first column.  I am assuming here that ASSUME_CHARSET
and ASSUME_LOCAL_CHARSET are not set.

> > Or do you just not care enough about anything but Japanese and 7-bit
> > ASCII characters?
> 
> Yes.  (Can't read anything except Japanese and English.  Smattering of
> French and Spanish, so, yes, I'd like to see some of the Western European
> multi-byte characters, 

Don't think of them as "multi-byte"...
Actually, characters, as an abstract concept, are neither single-byte
nor multi-byte, they just are.  In a specific encoding, they may take
up one or more bytes, but those "Western European" non-ASCII characters
are single-byte in the encoding in which you are most likely to encounter
them on the Web (iso-8859-1 or similar).  They are multi-byte, though, in
the UTF-8 character encoding.
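
A short Python check of that point, using e-acute as the example character
(the variable names are just for illustration):

```python
ch = "\u00e9"  # e with acute accent, one abstract character

for charset in ("iso-8859-1", "utf-8"):
    encoded = ch.encode(charset)
    print(charset, encoded.hex(" "), len(encoded), "byte(s)")
```

The same character comes out as the single byte e9 in iso-8859-1 and as
the two bytes c3 a9 in UTF-8.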

> but that has to do with the code set on my PC
> and the encodings my terminal emulation can do, not lynx, AFAICS.  Why do
> I say this; well, when it comes time after a long pause to display some
> Japanese, my hard-disk on my PC spins up, to read the fonts on my PC I
> assume.

I am not sure what you mean here.  Are you talking about viewing a
copyright sign character as an actual copyright sign glyph in your
font?  How do you do that in your Japanese environment?

Anyway, I didn't really mean that, since the current lynx doesn't do
anything to map non-Japanese-encoded "Western European" characters to
Japanese-encoded ones.  I just had in mind what's necessary to make
lynx do the best it currently can, which is use 7-bit replacements
in those cases.

      Klaus
