Re: lynx-dev Re: lynx should respect LANG

From:	Klaus Weide
Subject:	Re: lynx-dev Re: lynx should respect LANG
Date:	Sun, 4 Jun 2000 19:49:13 -0500 (CDT)

On Sat, 3 Jun 2000, Hataguchi Takeshi wrote:

> On Sun, 28 May 2000, Klaus Weide wrote:
> [snip]
> > So (if you wanted to use this proposed mechanism with lynx, *and* wanted
> > to use its MM_CHARSET part) you would set MM_CHARSET=euc-jp or
> > MM_CHARSET=shift_jis.  That's not useful for metamail, unless you actually
> > receive mail in euc-jp or shift_jis, but it shouldn't do any harm either.
> > (After all, it's a truthful statement about the terminal's behavior.)
> 
> I don't think the warning of metamail is harmful.

The warning itself isn't harmful - but ignoring it might be. :)
For example, suppose you got a mail in shift_jis (and properly labelled so,
so that metamail _can_ warn).  Now I think that even mail programs on
Japanese Windows systems don't send mail that way, so this may not be
a very realistic example, but what about other (non-Japanese) charsets
with Shift_JIS-like characteristics (here: characters in the 128-159
range)?  According to Henry, that may lead to a "total lockup of
emulation" in some circumstances.
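
To make the 128-159 concern concrete, here is a minimal Python sketch (not
taken from metamail or lynx; the helper name c1_bytes is made up for
illustration).  It uses Python's built-in shift_jis and euc_jp codecs to list
which bytes of an encoded string fall in that range, which many terminals
treat as control codes.

```python
def c1_bytes(text, encoding):
    """Return the encoded bytes that fall in the 128-159 (0x80-0x9F) range."""
    return [b for b in text.encode(encoding) if 0x80 <= b <= 0x9F]


# Three JIS X 0208 kanji, written as escapes so the source stays ASCII.
sample = "\u65e5\u672c\u8a9e"

sjis_hits = c1_bytes(sample, "shift_jis")
euc_hits = c1_bytes(sample, "euc_jp")

print("shift_jis bytes in 128-159:", [hex(b) for b in sjis_hits])
print("euc_jp bytes in 128-159:   ", [hex(b) for b in euc_hits])
```

For these JIS X 0208 characters the Shift_JIS form has lead bytes in the
128-159 range while the EUC-JP form has none; other characters (for example
JIS X 0212 in EUC-JP) can behave differently.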

> But it can be suppressed.

Well, the warning attempts to warn about something, some condition that
may be fixable.  This shouldn't be viewed as a question of just "How can
I suppress the warning", but a combination of "How can I suppress the warning
when it is unnecessary" and "What can I do to display text correctly when the
warning *is* appropriate".

See the shownonascii script that comes with metamail, which tries to spawn
an xterm with the required font if necessary.  You should also find some
example mailcap entries that use it.  Of course this works only under X,
and it has (in my version, at least) 'xterm' hardwired in while you probably
want something else like 'kterm' for Japanese.  But the script could be
customized and extended.

> > Actually kterm also understands ISO 2022 encoding (independent of -km
> > mode, it seems), so setting MM_CHARSET=iso-2022-jp would also be
> > valid within kterm, and that could actually be directly useful to
> > metamail.  
> 
> If kterm couldn't handle iso-2022-jp, I would use a filter to convert
> to an appropriate codeset.  

Is it true that all terminal emulators that understand euc-jp also
understand iso-2022-jp directly (without the user having to switch modes)?
I checked kterm, kon, and krxvt; they all seem to, at least for JISX0208
characters.

Also, just out of curiosity, is kterm what you normally use?

> This isn't concerned with the value of MM_CHARSET.  

But MM_CHARSET would be a way to tell metamail when to invoke that filter
(or different filters for different mail charsets) and when not.
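
A rough sketch of that idea in Python, under stated assumptions: the helper
needs_filter is hypothetical and is not metamail's actual logic, and falling
back to us-ascii when MM_CHARSET is unset is an assumption made here for the
example.  It only shows how the variable could drive the "filter or not"
decision.

```python
import os


def needs_filter(mail_charset, environ=os.environ):
    """Decide whether text in mail_charset needs conversion before display.

    Hypothetical helper: returns False when the mail is plain ASCII or its
    charset matches what MM_CHARSET says the terminal can show, and True
    otherwise.  Charset names are compared case-insensitively.
    """
    terminal = environ.get("MM_CHARSET", "us-ascii").strip().lower()
    mail = mail_charset.strip().lower()
    return mail != "us-ascii" and mail != terminal


# A terminal declared as iso-2022-jp: matching mail passes through,
# Shift_JIS mail would be handed to a converter first.
print(needs_filter("ISO-2022-JP", {"MM_CHARSET": "iso-2022-jp"}))
print(needs_filter("Shift_JIS", {"MM_CHARSET": "iso-2022-jp"}))
```

A real setup would also have to pick which filter to run for each mail
charset; this sketch only answers whether one is needed.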

> I like setting MM_CHARSET to iso-2022-jp because metamail
> wouldn't show any warnings.
> 
> > If lynx with USE_ENV_FOR_LOCALE sees this, I guess it should
> > ignore it (as it should any not-recognized value).  Alternatively, it
> > could treat it as a synonym for euc-jp.
> 
> iso-2022-jp shouldn't be a synonym for euc-jp. So Lynx should ignore
> iso-2022-jp in this case.

It used to be, for all practical purposes that I can see, before your
changes that are now in 2.8.3.  You have effectively disabled recognition
of "charset=ISO-2022-JP" and "charset=ISO-2022-JP-2", which is not so
good.  Lynx *should* recognize documents with such an explicit charset
as Japanese.  I think the previous behavior, although not the most
correct, was better; I have changed this in my code.

I have copied some of your test files of a while ago to
  <http://www.enteract.com/~kweide/test/TH/>,
and added two files for the same characters in iso-2022-jp encoding.

(Yes, I am aware that that's mostly about the input (document) side, while
we were talking about the output (display) side.)
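
As an illustration of "the same characters" in different encodings, the
following Python sketch encodes one short string with Python's built-in
euc_jp, shift_jis and iso2022_jp codecs.  It is unrelated to the test files
themselves; the sample string is chosen here only for the example.

```python
# Three JIS X 0208 kanji, written as escapes so the source stays ASCII.
sample = "\u65e5\u672c\u8a9e"

encoded = {name: sample.encode(name)
           for name in ("euc_jp", "shift_jis", "iso2022_jp")}

for name, data in encoded.items():
    seven_bit = all(b < 0x80 for b in data)
    print(f"{name:11} {data.hex(' ')}  7-bit only: {seven_bit}")
```

The iso-2022-jp form stays within 7 bits and switches character sets with
escape sequences, while the other two use bytes with the high bit set.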

> > But normally, I imagine you probably just wouldn't set MM_CHARSET as a
> > Japanese user.
> 
> Why?  I want to recommend that Japanese metamail users set MM_CHARSET to
> iso-2022-jp so that metamail shows no warnings.

Okay, I didn't consider that properly.  So using MM_CHARSET in the way
I proposed may not be such a good idea.  At least for Japanese...
I still think it makes sense for everything but Japanese charsets.

> But please note when we open a Japanese file with mule (Multilingual
> Enhancement to GNU Emacs), we don't have to tell the charset to mule
> [...]
> I can say almost the same thing about nkf (Network Kanji filter).  I think
> the detection routine of Lynx for Japanese may be as good as those of mule
> and nkf.

There are still situations where autodetection fails, and where it helps
to tell the program what the input is.  I think you agree that that's
not just theoretical, otherwise you wouldn't have added the changes
for -assume_charset handling.
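
One way to see why detection alone cannot always decide: some byte sequences
are valid in more than one Japanese encoding.  The Python sketch below is
only an illustration (the helper candidate_charsets is made up here and is
not the detection routine of lynx, mule or nkf); it reports every candidate
codec that decodes the input without error.

```python
def candidate_charsets(data, candidates=("euc_jp", "shift_jis")):
    """Return every candidate codec that decodes data without error."""
    accepted = []
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        accepted.append(name)
    return accepted


# Two half-width katakana, encoded as Shift_JIS (one byte each).
ambiguous = "\uff71\uff72".encode("shift_jis")

print(candidate_charsets(ambiguous))
for name in candidate_charsets(ambiguous):
    print(name, [hex(ord(ch)) for ch in ambiguous.decode(name)])
```

When more than one codec accepts the bytes and the decoded text differs, only
outside information such as an explicitly assumed charset can settle which
reading was meant.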

[ lots of snipping ]
> > On Sun, 28 May 2000, Hataguchi Takeshi wrote:
> > > For example, in my home directory, some files are encoded by EUC and
> > > others by SJIS or JIS.  [...]
> > > If LC_ALL, LC_CTYPE or LANG is set to ja*, I want ASSUME_LOCAL_CHARSET
> > > to be set not to euc-jp or Shift_JIS but to "iso-8859-1"; otherwise
> > > the auto detect routine for Japanese is disabled.
> 
> I'm sorry. This isn't correct now for ASSUME_LOCAL_CHARSET.
> 
> ASSUME_LOCAL_CHARSET probably has no (or almost no) effect when DCS is
> set to euc-jp or Shift_JIS, whereas ASSUME_CHARSET has some effect as
> I wrote above.
> # This also may be a design bug.
> 
> > Four possible answers; pick one or more. :) [...]
> 
> I'm sorry, this choice may make no sense because ASSUME_LOCAL_CHARSET
> probably has no effect when DCS is set to euc-jp or Shift_JIS.

It should have an effect, for consistency with non-Japanese (non-CJK?)
character environments.
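
For reference, the "LC_ALL, LC_CTYPE or LANG is set to ja*" test in the
quoted text could be sketched as follows in Python.  The function names are
hypothetical and this is not lynx code; the only thing assumed is the usual
POSIX precedence (LC_ALL, then LC_CTYPE, then LANG, with empty values
treated as unset).

```python
import os


def ctype_locale(environ=os.environ):
    """Return the locale name governing character handling, or None.

    Checks LC_ALL, then LC_CTYPE, then LANG; an empty value counts as unset.
    """
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = environ.get(var)
        if value:
            return value
    return None


def is_japanese_locale(environ=os.environ):
    """True when the effective locale name starts with "ja"."""
    name = ctype_locale(environ)
    return name is not None and name.startswith("ja")


print(is_japanese_locale({"LANG": "ja_JP.eucJP"}))
print(is_japanese_locale({"LANG": "ja_JP.eucJP", "LC_ALL": "C"}))
```

What a program should then assume about local files in such a locale is
exactly the open question in this thread; the sketch covers only the locale
test.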

> > > # Or "x-autodetect_jp" may be better, though it's not valid now.
> > 
> > Maybe something like it should exist.  But as an ASSUME_(LOCAL_)CHARSET,
> > not as a (display) CHARACTER_SET.
> 
> Yes.  The absence of it causes inconsistency in the code around
> ASSUME_CHARSET.  Unfortunately I couldn't introduce it in my past
> changes.

And then there's -raw ('@') and how that should interact with all
of the rest.

Do you, and other Japanese users, actually toggle '@' when you visit
non-Japanese (non-CJK) sites?  Do you know you're supposed to?
Or do you just not care enough about anything but Japanese and 7-bit
ASCII characters?

   Klaus
