
Re: lynx-dev Re: lynx should respect LANG


From: Hataguchi Takeshi
Subject: Re: lynx-dev Re: lynx should respect LANG
Date: Sat, 3 Jun 2000 21:17:48 +0900 (JST)

On Sun, 28 May 2000, Klaus Weide wrote:

> On Sun, 28 May 2000, Hataguchi Takeshi wrote:
[snip]
> > metamail's man page says:
> > 
> > |        MM_CHARSET
> > |                If this variable is set, it will suppress the
> > |                printing of character set declarations when mail
> > |                headers being printed contain text in this
> > |                character set. For example, if you set MM_CHARSET to
> > |                "iso-8859-8", it will suppress warnings when
> > |                header output is produced in that character set.
> > 
> > I think this variable should be set to the charset of received mail.  I
> > doubt this should affect CHARACTER_SET of Lynx, because CHARACTER_SET
> > should be set to a charset which can be handled by the terminal.
> 
> No, it is not being set *by* metamail; it is being set by the user
> *for* metamail.  It tells metamail: this is the charset that
> the terminal understands.  Exactly the same meaning as CHARACTER_SET
> for lynx, although the description you quote may be confusing.

I know the variable should be set by the user.  I simply thought the
user should set it to the charset of received mail in order to
suppress the warnings.

> The connection between metamail's MM_CHARSET and ";charset=..." in
> mail handled by metamail is:
>   IF   $MM_CHARSET == <mail's charset>
>   THEN display message directly, without warnings

I see what you mean.  I don't completely agree yet, but you may be
right.
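
The rule quoted above can be sketched as follows.  This is only an
illustration in Python, not metamail's actual code: the function name
needs_charset_warning is hypothetical, and both the case-insensitive
comparison and the handling of an unset variable are assumptions.

```python
def needs_charset_warning(mm_charset, mail_charset):
    """Return True if a charset warning would be shown under the quoted rule."""
    if not mm_charset:
        # Assumption: with MM_CHARSET unset, nothing suppresses the warning.
        return True
    # Assumption: charset names are compared case-insensitively.
    return mm_charset.strip().lower() != mail_charset.strip().lower()
```

With MM_CHARSET set to iso-2022-jp, mail declared as ISO-2022-JP would
be displayed directly under this sketch, while mail in any other
charset would still produce a warning.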

> > In case of Japanese, we usually encode our mail by JIS (iso-200-jp).
>                                                           YM: 2022
> > But now Lynx can't display in JIS but only in EUC (euc-jp) and SJIS
> > (Shift_JIS).
> 
> So (if you wanted to use this proposed mechanism with lynx, *and* wanted
> to use its MM_CHARSET part) you would set MM_CHARSET=euc-jp or
> MM_CHARSET=shift_jis.  That's not useful for metamail, unless you actually
> receive mail in euc-jp or shift_jis, but it shouldn't do any harm either.
> (After all, it's a truthful statement about the terminal's behavior.)

I don't think the warning of metamail is harmful.  But it can be
suppressed.

> So the right way to set MM_CHARSET would be, if you are for example using
> kterm:
>  MM_CHARSET=euc-jp     in   kterm -km euc
>  MM_CHARSET=shift_jis  in   kterm -km sjis
> Actually kterm also understands ISO 2022 encoding (independent of -km
> mode, it seems), so setting MM_CHARSET=iso-2022-jp would also be
> valid within kterm, and that could actually be directly useful to
> metamail.  

If kterm couldn't handle iso-2022-jp, I would use a filter to convert
to an appropriate codeset.  That has nothing to do with the value of
MM_CHARSET.  I like setting MM_CHARSET to iso-2022-jp because metamail
then doesn't show any warnings.

> If lynx with USE_ENV_FOR_LOCALE sees this, I guess it should
> ignore it (as it should any not-recognized value).  Alternatively, it
> could treat it as a synonym for euc-jp.

iso-2022-jp shouldn't be a synonym for euc-jp. So Lynx should ignore
iso-2022-jp in this case.
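
A minimal sketch of the "ignore unrecognized values" behavior, written
in Python for brevity.  USE_ENV_FOR_LOCALE is only a proposal in this
thread, so nothing here is actual Lynx code; the function name and the
table of accepted charsets are hypothetical.

```python
import os

# Hypothetical table: display charsets this sketch accepts from MM_CHARSET.
DISPLAY_CHARSETS = {"euc-jp": "euc-jp", "shift_jis": "shift_jis"}

def display_charset_from_env(environ=os.environ, default=None):
    """Map MM_CHARSET to a display charset, ignoring values not in the table."""
    value = environ.get("MM_CHARSET", "").strip().lower()
    # iso-2022-jp is deliberately absent from the table: it is ignored,
    # not treated as a synonym for euc-jp.
    return DISPLAY_CHARSETS.get(value, default)
```

Here an unrecognized value such as iso-2022-jp simply falls through to
the default, so the user's other settings stay in effect.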

> But normally, I imagine you probably just wouldn't set MM_CHARSET as a
> Japanese user.

Why?  I want to recommend that Japanese metamail users set MM_CHARSET
to iso-2022-jp so that metamail shows no warnings.

> > > # convention.  In slightly more complicated situations, for example when
> > > # the terminal display character set differs from the charset used for
> > > # files (e.g., because the user is remotely logged in in a heterogeneous
> > > # environment), using individual lynx.cfg and .lynxrc options affords more
> > > # detailed control; USE_ENV_FOR_LOCALE should be set to FALSE to ignore
> > > # environment variables in that case.
> > 
> > I think "more complicated situations" are rather general in Japan.
> 
> Still, there must be users (the "simple situation") that don't have
> to deal with such problems.  Like a newly-installed Linux system,
> where all Japanese documents use the same character encoding (EUC),
> or maybe EUC and JIS.  Where no tainting by Microsoft's encoding has
> occurred yet...

That should be true.

But please note that when we open a Japanese file with mule
(Multilingual Enhancement to GNU Emacs), we don't have to tell mule
the charset, because mule usually detects the correct charset of the
file automatically.  Of course we can tell it, but mule's detection
routine is powerful enough in almost all cases.

I can say almost the same thing about nkf (Network Kanji Filter).  I
think the detection routine of Lynx for Japanese may be as good as
those of mule and nkf.
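
To illustrate what such automatic detection involves, here is a rough
sketch in Python.  It is not the algorithm used by mule, nkf or Lynx;
the function name is hypothetical, and the order in which the 8-bit
encodings are tried is an assumption.  It relies only on Python's
standard iso2022_jp, euc_jp and shift_jis codecs.

```python
def guess_japanese_charset(data):
    """Guess the encoding of a byte string; return None if nothing fits."""
    if b"\x1b" in data:
        # ISO-2022-JP is 7-bit and switches character sets with ESC sequences.
        candidates = ["iso2022_jp"]
    elif all(byte < 0x80 for byte in data):
        return "us-ascii"
    else:
        # Assumption: try EUC-JP before Shift_JIS, so input that is valid
        # in both encodings is reported as EUC-JP.
        candidates = ["euc_jp", "shift_jis"]
    for codec in candidates:
        try:
            data.decode(codec)  # strict decoding raises on invalid bytes
        except UnicodeDecodeError:
            continue
        return codec
    return None
```

Short inputs can be valid in more than one encoding, which is why real
detectors use more careful heuristics than a simple trial decode.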

> > For example, in my home directory, some files are encoded in EUC and
> > others in SJIS or JIS.  New files which I make are usually encoded in
> > EUC, but I also have files which I receive from someone or get from
> > other sites by ftp and http.  They may be encoded in SJIS or JIS.
> > 
> > If LC_ALL, LC_CTYPE or LANG is set to ja*, I want ASSUME_LOCAL_CHARSET
> > to be set not to euc-jp or Shift_JIS but to "iso-8859-1"; otherwise
> > the auto-detect routine for Japanese is disabled.

I'm sorry. This isn't correct now for ASSUME_LOCAL_CHARSET.

ASSUME_LOCAL_CHARSET probably has no (or almost no) effect when DCS is
set to euc-jp or Shift_JIS, whereas ASSUME_CHARSET has some effect as
I wrote above.
# This also may be a design bug.
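
For reference, the test "LC_ALL, LC_CTYPE or LANG is set to ja*" in the
quoted text can be sketched as below, in Python.  The precedence
(LC_ALL, then LC_CTYPE, then LANG, skipping empty values) is the usual
POSIX rule; the function names are hypothetical and this is not Lynx
code.

```python
import os

def effective_ctype_locale(environ=os.environ):
    """Return the first non-empty value of LC_ALL, LC_CTYPE, LANG, or None."""
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = environ.get(name)
        if value:
            return value
    return None

def is_japanese_locale(environ=os.environ):
    """Return True if the effective locale name starts with "ja"."""
    locale_name = effective_ctype_locale(environ)
    return bool(locale_name) and locale_name.startswith("ja")
```

Note that LC_ALL overrides the other two, so LANG=ja_JP.eucJP combined
with LC_ALL=C does not count as a Japanese locale in this sketch.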

> Four possible answers; pick one or more. :)
> 
> 1.) As I wrote,
> | The list of options affected is of course up for discussion.
> 
> 2.) If you have to deal with an inhomogeneous environment (with respect
> to character encodings), then this option is not for you.
> 
> 3.) Why don't you recode those files after receiving them?
> That should make your life a lot easier for working with all kinds
> of tools, not just lynx.
> 
> 4.) If, for making best use of lynx, you have to set ASSUME_LOCAL_CHARSET
> to "iso-8859-1" (rather than leaving it unset) - then that's a design bug 
> in lynx.  Saying that your files are in iso-8859-1 when that's the last
> thing they are is clearly not the right way to go.
> 
> Maybe introduced by your changes to 2.8.3?

I'm sorry, this choice may make no sense, because ASSUME_LOCAL_CHARSET
probably has no effect when DCS is set to euc-jp or Shift_JIS.

But if it has some effect, I like 2.) or 4.).  

> [ I think the way lynx is now honoring ASSUME_CHARSET for Japanese is
> not good.  Rather than adding special-case handling for Japanese that
> is completely different from that for all other character sets - which
> you have done - handling for Japanese character sets should be *more*
> in line with that for other character sets.

I think so too.  I'm sorry I didn't have enough time to write the code
as expected.

> I have private code changes (rather extensive) where I try to do that, 
> started a while ago, but it doesn't yet handle all cases for Japanese
> reasonably.  Still, if you are planning to make more changes in this
> area, please have a look at my code first.  Ask and I'll make a current
> snapshot available. ]

I'm glad to hear this and I would like to do so.  But unfortunately, I
don't have enough time now.

> > # Or "x-autodetect_jp" may be better, though it's not valid now.
> 
> Maybe something like it should exist.  But as an ASSUME_(LOCAL_)CHARSET,
> not as a (display) CHARACTER_SET.

Yes.  Its absence causes inconsistency in the code around
ASSUME_CHARSET.  Unfortunately I couldn't introduce it in my past
changes.
--
Takeshi Hataguchi
E-mail: address@hidden
