Re: lynx-dev Re: lynx should respect LANG


From: Klaus Weide
Subject: Re: lynx-dev Re: lynx should respect LANG
Date: Sun, 28 May 2000 17:15:31 -0500 (CDT)

On Sun, 28 May 2000, Hataguchi Takeshi wrote:

> On Tue, 23 May 2000, Klaus Weide wrote:
> 
> > So, here is a concrete proposal, in the form of a draft for a lynx.cfg
> > option.  (If deemed necessary, equivalent userdefs.h and/or command line
> > options could also be provided.  But not .lynxrc/'O'ptions screen, it
> > should become obvious that that makes little sense and/or gets too
> > complicated to do in a reasonable way.)
> > 
> > # If USE_ENV_FOR_LOCALE is set to either TRUE or OVERRIDE, lynx processes
> > # several environment variables on startup to determine default values
> > # for several options that typically depend on the user's language and
> > # character set environment (locale).  Affected options, listed together
> > # with the environment variables in order of priority:
> > #   CHARACTER_SET          MM_CHARSET, LC_ALL, LC_CTYPE, LANG
> > #   PREFERRED_CHARSET      MM_CHARSET, LC_ALL, LC_CTYPE, LANG
> > #   ASSUME_LOCAL_CHARSET   LC_ALL, LC_CTYPE, LANG
> > #   PREFERRED_LANGUAGE     LC_ALL, LC_CTYPE, LANG
> > # Note that this mechanism doesn't depend on locale support by the OS or
> > # libraries; lynx only looks at the values for those variables as strings
> > # and uses heuristics for finding the charset and language that corresponds
> > # to a given locale name.  For display character set aspects, MM_CHARSET
> > # (a convention inherited from the metamail program) is preferred and takes
> > # precedence since it directly contains a charset in MIME form.
> 
> metamail's man page says:
> 
> |        MM_CHARSET
> |                If  this variable  is  set,  it will suppress the
> |                printing of character set declarations  when  mail
> |                headers being printed contain text in this charac-
> |                ter set. For example, if you  set  MM_CHARSET  to
> |                "iso-8859-8",   it  will suppress  warnings  when
> |                header output is produced in that character set.
> 
> I think this variable should be set to the charset of received mail.  I
> doubt this should affect CHARACTER_SET of Lynx, because CHARACTER_SET
> should be set to a charset which can be handled by the terminal.

No, it is not being set *by* metamail; it is being set by the user
*for* metamail.  It tells metamail: this is the charset that
the terminal understands.  Exactly the same meaning as CHARACTER_SET
for lynx, although the description you quote may be confusing.

The connection between metamail's MM_CHARSET and ";charset=..." in
mail handled by metamail is:
  IF   $MM_CHARSET == <mail's charset>
  THEN display message directly, without warnings
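
Spelled out as a rough sketch (Python only for brevity; this is not
metamail's actual code, and the function name is made up), the rule
amounts to a case-insensitive string comparison.  The fallback to
"us-ascii" for an unset MM_CHARSET is an assumption of the sketch:

```python
def displays_directly(mail_charset, environ):
    """True if the mail's charset is the one the terminal is declared to handle."""
    # MIME charset names are case-insensitive, so compare in lower case.
    # Assumption: an unset MM_CHARSET is treated as "us-ascii".
    terminal_charset = environ.get("MM_CHARSET", "us-ascii")
    return mail_charset.strip().lower() == terminal_charset.strip().lower()


# A hand-built environment is used here; os.environ works the same way.
print(displays_directly("ISO-8859-8", {"MM_CHARSET": "iso-8859-8"}))  # True
print(displays_directly("iso-2022-jp", {"MM_CHARSET": "euc-jp"}))     # False
```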


> In case of Japanese, we usually encode our mail in JIS (iso-2022-jp).
> But now Lynx can't display in JIS but only in EUC (euc-jp) and SJIS
> (Shift_JIS).

So (if you wanted to use this proposed mechanism with lynx, *and* wanted
to use its MM_CHARSET part) you would set MM_CHARSET=euc-jp or
MM_CHARSET=shift_jis.  That's not useful for metamail, unless you actually
receive mail in euc-jp or shift_jis, but it shouldn't do any harm either.
(After all, it's a truthful statement about the terminal's behavior.)
Actually, it can be useful even for metamail if you write some customized
mailcap entries (that would check %{charset} and $MM_CHARSET and spawn
an appropriate converter, I imagine).

So the right way to set MM_CHARSET would be, if you are for example using
kterm:
 MM_CHARSET=euc-jp     in   kterm -km euc
 MM_CHARSET=shift_jis  in   kterm -km sjis
Actually kterm also understands ISO 2022 encoding (independent of -km
mode, it seems), so setting MM_CHARSET=iso-2022-jp would also be
valid within kterm, and that could actually be directly useful to
metamail.  If lynx with USE_ENV_FOR_LOCALE sees this, I guess it should
ignore it (as it should any not-recognized value).  Alternatively, it
could treat it as a synonym for euc-jp.
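
To make the two alternatives concrete, here is a minimal sketch (Python
for brevity; nothing like it exists in lynx, and every name in it is
hypothetical).  The set of recognized display charsets is a small
illustrative subset, and the flag selects the "synonym for euc-jp"
behavior:

```python
# Illustrative subset only; a real lynx build knows many more display charsets.
KNOWN_DISPLAY_CHARSETS = {"euc-jp", "shift_jis", "iso-8859-1"}

# The "alternatively" case: accept iso-2022-jp as another name for euc-jp.
ISO2022_SYNONYMS = {"iso-2022-jp": "euc-jp"}


def charset_from_mm_charset(value, treat_iso2022_as_euc=False):
    """Return a usable display charset, or None if the value should be ignored."""
    if not value:
        return None
    name = value.strip().lower()
    if treat_iso2022_as_euc:
        name = ISO2022_SYNONYMS.get(name, name)
    # Any value that is not recognized is ignored rather than treated as an error.
    return name if name in KNOWN_DISPLAY_CHARSETS else None
```

With the flag off, "iso-2022-jp" yields None, so the caller would go on
to the next source of a default; with it on, the same value yields
"euc-jp".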

But normally, I imagine you probably just wouldn't set MM_CHARSET as a
Japanese user.

> > # Settings for the listed options that can be derived from environment
> > # variables override builtin defaults and values in lynx.cfg files.  If
> > # USE_ENV_FOR_LOCALE is set to OVERRIDE, they also override corresponding
> > # options from the user's .lynxrc file.
> 
> Can it override command line option "-assume_local_charset" or not?

I think that explicit command line flags should always take precedence.
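
Put together with the draft text, the resulting order could be sketched
like this (Python for brevity; the function and parameter names are
hypothetical, and that .lynxrc normally wins over lynx.cfg is taken as
given).  Later layers win over earlier ones; None means "this source has
no value":

```python
def effective_value(mode, builtin, lynx_cfg=None, env=None, lynxrc=None, cmdline=None):
    """Pick the value of one option according to USE_ENV_FOR_LOCALE (mode)."""
    if mode == "FALSE":
        # Environment variables are not consulted at all.
        layers = [builtin, lynx_cfg, lynxrc, cmdline]
    elif mode == "TRUE":
        # Environment beats builtin defaults and lynx.cfg, but not .lynxrc.
        layers = [builtin, lynx_cfg, env, lynxrc, cmdline]
    elif mode == "OVERRIDE":
        # Environment also beats .lynxrc; the command line still wins.
        layers = [builtin, lynx_cfg, lynxrc, env, cmdline]
    else:
        raise ValueError("mode must be FALSE, TRUE or OVERRIDE")
    value = None
    for candidate in layers:
        if candidate is not None:
            value = candidate
    return value
```

So for ASSUME_LOCAL_CHARSET, a value given with -assume_local_charset is
used in all three modes, whatever the environment says.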

> > # convention.  In slightly more complicated situations, for example when
> > # the terminal display character set differs from the charset used for
> > # files (e.g., because the user is remotely logged in in a heterogeneous
> > # environment), using individual lynx.cfg and .lynxrc options affords more
> > # detailed control; USE_ENV_FOR_LOCALE should be set to FALSE to ignore
> > # environment variables in that case.
> 
> I think "more complicated situations" are rather general in Japan.

Still, there must be users (the "simple situation") that don't have
to deal with such problems.  Like a newly-installed Linux system,
where all Japanese documents use the same character encoding (EUC),
or maybe EUC and JIS.  Where no tainting by Microsoft's encoding has
occurred yet...

> For example, in my home directory, some files are encoded in EUC and
> others in SJIS or JIS.  New files which I make are usually encoded in
> EUC but I also have files which I receive from someone and get from
> other sites by ftp and http. They may be encoded in SJIS or JIS.
> 
> If LC_ALL, LC_CTYPE or LANG is set to ja*, I want ASSUME_LOCAL_CHARSET
> to be set not to euc-jp or Shift_JIS but to "iso-8859-1"; otherwise
> the auto detect routine for Japanese is disabled.

Four possible answers; pick one or more. :)

1.) As I wrote,
| The list of options affected is of course up for discussion.

2.) If you have to deal with an inhomogeneous environment (with respect
to character encodings), then this option is not for you.

3.) Why don't you recode those files after receiving them?
That should make your life a lot easier for working with all kinds
of tools, not just lynx.

4.) If, for making best use of lynx, you have to set ASSUME_LOCAL_CHARSET
to "iso-8859-1" (rather than leaving it unset) - then that's a design bug 
in lynx.  Saying that your files are in iso-8859-1 when that's the last
thing they are is clearly not the right way to go.

Maybe introduced by your changes to 2.8.3?
[ I think the way lynx is now honoring ASSUME_CHARSET for Japanese is
not good.  Rather than adding special-case handling for Japanese that
is completely different from that for all other character sets - which
you have done - handling for Japanese character sets should be *more*
in line with that for other character sets.
I have private code changes (rather extensive) where I try to do that, 
started a while ago, but it doesn't yet handle all cases for Japanese
reasonably.  Still, if you are planning to make more changes in this
area, please have a look at my code first.  Ask and I'll make a current
snapshot available. ]

> # Or "x-autodetect_jp" may be better, though it's not valid now.

Maybe something like it should exist.  But as an ASSUME_(LOCAL_)CHARSET,
not as a (display) CHARACTER_SET.

   Klaus

