Re: lynx-dev Re: lynx should respect LANG


From: Klaus Weide
Subject: Re: lynx-dev Re: lynx should respect LANG
Date: Mon, 22 May 2000 23:25:43 -0500 (CDT)

On Tue, 23 May 2000, Atsuhito Kohda wrote:

> From: Thomas Dickey <address@hidden>
> Subject: Re: lynx-dev Re: lynx should respect LANG
> Date: Mon, 22 May 2000 19:53:17 -0400
> 
> > > - it converts lynx.cfg (with a simple sed script) to lynx-ja.cfg,
> > > like the following:
> > > 
> > > # Include all the common options:
> > > INCLUDE:/etc/lynx-ja/lynx.cfg
> > > # Now override a couple:
> > > CHARACTER_SET:euc-jp
> > > PREFERRED_LANGUAGE:ja,en;q=0.9,*;q=0.5
> > 
> > I understand the goal
> 
> Thanks a lot, I am very glad to hear that.
> 
> > I understand the goal - but how do you propose we derive the preferred
> > language, for instance from the $LANG value?
> 
> Well, I do not know so well about the technical aspect
> of this issue but I have used the $LANG value and it seems
> to work fine for (at least) Japanese users.

That test doesn't go very far - it's just a binary decision
(if $LANG matches ja* then do Japanese; else do English).
I assume you have tested only for one language, and only in one
environment (Debian GNU/Linux).

Even for ($LANG matches ja*) the situation is not so simple in general.
Some ja* locales imply EUC-JP, others Shift_JIS, and maybe still others
ISO 2022.

Locale names don't map cleanly and clearly to charsets.
If you want to do this right, there are a whole lot of heuristics
involved.  Here is a function that tries to do that:
<ftp://ftp.ilog.fr/pub/Users/haible/utf8/locale_charset.c>.
Well, there is also a function call, nl_langinfo(CODESET), that
is supposed to do that, but I hear it's missing or not working
on many systems, and moreover lynx should be able to do its
character handling even on systems without locale support.
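
To make the kind of heuristic concrete, here is a minimal sketch in Python.
It is not the locale_charset.c function linked above and not anything in
Lynx; the name guess_charset and the alias table are hypothetical, and the
table is deliberately tiny.  It only reads an explicit codeset suffix from a
locale name and gives up (returns None) when there is none, which is exactly
the case where the real guessing would have to start.

```python
import re

# Illustrative alias table (an assumption, far from complete): codeset
# spellings that appear in locale names, normalized, mapped to charset names.
CODESET_ALIASES = {
    "ujis": "euc-jp",
    "eucjp": "euc-jp",
    "sjis": "shift_jis",
    "shiftjis": "shift_jis",
    "utf8": "utf-8",
    "iso88591": "iso-8859-1",
}

# Locale names of the form language[_territory][.codeset][@modifier]
LOCALE_RE = re.compile(r"^[A-Za-z]+(?:_[A-Za-z]+)?(?:\.([^@]+))?(?:@.*)?$")


def guess_charset(lang):
    """Guess a charset from a locale name such as 'ja_JP.eucJP'.

    Returns None when the name has no explicit codeset part or the
    codeset is not in the table; no per-language default is attempted.
    """
    if not lang:
        return None
    match = LOCALE_RE.match(lang)
    if match is None or match.group(1) is None:
        return None
    # Normalize: lowercase and drop punctuation, so "UTF-8" becomes "utf8".
    key = re.sub(r"[^a-z0-9]", "", match.group(1).lower())
    return CODESET_ALIASES.get(key)
```

With this table, guess_charset("ja_JP.ujis") and guess_charset("ja_JP.eucJP")
both give "euc-jp", while a bare "ja_JP" gives None: the locale name alone
does not say whether EUC-JP or Shift_JIS is meant.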

> Generally speaking, Japanese users should set LANG
> to ja_JP.ujis or ja_JP.eucJP etc. to realize the basic
> Japanese environment, so I think the $LANG value might be
> a possible candidate.  I guess that the situation is similar
> in other Asian countries.

What if Lynx is running on a Windows system?  Then the display
character set for a Japanese user should probably be Shift_JIS.
(Yes, you are probably not interested in Windows - but if we
are talking about adding some support in the source code, we have
to consider that and cannot just make UNIXish assumptions.)
What if the user is telnetted in from a Windows system to a
Unix system - I assume that's not uncommon at all - then Lynx's
display character set should probably also be Shift_JIS.

> But I do not know the situation, for example, in Europe
> and so on.
> 
> Is it better to use the value of LC_* ?

That's just one of many open questions that have to be answered
in order to do this right in general (not just for EUC-JP Japanese).

Basically, the locale concept doesn't map very well to what lynx
does, IMO.  At least, the idea of controlling a program's localized
behavior by just one (set of) environment variable(s) is not really
sufficient.  Lynx is more powerful than that; basically every
application that deals with text in more than one character encoding
is.

If LANG or, say, LC_MESSAGES, is to mean something to lynx, we'd first
have to define what it's supposed to mean -
 - does it say something about the display character set?
 - does it say something about the "preferred" HTTP charset?
 - does it say something about the charset of local text files
   (-assume_local_charset)?
 - does it say something about the default charset of remote text
   (-assume_charset)?
 - does it say something about the charset of message catalogues?
And that's only the "charset" (character encoding) aspects.
There's also "language" to consider, which, in Web protocol terms
for example, is an independent concept from character set questions.
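
As a rough illustration of why these are separate questions, the sketch
below models each one as its own field.  The class and field names are
hypothetical (only -assume_charset, -assume_local_charset, CHARACTER_SET and
PREFERRED_LANGUAGE correspond to things mentioned in this thread), and the
values in the second example are illustrative assumptions, not recommended
settings.

```python
from dataclasses import dataclass


@dataclass
class TextSettings:
    display_charset: str          # cf. CHARACTER_SET in lynx.cfg
    preferred_http_charset: str   # what to ask servers for
    assume_local_charset: str     # cf. -assume_local_charset
    assume_charset: str           # cf. -assume_charset
    message_catalog_charset: str  # charset of translated messages
    preferred_language: str       # cf. PREFERRED_LANGUAGE; not a charset


def naive_from_lang(charset, language):
    """One possible answer: copy a single charset into every slot."""
    return TextSettings(charset, charset, charset, charset, charset, language)


# What a wrapper keyed on LANG=ja_JP.eucJP effectively assumes.
simple_case = naive_from_lang("euc-jp", "ja")

# A user on a Shift_JIS terminal, telnetted into a Unix host whose local
# files are EUC-JP (values are assumptions for illustration).
telnet_case = TextSettings(
    display_charset="shift_jis",
    preferred_http_charset="shift_jis",
    assume_local_charset="euc-jp",
    assume_charset="iso-8859-1",
    message_catalog_charset="euc-jp",
    preferred_language="ja",
)
```

The second case cannot be produced by naive_from_lang() for any single
charset argument, which is the sense in which one variable gives only one of
several possible answers.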

In your wrapper, you have effectively given one possible answer to
these questions - but it is not the only possible one, and probably
not the best one.  

You have probably only thought about very simple situations - where
display character set, file character set, etc. are always the same -
but those are not the only possible ones.  And I don't think it is
very unusual that e.g. the display character set would differ from the
character set used for file storage (see example above for Windows
user -> telnet -> lynx on unix).

Since one LANG variable (or set of LC_* variables) cannot possibly cover
all these details independently, and since the various aspects can already
be configured independently for lynx, LANG should normally not override
those independent settings.
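
One way to read "should not override" is as a layering order: built-in
defaults first, then anything guessed from LANG, then whatever was set
explicitly.  The Python sketch below shows only that ordering; the function
name resolve_settings and the dictionary keys are hypothetical, and this is
not how lynx reads its configuration.

```python
def resolve_settings(builtin, from_lang, explicit):
    """Merge three layers of settings; later layers win.

    builtin   - compiled-in defaults
    from_lang - values guessed from $LANG (may be empty or contain None)
    explicit  - values the user or administrator configured directly
    A value of None means "no opinion" and never replaces an earlier value.
    """
    resolved = dict(builtin)
    for layer in (from_lang, explicit):
        for key, value in layer.items():
            if value is not None:
                resolved[key] = value
    return resolved


settings = resolve_settings(
    builtin={"display_charset": "iso-8859-1", "preferred_language": "en"},
    from_lang={"display_charset": "euc-jp", "preferred_language": "ja"},
    explicit={"display_charset": "shift_jis"},
)
```

Here the explicit display charset survives, and the language, which nobody
configured, falls back to the LANG-derived guess.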

> In any case, I do not care what method is used to determine
> the preferred language, but I think it is important that lynx
> has some mechanism to select the preferred language
> automatically.

But what you are suggesting (if I understand it right) isn't really
automatic.  Someone, somewhere, still has to set LANG (or LC_*?)
correctly, at least.  If that's just a system-wide thing (set up
by the administrator), then the administrator could just make an
equivalent setting in the system-wide default lynx.cfg.
To allow users to customize their environment (deviating from the
system-wide default) with this method, they have to be taught to set
LANG correctly; that's not so much different from teaching them to
set up lynx correctly themselves.

For situations where one can assume that $LANG has been set up in a
meaningful way, and is already used to control other programs'
behavior, it probably makes sense to derive some of lynx's relevant
settings from $LANG - at least as initial defaults.  But they will
still be wrong for some users and some situations, even if we can come
up with a meaningful answer to "What's $LANG supposed to mean".  So
I'm not sure how much of that really belongs in lynx code.

  Klaus

