Re: lynx-dev Re: lynx should respect LANG


From: Klaus Weide
Subject: Re: lynx-dev Re: lynx should respect LANG
Date: Tue, 30 May 2000 13:09:52 -0500 (CDT)

On Tue, 30 May 2000, Henry Nelson wrote:

> > > > TEXTDOMAINDIR and TEXTDOMAIN are environment variables used by
> [...]
> > Actually, now that I have tried it - you must be talking about some
> > other programs [1], not lynx. Or at least, not lynx at runtime (maybe
> 
> Seems I owe you and Mr. Kohda a deep apology. Those TEXTDOMAIN*
> variables are a mystery to me, too. Some relic from some past age I
> suppose.
> 
> > from mine; and I cannot see how that could be: the text domain and
> > associated directory are hardcoded in LYMain.c, as
> 
> Hardcoded means it can be configured. Is LOCALEDIR what gets set with
> the configure option --with-nls-datadir?

Yes, it seems so, plus or minus some path components: "/locale" seems
to always be forced in at the end.  It's not easy to understand what
./configure does exactly.

> I used to hand edit makefile before the configure option.

That would explain it, at least for the TEXTDOMAINDIR (but probably
not TEXTDOMAIN == "lynx") part.

> As to why my lynx might behave differently
> from yours, perhaps those ifdef's make a difference.

Possible, but I was assuming that HAVE_LIBINTL_H must always be defined,
in order for GNU gettext to work at all.

> I configure with
> --disable-included-msgs and --without-included-gettext, if it makes a
> difference.

I always use --without-included-gettext (whether explicitly or implicitly).
(Why would I want to use an outdated gettext library implementation??
Anything older than 0.10.35 should not be used IMO.
intl/VERSION still says:
GNU gettext library from gettext-0.10.32)

I tried to understand what --disable-included-msgs does (I have used both);
it appears that it doesn't influence the location where the message catalogs
are sought.

> BTW, does Lynx do anything with NLSPATH?

Not afaics, if you use gettext all the way.  But the man page says:

       NLSPATH             This  variable, if set, is used as the
                           path prefix for message catalogs.

This seems to be wrong.  I can't see how it can be right as long as
'bindtextdomain ("lynx", LOCALEDIR)' is hardwired, when using GNU
gettext.
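
To illustrate the point, here is a minimal sketch (not the actual
LYMain.c code).  EXAMPLE_LOCALEDIR is a made-up stand-in for the
compile-time LOCALEDIR, and the function name is hypothetical.  It
assumes GNU libintl or glibc (elsewhere you may need to link with -lintl)
and POSIX setenv().  The directory passed to bindtextdomain() is what
gets bound, whatever TEXTDOMAINDIR says in the environment:

```c
#include <libintl.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the LOCALEDIR that configure bakes in. */
#define EXAMPLE_LOCALEDIR "/usr/local/share/locale"

/*
 * Set TEXTDOMAINDIR the way a user might, then bind the "lynx" domain
 * with a hardwired directory.  Returns 1 if the binding reported by
 * bindtextdomain() is the hardwired directory, 0 otherwise.
 */
int binding_ignores_textdomaindir(void)
{
    const char *bound;

    /* The library call below never consults this variable. */
    setenv("TEXTDOMAINDIR", "/some/other/dir", 1);

    /* bindtextdomain() returns the directory now bound to the domain. */
    bound = bindtextdomain("lynx", EXAMPLE_LOCALEDIR);

    return bound != NULL && strcmp(bound, EXAMPLE_LOCALEDIR) == 0;
}
```

The same reasoning applies to NLSPATH: nothing in this call path reads it.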

NLSPATH appears in the intl/ sources, but only in the file intl/cat-compat.c.
It seems that it may be used when the underlying message catalog
implementation is catgets.  But even then, it seems that when using
that via GNU gettext's cat-compat.c, the fixed LOCALEDIR from the
bindtextdomain call will override that or at least have precedence.
A new NLSPATH env variable is constructed in intl/cat-compat.c's
bindtextdomain(), which combines the new dirname with any pre-existing
NLSPATH.


> > > > I disagree that LANG should be the major criterion for deciding
> > > > the message catalogue to use.
> 
> The keyword is "major."

I then take your "should" in the sense of a 'best practice' suggestion
for system administrators; right?

But on most (esp. single user) systems, nobody will bother to set up
different TEXTDOMAINs for different uses.  I maintain that "$LANG" is
the normal mechanism, and normally the only one used by 'standard'
installations.


> > Actually, according to my new findings, the L* env variables are the
> > only way to select among catalogues at runtime - unless you resort to
> > shuffling files or symlinks around, or modify the lynx code.
> >
> > Can you explain the difference?
> 
> No. Probably some mistaken notion I had; I've been away from gettext
> for over a year. I do have two separate translations which are both
> being used. I thought I was switching between them with those TEXTDOMAIN*
> variables, but it must be because I compiled two versions of lynx, one
> with the default GNU path (/usr/local/share, I _think_), and one in
> my private account using the nls-datadir configure option.

That would explain it...

> Maybe it's because I use GNU libintl (see below)?

It seems we all do, except for Tom trying to support NLS with Sun's
gettext...  (is anyone actually using that in practice?)

> > [1] Actually, the gettext *program* (utility for shell scripts)
> > observes those environment variables. But we were talking about
> 
> Maybe setting those variables was convenient at some past time for the
> maintenance of the translation catalogue?

Or you were experimenting with Sun's gettext at some point, which
behaves differently?

> > For messages, GNU gettext adds a non-standard extension, the LANGUAGE
> > variable, so the precedence becomes
> >
> >      LANGUAGE > LC_ALL > LC_MESSAGES > LANG
> >
> > Try it - you *should* see this behavior in your lynx, as far as
> > message
> 
> You are correct. Since I use the GNU version, I can install two versions
> of translations for every program, and then use LANGUAGE and LANG to
> distinguish between systemwide and personal preferences to translations,
> although not the intended use perhaps.
> 
> > You do things differently - I am curious, why? Did some documentation
> > suggest that you use (non-standard) LANGUAGE rather than LANG?
> 
> Well, probably mistaken, but I look at LANG as SJIS or EUC, i.e., some
> character set, and I look at LANGUAGE as Japanese, i.e., some language.

Makes sense to adopt "looking at" things this way as your personal
policy.  But it isn't required - both LANG and LANGUAGE can take the
same kinds of strings (like "ja_JP", "ja_JP.eucJP" or "ja_JP.ujis")
which may or may not explicitly mention a character encoding.
It's perhaps a shortcoming of the whole "locale" idea that "character
set" and "language" issues are inseparably mixed together like that.
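
To make the lookup order quoted above concrete, here is a rough sketch
in C.  It is a simplification and not GNU gettext's own code: as far as
I know the real library also treats LANGUAGE as a colon-separated list
of fallbacks and ignores it in the "C" locale.  The function name is
made up for illustration.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Return the value of the first variable that is set and non-empty,
 * checked in the order LANGUAGE > LC_ALL > LC_MESSAGES > LANG,
 * or NULL if none of them is usable.
 */
const char *message_locale_from_env(void)
{
    static const char *const order[] = {
        "LANGUAGE", "LC_ALL", "LC_MESSAGES", "LANG"
    };
    size_t i;

    for (i = 0; i < sizeof order / sizeof order[0]; i++) {
        const char *value = getenv(order[i]);

        /* An empty value counts as "not set". */
        if (value != NULL && strlen(value) > 0)
            return value;
    }
    return NULL;
}
```

So with LANG=ja_JP.ujis and LANGUAGE=ja_JP.sjis both set, the LANGUAGE
value is the one that selects the catalogue.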

> > (and do it correctly). Realistically, who is going to write them, for
> > all those new users on new systems?
> 
> "All those new users." Why take all the fun out of Un*x?

Yeah, but if we can help them to *get started* without first having to 
customize so much, then maybe we should...

> > > I can set LANG to pretty much whatever I want, and Lynx still works.
> > > If I set my terminal emulator to the wrong kanji encoding then I'm
> > > in real trouble. But if it's worth it to you, go ahead.
> >
> > It appears (from your other messages) that you are setting either
> > LANGUAGE or LC_MESSAGES. Both have precedence over LANG. So no
> > surprise if LANG has no effect on message catalogue selection.
> 
> Actually, I wasn't talking about message catalogue selection here. I
> was talking about the display character set. (That's what I thought
> this thread was about originally. The NLS stuff came out later in the
> thread, I believe.)

Yes, we were originally talking about the DCS, but your statement "I can
set LANG to pretty much whatever I want, and Lynx still works" seemed
more general.

> I can set DCS to either EUC or SJIS, and Lynx will
> [just] work, *IF* my terminal emulator is set to receive/send SJIS.

It really shouldn't, if DCS says EUC but the terminal emulator is
set to receive SJIS...  I'd guess that the "terminal emulator" (taking
this to include everything 'in front of' lynx, including fonts) isn't
set up to receive SJIS then, after all.  (Maybe it can magically detect
what character encoding it gets.)

> The
> only fatal combination is to have the DCS set to SJIS and the terminal
> emulator set to EUC.

And setting DCS automatically based on LANG (*if* you chose to enable
this feature) would prevent that from happening - if you set LANG
correctly, as you say you do, below.

> Total speculation, but maybe this is why there's
> all the barking about PC Un*x: the environment is a console that can
> only do EUC, thus the absolute necessity to have DCS match LANG=EUC.

What do you mean by barking?  I don't recognize the expression.

Of course, people still log in to "PC Un*x" machines with Windows telnet
clients...

> > You regard it as an advantage that lynx doesn't react to $LANG at all.
> > Because you can't mess up lynx, no matter how wrong you set LANG.
> 
> That is not the case at all. Lynx, quite frankly, is the least of my
> worries. Paramount is that my terminal or console match LANG, whatever
> happens after that is no big deal; I can handle it; I can fix it; I can
> make it right if it's wrong. Get LANG out of whack in relation to how you
> log in, and you can't do anything, period.

So you already take great pains to set LANG appropriately...  then it
shouldn't hurt you at all if LANG controls some more things (*if* you
wanted to enable this feature at all), since it will be correct.

> > Let me try an analogy: lynx reacts to the $TERM environment variable.
> > You *can* screw up lynx by giving TERM the wrong value (i.e. wrong for
> 
> Sure. You could also give lynx the "right" $TERM environment variable
> and screw lynx up to no end.

Then either it isn't really right, or lynx is broken...

> I'd rather work on getting the curses
> library, in my case Slang, (from what Hataguti implies, I've been doing
> it wrong)

YM, you still have to refresh the screen, while you shouldn't have
to if slang was really compiled with KANJI support?

> compiled and installed correctly and the correct terminfo/termcap
> description written _first_, then give TERM the right value.
> 
> Anyway, you are right on all accounts. All I'm saying is, "why do we
> need all this automation?". How is anyone going to be able to anything
> on their own if they're spoon fed all the time? We've got a setting in
> lynx.cfg; we've got an option in the O)ptions menu (.lynxrc); we can
> edit userdefs.h and compile in the DCS. IMHO, that's enough.

Well, I'm concerned that lynx is becoming nannyware, too.
Where were your protests when "all this automation" was added for
temp subdirectories or setuid, undocumented and no easy way to turn
it off?

View my suggestion as a defensive move, if you like...  Dependence on
$LANG (etc.) is creeping in anyway, whether you like it or not -
message selection depends on it (unless you override it), case mapping
already may depend on it (although for Japanese, that is irrelevant),
error messages from system calls may already be translated to "your
language" by strerror() and friends (although you'll probably only see
those in trace logs), and see Tom's recent move for a UTF-8 aware
ncurses.  And selecting DCS based on LANG is a requested feature - not
surprising because other programs behave that way.  So I suggested
implementing it in a way that can easily be turned off (has to be
explicitly enabled, in fact).

It doesn't prevent people from editing lynx.cfg or using the 'O'ptions
screen - it doesn't even discourage it.  The effective settings
resulting from $LANG (etc.) would be visible on the 'O'ptions menu
(as far as they can be set there at all), and can be changed there.
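
Just to show the kind of mapping I have in mind (a sketch only, nothing
like this exists in lynx): take the codeset part of the locale string
and look it up in a small table.  The table entries, the struct and the
function name are all hypothetical; a real version would need more
aliases and probably case-insensitive matching.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table: codeset suffix of the locale -> display charset. */
struct codeset_map {
    const char *codeset;
    const char *dcs;
};

static const struct codeset_map example_map[] = {
    { "ujis",  "euc-jp"    },
    { "eucJP", "euc-jp"    },
    { "sjis",  "shift_jis" },
    { "SJIS",  "shift_jis" }
};

/*
 * Given a locale string such as "ja_JP.ujis", return the display
 * charset name from the table, or NULL if the string names no codeset
 * or the codeset is unknown (so the caller keeps its configured value).
 */
const char *dcs_from_locale(const char *locale)
{
    const char *codeset;
    size_t len, i;

    if (locale == NULL)
        return NULL;

    codeset = strchr(locale, '.');
    if (codeset == NULL)
        return NULL;            /* e.g. plain "ja_JP": no codeset given */
    codeset++;

    /* Ignore a trailing "@modifier", if any. */
    len = strcspn(codeset, "@");

    for (i = 0; i < sizeof example_map / sizeof example_map[0]; i++) {
        if (strlen(example_map[i].codeset) == len &&
            strncmp(codeset, example_map[i].codeset, len) == 0)
            return example_map[i].dcs;
    }
    return NULL;
}
```

Returning NULL for anything unrecognized means lynx.cfg, .lynxrc and the
'O'ptions screen keep working exactly as they do now.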

I think this is better than delegating this to some wrapper script
(which, realistically, most users won't write themselves, so they will
depend on something like a Debian lynx-wrapper package, which probably
will be *harder* (at least, less obvious) to circumvent once a user
decides to "[do] anything on their own").

Btw., I expect the feature to be also useful for advanced users.
As a convenience, not that it does anything that isn't already
possible by modifying configuration files, but I (at least) don't
always want to bother.  For example, (with lynx.cfg set appropriately
to enable the feature, and possibly override .lynxrc if necessary)

    LANG=ja_JP.ujis lynx -dump http://www.debian.org > debian.euc.html
    LANG=ja_JP.sjis lynx -dump http://www.debian.org > debian.sjis.html

  - Get two copies of a page, in different character encodings.


    LANG=ja_JP.ujis lynx -dump ~/saved-page.txt \
         -assume_local_charset=shift_jis > translated-page.txt

  - Convert a local file (w/ lynx instead of recode, iconv, or similar)


But so far this is all vapor anyway, since no code has magically
appeared. :)

   Klaus

