
bug#24186: setlocale can't be localised


From: Zefram
Subject: bug#24186: setlocale can't be localised
Date: Mon, 8 Aug 2016 23:30:37 +0100

Andy Wingo wrote:
>Firstly, just to make sure that we are getting things right in 2.2 (and
>if not we need to know), would you mind testing with the latest 2.1.x
>release?

Inspection of the 2.1.3 code shows that, like 2.0, it sets the default
port encoding fluid and the encoding of the three currently-selected
ports, as a side effect of every setlocale call (both read and write).

I'm afraid I'm having difficulty compiling it.  I mostly install software
via the Debian packages, which is how I have used 1.8 and 2.0, so this
is my first time compiling a Guile myself.  It's failing on a missing
library for which Debian supplies no package.  I may sort this out later,
but right now I can't run 2.1.3.

>Also, do you have a concrete program whose behavior you expect to be
>different?

Anything I show you would be quite artificial.  Let's have a go at
getting closer to a real program.

A likely use of a temporary locale change is to format a numeric or
time string through a function that uses the currently-selected locale.
A temporary locale change would be required if the program needs to
format it in someone else's locale, or needs this string to be in a
predictable form for a particular file format regardless of user locale.
For example, strftime is such a function, and a web service might need
to format a time string for the user who made a particular request.
We can have users all over the world, so we identify each user's locale,
while the program as a whole uses either the "C" locale or the locale
of whoever is hosting the service.

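    (define (call-with-locale cat val body)
      (let ((oldval #f))
        (dynamic-wind
          ;; On entry: remember the current locale, then switch to VAL.
          (lambda () (set! oldval (setlocale cat)) (setlocale cat val))
          body
          ;; On exit: restore the remembered locale.
          (lambda () (setlocale cat oldval)))))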

    (define (day-of-week-string)
      (strftime "%A" (localtime (current-time))))

    (define (day-of-week-string-for-locale loc)
      (call-with-locale LC_TIME loc day-of-week-string))

    ;; user-locale is application-specific code defined elsewhere
    (define (day-of-week-string-for-user user)
      (day-of-week-string-for-locale (user-locale user)))

This much of the usage works fine:

scheme@(guile-user)> (day-of-week-string)
$1 = "Monday"
scheme@(guile-user)> (day-of-week-string-for-locale "de_DE")
$2 = "Montag"
scheme@(guile-user)> (day-of-week-string)
$3 = "Monday"

Observe that calling day-of-week-string-for-locale doesn't change the
prevailing locale of the program.  Thus the subsequent day-of-week-string
call uses the same locale that the first one did.  The above works
identically on Guile 1.8 and 2.0.

But things are different when we look at port encoding.  (Obviously now
we're on 2.0-specific code.)  Suppose that we have a currently-selected
input that is encoded in UTF-8.  Suppose further that this choice of
encoding is specific to this part of the application, not reflecting any
locale choice, and the program generally runs in the default "C" locale.
Now we get:

scheme@(guile-user)> (set-port-encoding! (current-input-port) "UTF-8")
scheme@(guile-user)> (day-of-week-string)
$5 = "Monday"
scheme@(guile-user)> (port-encoding (current-input-port))
$6 = "UTF-8"
scheme@(guile-user)> (day-of-week-string-for-locale "de_DE")
$7 = "Montag"
scheme@(guile-user)> (port-encoding (current-input-port))
$8 = "ANSI_X3.4-1968"

The locale-restoring part of call-with-locale, called via
day-of-week-string-for-locale, now has the side effect of setting the
input's encoding to the nominal encoding of the "C" locale, namely ASCII.
If not worked around, input processing breaks.
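
One possible workaround, as a rough sketch only: wrap setlocale so that
it records the encodings it is observed to overwrite and puts them back
after each call.  The names setlocale/keep-encodings and
call-with-locale/keep-encodings are invented for this illustration.
The sketch assumes that %default-port-encoding is the documented fluid,
and treats a port-encoding result of #f as "ISO-8859-1", as the 2.0
manual describes.

    ;; Hypothetical helper: call setlocale (read or write form), then
    ;; restore the default-encoding fluid and the three current ports.
    (define (setlocale/keep-encodings cat . maybe-val)
      (let* ((ports (list (current-input-port)
                          (current-output-port)
                          (current-error-port)))
             (encs (map port-encoding ports))
             (default-enc (fluid-ref %default-port-encoding))
             (result (apply setlocale cat maybe-val)))
        (fluid-set! %default-port-encoding default-enc)
        (for-each (lambda (port enc)
                    ;; #f means ISO-8859-1 but is not accepted as an argument.
                    (set-port-encoding! port (or enc "ISO-8859-1")))
                  ports encs)
        result))

    ;; Same shape as call-with-locale, but using the wrapper throughout.
    (define (call-with-locale/keep-encodings cat val body)
      (let ((oldval #f))
        (dynamic-wind
          (lambda ()
            (set! oldval (setlocale/keep-encodings cat))
            (setlocale/keep-encodings cat val))
          body
          (lambda () (setlocale/keep-encodings cat oldval)))))

This only covers the three ports that are current at the time of each
call, which matches the side effects described above.  Every caller
has to remember to use the wrapper, which is why a fix in setlocale
itself would be preferable.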

Is that sketch close enough to a concrete example?

>I believe that the intention (for better or for worse) is that calling
>`setlocale' with 2 arguments changes the "default port encoding".

(Aside:) *any* two-argument call, even if not relevant to encoding?
The encoding thing is only derived from LC_CTYPE, so even if one is
expecting something like this it's a bit surprising for an LC_TIME call
to affect encoding.

>the next port you open will have the encoding specified by the
>`setlocale', if you don't change it explicitly later.

To achieve the effect you've stated there, there is potentially a better
way.  You have quite sensibly described the effect at a higher user-story
kind of level, rather than say exactly what happens to the fluid.
You've put the fluid there, and documented it, as a perfectly sensible
way for the user to control the default port encoding.  As things stand,
the setlocale side effect is interfering with that control.

Suppose that instead the default port encoding fluid can take a special
value #:locale-at-open, which has the effect that when a port is opened
it will get its encoding set from the current locale.  You then have
the fluid default to that value, and have setlocale not touch the fluid
at all.  This way, if the user doesn't touch the fluid but does call
setlocale then the locale controls the encoding of new ports.  But if
the user does set the fluid (to something other than #:locale-at-open),
indicating a desire to specifically control default port encoding, then
setlocale doesn't clobber the user's choice.  How does this sound to you?
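
To make the intended semantics concrete, here is a toy model of the
lookup, not a patch against Guile.  The fluid default-encoding-setting
and the procedure encoding-for-new-port are invented names, and the
locale's encoding is passed in as an argument so that the model does
not depend on any real locale state.

    ;; Hypothetical model: none of these names exist in Guile.
    (define default-encoding-setting (make-fluid))
    (fluid-set! default-encoding-setting #:locale-at-open)

    ;; Decide the encoding of a port being opened right now.
    ;; LOCALE-ENCODING stands in for the encoding of the current locale.
    (define (encoding-for-new-port locale-encoding)
      (let ((setting (fluid-ref default-encoding-setting)))
        (if (eq? setting #:locale-at-open)
            locale-encoding    ; fluid untouched: the locale decides
            setting)))         ; fluid set by the user: their choice wins

    ;; Fluid untouched, so the locale's encoding is used.
    (encoding-for-new-port "UTF-8")

    ;; The user has set the fluid, so a locale change has no effect.
    (with-fluids ((default-encoding-setting "ISO-8859-1"))
      (encoding-for-new-port "UTF-8"))

In this model setlocale never writes to the fluid, so there is nothing
for a temporary locale change to clobber.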

>                                                       But I don't think
>it should change the encoding of already-open ports, should it?

In a situation where setlocale is expected to deliberately side-effect
the default port encoding fluid, I can't figure out whether to expect it
to do more.  I suppose on general principle it's less surprising for it
to do less.  It's certainly less work to work around it, where the side
effects are unwanted.

If you go with the #:locale-at-open plan that I described above, then
setlocale should definitely not touch the encoding of already-open ports,
so that setlocale remains localisable as originally designed.

There's another way to get the best of both worlds.  In addition to the
#:locale-at-open value for the default port encoding fluid, there could
also be some special encoding value for a port, #:locale-at-io, meaning
to use whatever locale is in effect at the time of an I/O operation.
#:locale-at-io is also a valid value for the fluid, which will be copied
into a new port in the regular way.  The stdin, stdout, and stderr ports
that are automatically opened at program initialisation can be set to
#:locale-at-io, and setlocale now doesn't directly set the encoding of
any port.  If the user calls setlocale without otherwise controlling port
encoding then the locale controls the encoding of the primordial ports.
I expect that's the effect that the setlocale code was aiming for,
given that when setlocale is called it's too late to affect the opening
of the primordial ports.
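
Again as a toy model rather than an implementation, the #:locale-at-io
idea amounts to resolving a port's encoding at the moment of each I/O
operation.  All names here (locale-encoding-now, model-setlocale!,
encoding-at-io and the two example settings) are invented, and the
locale state is reduced to a single variable.

    ;; Hypothetical model: none of these names exist in Guile.
    ;; Stand-in for the encoding implied by the current locale.
    (define locale-encoding-now "ANSI_X3.4-1968")

    ;; Stand-in for the part of setlocale that changes locale state.
    ;; Note that it does not touch any port.
    (define (model-setlocale! encoding)
      (set! locale-encoding-now encoding))

    ;; Resolve a port's encoding setting at the time of an I/O operation.
    (define (encoding-at-io port-setting)
      (if (eq? port-setting #:locale-at-io)
          locale-encoding-now
          port-setting))

    (define primordial-port-setting #:locale-at-io)  ; e.g. stdin
    (define explicit-port-setting "UTF-8")           ; chosen by the user

    (model-setlocale! "ISO-8859-1")
    (encoding-at-io primordial-port-setting)  ; follows the locale
    (encoding-at-io explicit-port-setting)    ; stays "UTF-8"

A port that was given an explicit encoding is unaffected by locale
changes, while the primordial ports track the locale without setlocale
ever writing to them.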

-zefram




