bug-guile

Re: UTF-8 regression in guile 1.9.5


From: Mike Gran
Subject: Re: UTF-8 regression in guile 1.9.5
Date: Fri, 11 Dec 2009 07:05:55 -0800 (PST)

> From: Andy Wingo <address@hidden>
> Hi,
>
> On Sun 06 Dec 2009 21:43, Linas Vepstas writes:
>
> > 2009/12/6 Mike Gran :
> >>
> >>> > need to call (setlocale LC_ALL "")
> >>
> >> But for Guile to store characters as codepoints, declaring a locale
> >> is pretty much a requirement now.
> >
> > Would it make sense to add (setlocale LC_ALL "") to some default,
> > e.g. boot-9.scm  ?
>
> Mike I admit I don't follow this completely. Does Linas' suggestion
> make sense? I somehow thought that locales would magically just
> work.

If we always call setlocale, legacy code that used UTF-8 and other
non-Latin locales will just work.  Legacy code that used strings to
contain binary data would break.

(Of course, UTF-8 strings only worked on Guile 1.8.x so long
as you either never looked at substrings or chars, or did
UTF-8 parsing yourself.)

As it is now, the opposite is true: legacy code with strings
containing binary data will just work; strings containing text in a
non-8-bit locale encoding (such as UTF-8) will break.

| 1.8.x strings     | setlocale |
| contain           | called in | Guile 2.0 will
|                   | 1.8 | 2.0 |
-----------------------------------------------------------------
| ASCII             | Y/N | Y/N | just work
-----------------------------------------------------------------
| locale-encoded    | Y/N | Y   | just work
| strings           |     |     |
-----------------------------------------------------------------
| locale-encoded    | Y/N | N   | interpret string bytes as
| strings           |     |     | Latin-1
-----------------------------------------------------------------
| binary data       | Y/N | Y   | if locale is Latin-1: just work
|                   |     |     |
|                   |     |     | if locale is not Latin-1:
|                   |     |     | interpret string bytes using
|                   |     |     | the locale encoding
-----------------------------------------------------------------
| binary data       | Y/N | N   | just work
-----------------------------------------------------------------
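
To make the "interpret string bytes as Latin-1" rows concrete, here is a
rough sketch of what happens to the same bytes under two encodings.  It
assumes a Guile release that provides the (rnrs bytevectors) and
(ice-9 iconv) modules, which are my choice for illustration and are not
something this thread relies on.

```scheme
;; Assumption: (rnrs bytevectors) and (ice-9 iconv) are available.
(use-modules (rnrs bytevectors)
             (ice-9 iconv))

;; The two bytes that encode U+00E9 (e with acute accent) in UTF-8.
(define utf8-bytes (u8-list->bytevector '(#xC3 #xA9)))

;; Decoded with the matching encoding: a single character.
(define as-utf8 (bytevector->string utf8-bytes "UTF-8"))

;; Decoded as Latin-1: each byte becomes its own character.
(define as-latin1 (bytevector->string utf8-bytes "ISO-8859-1"))

(display (string-length as-utf8))   (newline)  ; prints 1
(display (string-length as-latin1)) (newline)  ; prints 2

;; Latin-1 assigns a code point to every byte value, which is why
;; binary data stored in a string survives a Latin-1 round trip.
(define binary (u8-list->bytevector '(0 127 128 255)))
(define round-trip
  (string->bytevector (bytevector->string binary "ISO-8859-1")
                      "ISO-8859-1"))
(display (bytevector=? binary round-trip))
(newline)
```

The length difference is the breakage in the "locale-encoded, setlocale
not called" row; the round trip is why the last row can say "just work".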

I think I prefer that the coder take responsibility for calling
setlocale, but I only think that because it is how C works.  I'm used
to that convention.
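
For what it's worth, the explicit call at the top of a program could look
something like the sketch below.  The error handling and the warning text
are only my illustration; the one call taken from this thread is
(setlocale LC_ALL "").

```scheme
;; Adopt the locale named by the environment (LANG, LC_ALL, ...).
;; setlocale signals an error when that locale is not installed, so
;; catch it and stay in the default locale instead of aborting.
(catch #t
  (lambda ()
    (setlocale LC_ALL ""))
  (lambda args
    ;; Hypothetical warning message, for illustration only.
    (format (current-error-port)
            "warning: could not set locale from the environment~%")))

;; Called with only the category, setlocale returns the current locale.
(display (setlocale LC_ALL))
(newline)
```

Code that keeps binary data in strings would simply leave this call out.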

Thanks,

Mike




