bug-glibc

Re: ru_RU locale bug?


From: E.Rodichev
Subject: Re: ru_RU locale bug?
Date: Wed, 1 May 2002 02:23:34 +0300 (GMT)

On Tue, 30 Apr 2002, Keld Jørn Simonsen wrote:

> On Tue, Apr 30, 2002 at 02:35:12PM +0300, E.Rodichev wrote:
> > Dear colleagues,
> >
> > I found a strange behaviour of ru_RU locale (revision "1.0",
> > date "2000-06-29").
> >
> > The problem occurs when comparing ASCII strings with punctuation
> > symbols, like ".", ",", etc.
> >
> > For example, with C locale
> > strcoll(".b", "b") < 0
> >
> > but with ru_RU locale
> > strcoll(".b", "b") > 0
> >
> > As a result, sorting files with only English (US-ASCII) names leads to
> > different results. It affects many programs, such as /bin/ls, etc.
> >
> > Is it a bug, or intended behaviour? Typically, it seems more convenient
> > when setting the ru_RU locale influences only the processing of strings
> > with actual Cyrillic symbols, and not US-ASCII strings.
>
> Generally it is intended behaviour that other locales sort ASCII
> differently than the C locale. For example, it is normal that small and
> capital letters sort together. In your specific example I believe
> that the difference is intended too.

That is not the case for Cyrillic locales at all. The sorting rules in Russian,
Ukrainian and many other languages of the Cyrillic family are the same as in
English (and in most European languages).
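
The comparison from the original report can be checked with a small C helper. This is only a sketch: the function name collate_sign is made up for illustration, and the result for ru_RU.KOI8-R depends on the locale data installed on the machine, so only the C-locale result is certain.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Compare a and b with strcoll() under the named collation locale.
 * Returns -1, 0 or 1 (the sign of the strcoll() result), or 2 if the
 * locale could not be selected, e.g. because it is not installed.
 * collate_sign is a hypothetical helper name, not a library function. */
int collate_sign(const char *locale_name, const char *a, const char *b)
{
    char saved[256];
    const char *current;
    int r;

    assert(locale_name != NULL && a != NULL && b != NULL);

    /* Remember the current collation locale so it can be restored. */
    current = setlocale(LC_COLLATE, NULL);
    snprintf(saved, sizeof saved, "%s", current != NULL ? current : "C");

    if (setlocale(LC_COLLATE, locale_name) == NULL)
        return 2;

    r = strcoll(a, b);
    setlocale(LC_COLLATE, saved);

    return (r > 0) - (r < 0);
}
```

Calling collate_sign("C", ".b", "b") gives a negative value. With "ru_RU.KOI8-R" and the locale revision discussed in this thread, the report above says the sign is positive.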

On the other hand, this new locale leads to a *tremendous* number of
incompatibilities with older software. Just one example:

# setenv|grep LC
LC_CTYPE=ru_RU.KOI8-R
# /bin/ls -la
total 16
drwxr-xr-x    3 er       devel        4096 May  1 02:59 .
drwx--x--x    9 er       devel        4096 May  1 02:58 ..
drwxr-xr-x    2 er       devel        4096 Apr 29 21:04 bin
-rw-r--r--    1 er       devel        4062 May  1 02:58 .cshrc

# setenv LC_COLLATE C
# /bin/ls -la
total 16
drwxr-xr-x    3 er       devel        4096 May  1 02:59 .
drwx--x--x    9 er       devel        4096 May  1 02:58 ..
-rw-r--r--    1 er       devel        4062 May  1 02:58 .cshrc
drwxr-xr-x    2 er       devel        4096 Apr 29 21:04 bin

So a lot of existing software that expects the latter behaviour from
/bin/ls is broken by this new locale.
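
The second listing can be reproduced in a few lines of C, assuming that ls orders names with strcoll() (the listings suggest this, but I have not checked the ls source). The names sort_names and by_collation are made up for this sketch.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator: order two C strings by the current LC_COLLATE rules. */
static int by_collation(const void *pa, const void *pb)
{
    const char *a = *(const char *const *)pa;
    const char *b = *(const char *const *)pb;
    return strcoll(a, b);
}

/* Sort the array of names in place under the given collation locale.
 * Returns 0 on success, -1 if the locale is not available.
 * Note: LC_COLLATE stays set to locale_name afterwards.
 * sort_names is a hypothetical helper name. */
int sort_names(const char *locale_name, const char **names, size_t count)
{
    assert(locale_name != NULL && names != NULL);

    if (setlocale(LC_COLLATE, locale_name) == NULL)
        return -1;

    qsort(names, count, sizeof names[0], by_collation);
    return 0;
}
```

With the four names from the listing and the "C" locale, the result is ".", "..", ".cshrc", "bin", matching the output after setenv LC_COLLATE C. Passing "ru_RU.KOI8-R" instead only works where that locale is installed, and the order then depends on its collation table.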

This is a very important problem for the portability of Linux distributions
in Russia and many other countries with Cyrillic-based languages. I am
not sure about the ISO standards, but this approach clearly leads to many
problems, both for compatibility with old software and for new software.

I think it would be much better to keep the traditional behaviour
of the ru_RU locale, which does not affect the sorting of 7-bit ASCII codes.
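
Until then, a program that depends on the traditional order can ask for it explicitly. The following is a minimal sketch of that workaround, not a proposal for glibc itself: it takes every category from the environment and then pins collation to "C". The function names are made up for illustration.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>

/* Take all locale categories from the environment, then force LC_COLLATE
 * to "C" so that strcoll() orders bytes the traditional way.
 * Returns 0 on success, -1 if collation could not be set.
 * init_locale_with_c_collation is a hypothetical helper name. */
int init_locale_with_c_collation(void)
{
    /* This may return NULL if the environment names a locale that is not
     * installed; the program then stays in the "C" locale for everything. */
    (void)setlocale(LC_ALL, "");

    return setlocale(LC_COLLATE, "C") != NULL ? 0 : -1;
}

/* Self-check: under "C" collation, ".b" must sort before "b". */
void check_c_collation(void)
{
    assert(strcoll(".b", "b") < 0);
}
```

This has the same effect inside one program as setenv LC_COLLATE C has in the shell. Note that in the shell, an LC_ALL setting in the environment takes precedence over LC_COLLATE, while the explicit setlocale(LC_COLLATE, "C") call above applies regardless.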

For example, the ru_RU locale in recent FreeBSD distributions does not lead
to any such problems or incompatibilities (I just checked under FreeBSD
4.5-STABLE). Is Linux heading in a different direction?


Best wishes,
E.R.
_________________________________________________________________________
Evgeny Rodichev                          Sternberg Astronomical Institute
System/Net Admin                                  Moscow State University
email: address@hidden
Phone: 007 (095) 939 2383
Fax:   007 (095) 932 8841                       http://www.sai.msu.su/~er
