bug-glibc

Re: ru_RU locale bug?


From: Keld Jørn Simonsen
Subject: Re: ru_RU locale bug?
Date: Wed, 1 May 2002 05:56:29 +0200
User-agent: Mutt/1.3.27i

On Wed, May 01, 2002 at 02:23:34AM +0300, E.Rodichev wrote:
> On Tue, 30 Apr 2002, Keld Jørn Simonsen wrote:
> 
> > On Tue, Apr 30, 2002 at 02:35:12PM +0300, E.Rodichev wrote:
> > > Dear colleagues,
> > >
> > > I found a strange behaviour of ru_RU locale (revision "1.0",
> > > date "2000-06-29").
> > >
> > > The problem occurs when comparing ASCII strings with punctuation
> > > symbols, like ".", ",", etc.
> > >
> > > For example, with C locale
> > > strcoll(".b", "b") < 0
> > >
> > > but with ru_RU locale
> > > strcoll(".b", "b") > 0
> > >
> > > As a result, sorting file names that contain only English (US-ASCII)
> > > characters gives different results. It affects many programs, such as /bin/ls.
> > >
> > > Is it a bug, or intended behaviour? Typically, it seems more convenient
> > > if setting the ru_RU locale influences only the processing of strings that
> > > actually contain Cyrillic symbols, and not US-ASCII strings.
> >
> > Generally it is intended behaviour that other locales sort ASCII
> > differently from the C locale. For example, it is normal that small and
> > capital letters sort together. In your specific example I believe
> > that the difference is intended too.
> 
> Not the case for Cyrillic locales at all. The sorting rules in Russian,
> Ukrainian and many other languages of the Cyrillic family are the same as in
> English (and most European languages).

Understood, and in English and most other languages capital "A" and
lowercase "a" sort together. 

> On the other hand, this new locale leads to a *tremendous* number of
> incompatibilities with older software. Just one example:
> 
> # setenv|grep LC
> LC_CTYPE=ru_RU.KOI8-R
> # /bin/ls -la
> total 16
> drwxr-xr-x    3 er       devel        4096 May  1 02:59 .
> drwx--x--x    9 er       devel        4096 May  1 02:58 ..
> drwxr-xr-x    2 er       devel        4096 Apr 29 21:04 bin
> -rw-r--r--    1 er       devel        4062 May  1 02:58 .cshrc

Yes, this is also what is done in my Danish locale, and it is intended.

> # setenv LC_COLLATE C
> # /bin/ls -la
> total 16
> drwxr-xr-x    3 er       devel        4096 May  1 02:59 .
> drwx--x--x    9 er       devel        4096 May  1 02:58 ..
> -rw-r--r--    1 er       devel        4062 May  1 02:58 .cshrc
> drwxr-xr-x    2 er       devel        4096 Apr 29 21:04 bin
> 
> So, a lot of existing software that expects the latter behaviour from
> /bin/ls is broken with this new locale.
> 
> This is a very important problem for the portability of Linux distributions
> in Russia and many other countries with Cyrillic-based languages. I am
> not sure about ISO standards, but this way clearly leads to many troubles,
> for compatibility with old software as well as with new software.
> 
> I suppose that it would be much better to keep the traditional behaviour
> of the ru_RU locale, which does not affect the sorting of 7-bit ASCII codes.
> 
> For example, the ru_RU locale in new FreeBSD distributions does not lead
> to any troubles or incompatibility (I just checked under FreeBSD
> 4.5-STABLE). Is Linux going in a different direction here?

I don't know if this is a better direction. I am quite involved in the
Danish Linux environment, and the change you describe has been around for
a while in Danish Linux. I have heard no complaints whatsoever from
Danish users about this change. I believe there have been some complaints
from other communities, however. From a general point of view, I do think that
the current approach of sorting "A" and "a" together and ignoring special
characters in the first round of comparisons is the way to go.

Best regards
Keld


