Re: Pattern replacement fails if string contains multibyte characters
From: Chet Ramey
Subject: Re: Pattern replacement fails if string contains multibyte characters
Date: Fri, 28 Sep 2007 17:02:42 -0400
User-agent: Thunderbird 2.0.0.6 (Macintosh/20070728)
Bernd Eggink wrote:
> This happens on a utf-8 based system (CRUX 2.3), LANG=de_DE.UTF-8:
>
> t="123abc456äöüABCD"
> echo ${t//[a-c]/}
> # output: 123456öüCD
> # (should be: "123456äöüABCD")
>
> echo ${t//[!a-c]/}
> # output: abcäAB
> # (should be: "abc")
>
> bash --version:
> GNU bash, version 3.2.25(1)-release (i686-pc-linux-gnu)
>
> Without multibyte chars, replacement works as expected. It looks like a
> bug, or am I missing something?
I get the expected output on Mac OS X and FreeBSD, and the same output you
do on FC6.

The difference is in the GNU libc implementation of strcoll(), which bash
uses to compare characters for range matching. The glibc implementation
ignores the locale; the other systems incorporate the current locale's
collating sequence into their strcoll() implementation.
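
Because the range comparison goes through strcoll(), one possible workaround (not taken from the original exchange) is to perform the expansion under the C locale, where a range such as [a-c] is compared byte by byte. The sketch below assumes bash and a UTF-8 encoded string; strip_chars is a hypothetical helper name. The function body runs in a subshell so the locale change does not leak into the caller.

```shell
#!/bin/bash
# Hypothetical helper: remove every match of a bracket pattern from a string.
# The parentheses make the body a subshell, so LC_ALL=C applies only here.
strip_chars() (
    LC_ALL=C                      # byte-wise comparison for the range match
    printf '%s\n' "${1//$2/}"     # $2 is left unquoted so it acts as a pattern
)

t="123abc456äöüABCD"
strip_chars "$t" '[a-c]'    # prints 123456äöüABCD
strip_chars "$t" '[!a-c]'   # prints abc
```

Note that under the C locale the pattern sees individual bytes rather than characters, so this approach only suits patterns made of ASCII characters. A pattern that names a multibyte character explicitly would need a different approach.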
Chet
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
Live Strong. No day but today.
Chet Ramey, ITS, CWRU chet@case.edu http://cnswww.cns.cwru.edu/~chet/