Re: Uppercase string: broken tr?
From: Alex J. Dam
Date: Sun, 24 Aug 2003 18:42:14 -0300
User-agent: Opera7.11/Linux M2 build 406
On Sun, 24 Aug 2003 14:16:28 -0600, Bob Proulx <address@hidden> wrote:
Bruno Haible wrote:
Alex J. Dam wrote:
> $ echo 'ABÇ' | tr [:upper:] [:lower:]
> abÇ
> (the last character is an uppercase cedilla)
> I expected its output to be:
> abç
What does 'locale' say in this case?
$ locale
LANG=pt_BR.UTF-8
LC_CTYPE="pt_BR.UTF-8"
LC_NUMERIC="pt_BR.UTF-8"
LC_TIME="pt_BR.UTF-8"
LC_COLLATE="pt_BR.UTF-8"
LC_MONETARY="pt_BR.UTF-8"
LC_MESSAGES="pt_BR.UTF-8"
LC_PAPER="pt_BR.UTF-8"
LC_NAME="pt_BR.UTF-8"
LC_ADDRESS="pt_BR.UTF-8"
LC_TELEPHONE="pt_BR.UTF-8"
LC_MEASUREMENT="pt_BR.UTF-8"
LC_IDENTIFICATION="pt_BR.UTF-8"
LC_ALL=pt_BR.UTF-8
$ echo 'ABÇ' | tr [:upper:] [:lower:]
abÇ
But sed and tr and other utilities just use the locale data provided
on the system by glibc among other places. These programs are table
driven by tables that are not part of these programs. This is why
locale problems are global problems across the entire system of
programs such as grep, sed, awk, tr, etc. or anything else that uses
the locale data.
I tried it with different locales, all of them show the same results.
Looking at sed 4.0.7 source code, execute.c:
/* Now do the required modifications. First \[lu]... */
if (type & repl_uppercase_first)
{
*start = toupper(*start);
start++;
type &= ~repl_uppercase_first;
}
I'm not a Linux C programmer.
start was declared as "char". sed uses toupper, not towupper. Does
this have something to do with its behaviour?
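One way to see why the distinction could matter: in UTF-8, Ç (U+00C7) is stored as the two bytes 0xC3 0x87, so code that hands one char at a time to toupper() or tolower() never sees the character as a whole. The following is a minimal sketch of that byte-wise pattern. It is not code taken from sed or tr, and the function name lower_bytewise is made up for illustration.

```c
#include <ctype.h>

/* Hypothetical helper: lowercase a string one byte at a time,
   the way char-based code does. Each byte is looked up on its own,
   so a multibyte character is never treated as a single unit. */
void lower_bytewise(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char)tolower((unsigned char)*s);
}
```

Applied to the UTF-8 bytes of "ABÇ" in the default "C" locale, only A and B change and the two bytes of Ç pass through untouched, which matches the "abÇ" seen above. Whether tr takes this exact path internally is an assumption here.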
I typed a simple program:
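#include <wctype.h>
#include <locale.h>
#include <stdio.h>

int main(void){
    /* Select the locale whose case tables towupper() will consult. */
    setlocale(LC_ALL, "pt_BR.UTF-8");
    unsigned x;
    for(x = 0; x <= 255; x++){
        unsigned y = (unsigned)towupper((wint_t)x);
        /* Mark the code points that towupper() changed. */
        if(x != y)
            printf("%u -> %u *\n", x, y);
        else
            printf("%u -> %u\n", x, y);
    }
    return 0;
}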
In its output, the line
199 -> 231 *
appears.
Ok, as I said above, I am NOT a Linux programmer and this could be
nonsense.
Alex