grep does not process non-ASCII characters correctly
From: Bruno Haible
Subject: grep does not process non-ASCII characters correctly
Date: Tue, 8 May 2001 15:43:08 +0200 (CEST)
Hi,
grep-2.5a has severe problems with multibyte character encodings.
According to SUSv2, the LANG/LC_CTYPE/LC_ALL environment variables should
determine what grep treats as a character. But in grep-2.5a they don't.
A test script is appended below, to be executed in a UTF-8 locale (e.g.
the glibc-2.2.2 ko_KR.UTF-8 locale). The regexp engine in glibc-2.2.2 now
has full i18n support. The remaining problems in grep appear to be located
in dfa.h and dfa.c.
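
To illustrate the kind of failure meant here, a minimal sketch in Python
(not grep's own code) that contrasts character-oriented and byte-oriented
matching on UTF-8 text. The sample strings and the pattern are assumptions
chosen for illustration, and Python's re module only stands in for a regexp
engine; it says nothing about how dfa.c works internally.

```python
import re

# Illustrative input (an assumption): four characters, seven bytes in UTF-8.
text = "aööü"
raw = text.encode("utf-8")

char_count = len(text)   # counts characters
byte_count = len(raw)    # counts bytes

# Character-aware matching: the interval {2} applies to the whole character ö.
char_hit = re.search("ö{2}", text) is not None

# Byte-oriented matching: the same pattern encoded to bytes is C3 B6 {2},
# so the interval binds only to the trailing byte B6 and the match fails.
byte_hit = re.search("ö{2}".encode("utf-8"), raw) is not None

# Classification has the same problem: each character of "äöü" is
# alphabetic, but no single byte of its UTF-8 encoding is an ASCII letter.
alpha_chars = all(c.isalpha() for c in "äöü")
alpha_bytes = any(bytes([b]).isalpha() for b in "äöü".encode("utf-8"))

print(char_count, byte_count, char_hit, byte_hit, alpha_chars, alpha_bytes)
```

A matcher that honours the locale should behave like the character-aware
lines; one that works on single bytes behaves like the byte-oriented lines.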
Bruno
begin 644 grep-sample-run-good
M)"!E8VAO("?#I,.VP[PG('address@hidden)E<"`G6ULZ86QP:&$Z75TG"L.DP[;#O`HD
I(&5C:&\@)V'#ML.VP[PG('address@hidden)E<"`GP[9<>S)<?2<*8<.VP[;#O`H`
`
end
begin 644 grep-sample-run-bad
M)"!E8VAO("?#I,.VP[PG('address@hidden)E<"`G6ULZ86QP:&$Z75TG"address@hidden;R`G
:8<.VP[;#O"<@?"!G<F5P("?#MEQ[,EQ])PH`
`
end