sed cannot process non-ASCII characters correctly
From: Bruno Haible
Subject: sed cannot process non-ASCII characters correctly
Date: Tue, 8 May 2001 15:42:36 +0200 (CEST)
Hi,
sed 3.02 has severe problems with multibyte character encodings.
According to SUSv2, the LANG/LC_CTYPE/LC_ALL environment variables should
determine what sed treats as a character, but sed-3.02 ignores them.
A test script is appended below, to be executed in a UTF-8 locale (e.g.
the glibc-2.2.2 ko_KR.UTF-8 locale). The regexp engine in glibc-2.2.2 now
has full i18n support. The remaining problems in sed are:
1) sed doesn't call setlocale, and thus ignores the user's
LANG/LC_CTYPE/LC_ALL environment variables.
Here is a fix for it.
2001-05-05 Bruno Haible <address@hidden>
* configure.in: Test for setlocale.
* sed/sed.c: Include locale.h.
(main): Call setlocale.
*** sed-3.02/configure.in.bak Sun Aug 2 02:38:33 1998
--- sed-3.02/configure.in Sun May 6 01:28:48 2001
***************
*** 99,105 ****
AC_FUNC_VPRINTF
AC_REPLACE_FUNCS(memchr memcmp memmove strerror)
! AC_CHECK_FUNCS(isatty bcopy bzero isascii memcpy)
AC_ARG_PROGRAM
AC_OUTPUT(Makefile djgpp/Makefile doc/Makefile dnl
--- 99,105 ----
AC_FUNC_VPRINTF
AC_REPLACE_FUNCS(memchr memcmp memmove strerror)
! AC_CHECK_FUNCS(isatty bcopy bzero isascii memcpy setlocale)
AC_ARG_PROGRAM
AC_OUTPUT(Makefile djgpp/Makefile doc/Makefile dnl
*** sed-3.02/sed/sed.c.bak Fri Jul 3 03:06:26 1998
--- sed-3.02/sed/sed.c Sun May 6 01:27:48 2001
***************
*** 33,38 ****
--- 33,40 ----
# include <stdlib.h>
#endif
+ #include <locale.h>
+
#ifdef HAVE_MMAP
# ifdef HAVE_UNISTD_H
# include <unistd.h>
***************
*** 129,134 ****
--- 131,141 ----
flagT bad_input; /* If this variable is non-zero at exit, one or
more of the input files couldn't be opened. */
+ #ifdef HAVE_SETLOCALE
+ /* Set locale via LC_ALL. */
+ setlocale (LC_ALL, "");
+ #endif
+
POSIXLY_CORRECT = (getenv("POSIXLY_CORRECT") != NULL);
#ifdef STUB_FROM_RX_LIBRARY_USAGE
if (!rx_default_cache)
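To illustrate why the setlocale call matters, here is a minimal sketch. It is
not part of the patch, and the helper names use_environment_locale and
count_chars are hypothetical. It counts characters rather than bytes with
mbrlen, whose notion of a character follows the LC_CTYPE category that
setlocale selects.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: adopt the locale named by LC_ALL/LC_CTYPE/LANG.
   Returns nonzero on success, zero if that locale is not available. */
int use_environment_locale(void)
{
    return setlocale(LC_ALL, "") != NULL;
}

/* Hypothetical helper: count the characters (not bytes) in s according to
   the current LC_CTYPE locale.  Returns (size_t)-1 if s contains an
   invalid or incomplete multibyte sequence. */
size_t count_chars(const char *s)
{
    mbstate_t state;
    size_t remaining;
    size_t count = 0;

    assert(s != NULL);
    memset(&state, 0, sizeof state);   /* start in the initial shift state */
    remaining = strlen(s);

    while (remaining > 0) {
        size_t n = mbrlen(s, remaining, &state);
        if (n == (size_t)-1 || n == (size_t)-2)
            return (size_t)-1;         /* invalid or truncated sequence */
        if (n == 0)
            n = 1;                     /* defensive: embedded NUL */
        s += n;
        remaining -= n;
        count++;
    }
    return count;
}
```

In a UTF-8 locale, once use_environment_locale() has succeeded, count_chars
should report one character for the two-byte sequence "\xc3\xa4" (ä). Without
the setlocale call the program stays in the "C" locale, and how bytes outside
ASCII are treated depends on the C library.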
2) The autoconfiguration fails to recognize the regex in glibc and uses its
own. The user has to configure "--with-regex=" so that lib/regex.o is not
built. This should be fixed to use glibc's regex by default if the system
is glibc 2.2.2 or newer.
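As a rough sketch of the kind of test a configure check could compile for
this, the following uses the __GLIBC__ and __GLIBC_MINOR__ macros that glibc
headers define. The function names are hypothetical, and this is an
assumption about one possible approach rather than the actual fix. Note that
these macros carry only the major and minor version, so the sketch can
distinguish 2.2 from 2.1 but cannot tell 2.2.2 from 2.2.1.

```c
#include <assert.h>
#include <stdlib.h>   /* a glibc header; defines __GLIBC__ and __GLIBC_MINOR__ */

/* Hypothetical helper: nonzero if version have_major.have_minor is at least
   want_major.want_minor. */
int version_at_least(int have_major, int have_minor,
                     int want_major, int want_minor)
{
    assert(want_major >= 0 && want_minor >= 0);
    if (have_major != want_major)
        return have_major > want_major;
    return have_minor >= want_minor;
}

/* Hypothetical helper: nonzero when compiled against glibc 2.2 or newer,
   zero on older glibc or on a different C library. */
int built_with_glibc_2_2(void)
{
#if defined(__GLIBC__) && defined(__GLIBC_MINOR__)
    return version_at_least(__GLIBC__, __GLIBC_MINOR__, 2, 2);
#else
    return 0;
#endif
}
```

A configure script could use the result of such a test to decide whether
lib/regex.o needs to be built at all.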
Bruno
begin 644 sed-sample-run-good
M)"!E8VAO(,address@hidden"address@hidden@)W,O7"@N7"E<,2]<,2\G"L.D"address@hidden
C;R##I,address@hidden"address@hidden@)W,O6\.D72\O9R<*P[;#O`H`
`
end
begin 644 sed-sample-run-bad
M)"!E8VAO(,address@hidden"address@hidden@)W,O7"@N7"E<,2]<,2\G"L.DPZ0*)"!E
F8VAO(,.DP[;#O"!\('-E9"`M92`G<R];PZ1=+R]G)PK#O>^_O0H`
`
end