Re: libc6: regex (re_exec) segfault in UTF-8 locale [Re: grep 2.5.1 segf
From: Eric Agnew
Subject: Re: libc6: regex (re_exec) segfault in UTF-8 locale [Re: grep 2.5.1 segfault, and (more) color patch (again)
Date: Tue, 8 Apr 2003 18:50:13 -0700
User-agent: Mutt/1.4i-ja.1
>> First, a bug report: I'm getting a segfault on grep 2.5.1 when
>> grepping the edict file (
>> http://ftp.cc.monash.edu.au/pub/nihongo/edict.gz ):
>> egrep '^(.)(.)(.)\1\2\3 ' edict
>> or:
>> grep '^\(.\)\(.\)\(.\)\1\2\3 ' edict
>> both output 13 lines and the seg fault. [...]For reference, I'm
>> running Linux (debian/unstable) on x86.
> Thanks for the report. Note that to reproduce the failure you
> probably have to be using a UTF-8 locale. The system I used happened
> to have fr_FR.UTF-8 installed, so I used that, even though the data in
> that file is in Japanese.
Hm. Should've noted the locale settings, too, I suppose. I ran it in a
term set to ja_JP.eucJP, since the file is encoded in EUC, and a number
of Japanese applications (namely xjdic) don't work in UTF-8...
Note, however, that if you do:
iconv -f euc-jp -t utf-8 edict|egrep '^(.)(.)(.)\1\2\3 '
it seems to work just fine. Don't know if that affects any of your
findings.
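For anyone less familiar with backreferences, here is a minimal sketch of what the pattern is meant to match: any three characters, the same three again, then a space. The two sample lines are made up for illustration (they are not from edict), and LC_ALL=C is an assumption chosen so that "." is matched byte by byte and the UTF-8 code path under discussion is not involved.

```shell
# Hypothetical two-line sample; not taken from edict.
sample='abcabc x
abcdef y'

# LC_ALL=C is an assumption: it keeps "." byte-oriented for this demo.
# Only the first line repeats its first three characters before a space.
matches=$(printf '%s\n' "$sample" | LC_ALL=C grep -E '^(.)(.)(.)\1\2\3 ')
printf '%s\n' "$matches"
```

Only the "abcabc x" line should be printed. What "." consumes on real data depends on the locale, which is why the same file can behave differently under ja_JP.eucJP and a UTF-8 locale.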
> On a system with x86 Linux debian/unstable (grep-2.5.1-4
> and libc6-2.3.1-16), I pared it down to this:
> $ printf pMik3KTIpNwK | recode /64 \
> | LC_ALL=fr_FR.UTF-8 /bin/grep -nE '^(.)(.)(.)\1\2\3 '
> Segmentation fault
> [Exit 139 (SIGSEGV)]
> This also does it:
> $ grep totteringly edict|LC_ALL=fr_FR.UTF-8 /bin/grep -nE '^(.)(.)(.)\1\2\3 '
> Segmentation fault
> [Exit 139 (SIGSEGV)]
> It looks like a problem in libc's re_exec function:
> $ LC_ALL=fr_FR.UTF-8 gdb /bin/grep
> (gdb) r -E '^(.)(.)(.)\1\2\3 ' k
> Starting program: /bin/grep -E '^(.)(.)(.)\1\2\3 ' k
> (no debugging symbols found)...(no debugging symbols found)...
> Program received signal SIGSEGV, Segmentation fault.
> 0x400c9ad5 in re_exec () from /lib/libc.so.6
> (gdb)
> But note that if you rebuild grep by running `configure
> --with-included-regex' the resulting binary doesn't segfault. It
> doesn't find any matches, either. The same thing happens if I link
> grep with the very latest regex code from glibc's CVS repository.
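The base64 reproducer quoted above can also be inspected without recode. The sketch below assumes a base64 utility that accepts -d (as in GNU coreutils), used here as a stand-in for `recode /64`, and dumps the decoded bytes in hex with od.

```shell
# Decode the quoted reproducer string and show its bytes as hex.
# "base64 -d" is an assumed substitute for "recode /64".
hex=$(printf pMik3KTIpNwK | base64 -d | od -An -tx1 | tr -d ' \n')
printf '%s\n' "$hex"
```

The decoded input is nine bytes: a4 c8 a4 dc a4 c8 a4 dc 0a. Read as EUC-JP, that is the two-byte pairs a4c8 and a4dc, twice, followed by a newline. There is no space byte in it, so for this particular input "no match" looks like the expected result from a regex implementation that does not crash.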
--
Eric Agnew agnew at geekhive dot net