bug-gnu-utils

Re: libc6: regex (re_exec) segfault in UTF-8 locale [Re: grep 2.5.1 segf


From: Eric Agnew
Subject: Re: libc6: regex (re_exec) segfault in UTF-8 locale [Re: grep 2.5.1 segfault, and (more) color patch (again)
Date: Tue, 8 Apr 2003 18:50:13 -0700
User-agent: Mutt/1.4i-ja.1

>> First, a bug report: I'm getting a segfault on grep 2.5.1 when
>> grepping the edict file (
>> http://ftp.cc.monash.edu.au/pub/nihongo/edict.gz ):
>>         egrep '^(.)(.)(.)\1\2\3 ' edict
>>   or:
>>         grep '^\(.\)\(.\)\(.\)\1\2\3 ' edict
>> both output 13 lines and the seg fault.  [...]For reference, I'm
>> running Linux (debian/unstable) on x86.
> Thanks for the report.  Note that to reproduce the failure you
> probably have to be using a UTF-8 locale.  The system I used happened
> to have fr_FR.UTF-8 installed, so I used that, even though the data in
> that file is in Japanese.

Hm.  Should've noted the locale settings, too, I suppose.  I ran it in a
term set to ja_JP.eucJP, since the file is encoded in EUC, and a number
of Japanese applications (namely xjdic) don't work in UTF-8...

Note, however, that if you do:

        iconv -f euc-jp -t utf-8 edict|egrep '^(.)(.)(.)\1\2\3 '

it seems to work just fine.  Don't know if that affects any of your
findings.
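
In case it helps narrow things down, here is a rough sketch for telling
the two inputs apart: it asks iconv whether the bytes on stdin are valid
UTF-8.  The helper name is_utf8 is made up for illustration, and the
sample bytes are just the EUC-JP encoding of one hiragana character
(0xA4 0xA2), not anything taken from edict.  I don't know that invalid
sequences are what trips the regex code; this only shows how to check
what grep is actually being fed.

```shell
# Hypothetical helper: succeeds if stdin is valid UTF-8, fails otherwise.
# Converting UTF-8 to UTF-8 makes iconv reject any malformed sequence.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Plain ASCII is valid UTF-8.
if printf 'abc\n' | is_utf8; then echo "ascii: valid UTF-8"; fi

# Raw EUC-JP bytes (octal 244 242 = 0xA4 0xA2) are not valid UTF-8.
if ! printf '\244\242\n' | is_utf8; then echo "euc-jp: not valid UTF-8"; fi
```

Running `is_utf8 < edict` before and after the iconv step above should
show the difference between the raw file and the converted stream.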

> On a system with x86 Linux debian/unstable (grep-2.5.1-4
> and libc6-2.3.1-16), I pared it down to this:
>   $ printf pMik3KTIpNwK | recode /64 \
>     | LC_ALL=fr_FR.UTF-8 /bin/grep -nE '^(.)(.)(.)\1\2\3 '
>   Segmentation fault
>   [Exit 139 (SIGSEGV)]
> This also does it:
>   $ grep totteringly edict|LC_ALL=fr_FR.UTF-8 /bin/grep -nE '^(.)(.)(.)\1\2\3 '
>   Segmentation fault
>   [Exit 139 (SIGSEGV)]
> It looks like a problem in libc's re_exec function:
>   $ LC_ALL=fr_FR.UTF-8 gdb /bin/grep
>   (gdb) r -E '^(.)(.)(.)\1\2\3 ' k
>   Starting program: /bin/grep -E '^(.)(.)(.)\1\2\3 ' k
>   (no debugging symbols found)...(no debugging symbols found)...
>   Program received signal SIGSEGV, Segmentation fault.
>   0x400c9ad5 in re_exec () from /lib/libc.so.6
>   (gdb)
> But note that if you rebuild grep by running `configure
> --with-included-regex' the resulting binary doesn't segfault.  It
> doesn't find any matches, either.  The same thing happens if I link
> grep with the very latest regex code from glibc's CVS repository.
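
Since "no segfault" and "no matches" are easy to confuse when scripting
a test case, here is a small sketch that separates them by exit status.
grep exits 0 on a match and 1 on no match, and the shell reports a
process killed by SIGSEGV as 139 (128 + 11), as in the output quoted
above.  The function name classify_grep and the sample input lines are
made up for illustration.

```shell
# Hypothetical helper: run grep -E on stdin with the given pattern and
# report how it ended, based only on the exit status.
classify_grep() {
    grep -E "$1" >/dev/null 2>&1
    status=$?
    case $status in
        0)   echo "match" ;;
        1)   echo "no-match" ;;
        139) echo "segfault" ;;          # 128 + SIGSEGV (11)
        *)   echo "error ($status)" ;;
    esac
}

# The pattern wants three characters, the same three again, then a space.
printf 'abcabc x\n' | classify_grep '^(.)(.)(.)\1\2\3 '
printf 'abcdef y\n' | classify_grep '^(.)(.)(.)\1\2\3 '
```

The same function can be pointed at a different binary or locale by
editing the grep line, for instance to compare /bin/grep against a
build made with --with-included-regex.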

-- 
Eric Agnew                                       agnew at geekhive dot net



