Re: [PATCH 1/2] dfa: convert to wide character line-by-line
From: Jim Meyering
Subject: Re: [PATCH 1/2] dfa: convert to wide character line-by-line
Date: Wed, 05 May 2010 09:59:59 +0200

Jim Meyering wrote:
> Paolo Bonzini wrote:
>> This provides a nice speedup for -m in general, but especially
>> it avoids quadratic complexity in case we have to go to glibc.
>>
>> * NEWS: Document change.
>> * src/dfa.c (prepare_wc_buf): Extract out of dfaexec. Convert
>> only up to the next newline.
>> (dfaexec): Exit multibyte processing loop if past buf_end.
>> Call prepare_wc_buf again after processing a newline.
>
> Nice indeed, but it induces an abort from glibc with some inputs:
> This is on F13:
>
> $ export LC_ALL=fr_FR.UTF-8
> $ printf '\xc3\n' > in; src/grep '[é]' in
> *** buffer overflow detected ***: src/grep terminated
> ======= Backtrace: =======
> /lib64/libc.so.6(__fortify_fail+0x37)[0x33920fedb7]
> /lib64/libc.so.6[0x33920fcd30]
> /lib64/libc.so.6(__strncpy_chk+0x17b)[0x33920fbfeb]
> ...
> zsh: abort (core dumped) src/grep '[é]' in
>
>> +/* Initialize mblen_buf and inputwcs with data from the next line. */
>> +
>> +static void
>> +prepare_wc_buf (const char *begin, const char *end)
>> +{
>> + unsigned char eol = eolbyte;
>> + size_t remain_bytes, i;
>
> Here's the quick summary:
>
> "remain_bytes" must not be of type "size_t".
> Please leave it as "int", like it was in the code you moved.
Perhaps better:
Leave remain_bytes declared as size_t, since that's what mbrtowc
returns, after all.

But then you'll have to adjust the test to explicitly
check for (size_t) -1 and (size_t) -2.

       if (remain_bytes < 1
+          || remain_bytes == (size_t) -1
+          || remain_bytes == (size_t) -2
           || (remain_bytes == 1 && inputwcs[i] == (wchar_t)begin[i]))
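
To make the point concrete, here is a minimal standalone sketch, not
code from dfa.c. The helper names old_test and new_test are
hypothetical, and the third clause of the real condition (the
inputwcs comparison) is left out. It shows only that the error values
mbrtowc returns, (size_t) -1 for an invalid sequence and (size_t) -2
for an incomplete one, wrap to huge unsigned numbers, so "< 1" alone
no longer catches them once the variable is a size_t.

```c
#include <stddef.h>

/* Hypothetical helper: the original test, applied to a size_t.
   For an unsigned type, "n < 1" is true only when n == 0, so the
   mbrtowc error values (size_t) -1 and (size_t) -2 slip through.  */
static int
old_test (size_t n)
{
  return n < 1;
}

/* Hypothetical helper: the adjusted test, which names the two
   mbrtowc error values explicitly.  */
static int
new_test (size_t n)
{
  return (n < 1
          || n == (size_t) -1
          || n == (size_t) -2);
}
```

With these definitions, old_test ((size_t) -1) yields 0 while
new_test ((size_t) -1) yields 1. Presumably that is how the huge
value ended up being treated as a valid byte count in the failing
case above.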