[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate
From: |
Dimitry Andric |
Subject: |
bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate |
Date: |
Sun, 2 Apr 2023 11:33:42 +0200 |
Ah sorry, I did indeed rebuild grep with --with-included-regex, and for
debugging purposes added CFLAGS="-O0 -g".
In any case, the problematic code is both in glibc and grep, as I
believe these are originating from the same source.
-Dimitry
> On 2 Apr 2023, at 08:52, arnold@skeeve.com wrote:
>
> Hi.
>
> Dimitry Andric <dimitry@andric.com> wrote:
>
>> Yes, it still reproduces when I configure the latest grep using
>> --without-included-regex:
>
> I assume you meant --with-included-regex?
>
> If you really used --without-included-regex, that doesn't prove anything...
> :-)
>
> It's interesting, as gawk uses the same regex, but with different flags.
> It might be worth trying to understand which of the syntax bits
> is causing this.
>
> Thanks,
>
> Arnold
>
>>
>> 1400 for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
>> 1: idx = 0
>> (gdb)
>> 1402 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx,
>> nmatch);
>> 1: idx = 0
>> (gdb)
>> 1404 if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
>> 1: idx = 0
>> (gdb)
>> 1405 || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
>> 1: idx = 0
>> (gdb)
>> 1428 cur_node = proceed_next_node (mctx, nmatch, pmatch,
>> prev_idx_match,
>> 1: idx = 0
>> (gdb)
>> 1432 if (__glibc_unlikely (cur_node < 0))
>> 1: idx = 0
>> (gdb)
>> 1400 for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
>> 1: idx = 0
>> (gdb)
>> 1402 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx,
>> nmatch);
>> 1: idx = 0
>> (gdb)
>> 1404 if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
>> 1: idx = 0
>> (gdb)
>> 1405 || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
>> 1: idx = 0
>>
>> The endless loop looks the same. grep's regexec.c is mostly the same as
>> glibc's, except for the latter having a bunch of #if RE_ENABLE_I18N
>> directives added.
>>
>> -Dimitry
>>
>>> On 28 Mar 2023, at 08:46, arnold@skeeve.com wrote:
>>>
>>> This does not reproduce with gawk, even when I force use of the regex
>>> matcher.
>>>
>>> What happens if grep is built with the included regex files instead of
>>> relying on the ones in the local glibc?
>>>
>>> Arnold
>>>
>>> Dimitry Andric <dimitry@andric.com> wrote:
>>>
>>>> On 27 Mar 2023, at 11:14, Koen Claessen <koen@chalmers.se> wrote:
>>>>>
>>>>> Running the command:
>>>>>
>>>>> echo a | grep -E -w '((()|a)|())*'
>>>>>
>>>>> does not terminate, and uses a LOT of processor time, for all versions of
>>>>> grep I have tried.
>>>>>
>>>>> This is the smallest case that could be found; simplifying anything in the
>>>>> input and/or expression leads to correct behavior.
>>>>
>>>> Reproducible with GNU grep 3.7 on Ubuntu 22:
>>>>
>>>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+
>>>> COMMAND
>>>> 93938 dim 20 0 9.0m 2.1m 2.0m R 100.0 0.0 0:08.32 grep
>>>> -E -w ((()|a)|())*
>>>>
>>>> It seems that at least on Ubuntu, grep in this mode uses glibc's
>>>> regexec(3), and it is that implementation which ends up in an endless loop.
>>>>
>>>> It loops between lines 1415, 1417 and 1443, but idx and cur_node never
>>>> change from 0:
>>>>
>>>> 1378 static reg_errcode_t
>>>> 1379 __attribute_warn_unused_result__
>>>> 1380 set_regs (const regex_t *preg, const re_match_context_t *mctx,
>>>> size_t nmatch,
>>>> 1381 regmatch_t *pmatch, bool fl_backtrack)
>>>> 1382 {
>>>> ...
>>>> 1415 for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
>>>> 1416 {
>>>> 1417 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx,
>>>> nmatch);
>>>> 1418
>>>> 1419 if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
>>>> 1420 || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
>>>> ...
>>>> 1442 /* Proceed to next node. */
>>>> 1443 cur_node = proceed_next_node (mctx, nmatch, pmatch,
>>>> prev_idx_match,
>>>> 1444 &idx, cur_node,
>>>> 1445 &eps_via_nodes, fs);
>>>> 1446
>>>> 1447 if (__glibc_unlikely (cur_node < 0))
>>>> ...
>>>> 1465 }
>>>> 1466 }
>>>>
>>>> -Dimitry
>>>>
>>>> P.S.: Interestingly this does not reproduce with BSD grep, which returns
>>>> immediately with "a".
>>>>
>>
signature.asc
Description: Message signed with OpenPGP