[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate
From: |
arnold |
Subject: |
bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate |
Date: |
Tue, 28 Mar 2023 00:46:53 -0600 |
User-agent: |
Heirloom mailx 12.5 7/5/10 |
This does not reproduce with gawk, even when I force use of the regex
matcher.
What happens if grep is built with the included regex files instead of
relying on the ones in the local glibc?
Arnold
Dimitry Andric <dimitry@andric.com> wrote:
> On 27 Mar 2023, at 11:14, Koen Claessen <koen@chalmers.se> wrote:
> >
> > Running the command:
> >
> > echo a | grep -E -w '((()|a)|())*'
> >
> > does not terminate, and uses a LOT of processor time, for all versions of
> > grep I have tried.
> >
> > This is the smallest case that could be found; simplifying anything in the
> > input and/or expression leads to correct behavior.
>
> Reproducible with GNU grep 3.7 on Ubuntu 22:
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 93938 dim 20 0 9.0m 2.1m 2.0m R 100.0 0.0 0:08.32 grep
> -E -w ((()|a)|())*
>
> It seems that at least on Ubuntu, grep in this mode uses glibc's regexec(3),
> and it is that implementation which ends up in an endless loop.
>
> It loops between lines 1415, 1417 and 1443, but idx and cur_node never change
> from 0:
>
> 1378 static reg_errcode_t
> 1379 __attribute_warn_unused_result__
> 1380 set_regs (const regex_t *preg, const re_match_context_t *mctx, size_t
> nmatch,
> 1381 regmatch_t *pmatch, bool fl_backtrack)
> 1382 {
> ...
> 1415 for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
> 1416 {
> 1417 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx,
> nmatch);
> 1418
> 1419 if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
> 1420 || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
> ...
> 1442 /* Proceed to next node. */
> 1443 cur_node = proceed_next_node (mctx, nmatch, pmatch,
> prev_idx_match,
> 1444 &idx, cur_node,
> 1445 &eps_via_nodes, fs);
> 1446
> 1447 if (__glibc_unlikely (cur_node < 0))
> ...
> 1465 }
> 1466 }
>
> -Dimitry
>
> P.S.: Interestingly this does not reproduce with BSD grep, which returns
> immediately with "a".
>