bug-bash

Re: Parameter expansion with extended pattern make system hang


From: Greg Wooledge
Subject: Re: Parameter expansion with extended pattern make system hang
Date: Tue, 6 Sep 2022 13:06:49 -0400

On Tue, Sep 06, 2022 at 11:55:52AM -0400, Chet Ramey wrote:
> On 9/4/22 10:48 PM, Hyunho Cho wrote:
> 
> > Bash Version: 5.1
> > Patch Level: 16
> > Release Status: release
> > 
> > ##############################################################
> > 
> > 
> > #### "gcc --help" is already a short string, but the system hangs.
> 
> A short string? It's 90K on my system.

unicorn:~$ gcc --help | wc
     61     430    3995

Quite curious.  I wonder what makes it so large on yours.  (Although I
would not call 4k characters a "short string" either.)

That said, the reported symptoms (bash using tremendous amounts of CPU)
also occurred with the 4k input string on my system.

> > ( without using extended pattern, there is no such problem )
> > 
> > bash$ help=$( gcc --help )
> > 
> > bash$ echo "${help//+([$' \t\n'])/ }"
> 
> So what you're doing is taking a 90K string, and for each character in the
> string, trying to match it against successively shorter substrings,
> starting at the end to preserve the required `leftmost longest' match
> semantics. It's worse because you can't calculate the pattern length here,
> so you can't bound the search at all. Nor are you building a regular
> expression and trying to execute it against the strings; this is just a
> simple pattern matcher.
> 
> It's hard to think of a less efficient way of doing whatever it is you're
> trying to do.

Conceptually, +([$' \t\n']) is just a regular expression.  I guess it's
surprising that a regular expression written this way (as an extglob)
performs so differently from a regular expression written in ERE or BRE
syntax.

It seems pretty likely that the goal here is to divide an input string
into words, where "one or more whitespace characters" can separate the
words.  I'm assuming that the use of "gcc --help" as input is just a
non-representative example.  (Why do we always get such bad examples?)

Running with this assumption, now that we know bash's extglob matching
will not perform suitably for this task, we can look at other approaches.
Here's a pretty typical one, relying on GNU sed's handling of the \t and
\n escapes (therefore not portable):

unicorn:~$ printf 'a b c\nd\t\t\te\nf\n' | sed 's/[ \t]\{1,\}/\n/g'
a
b
c
d
e
f

Once you've reformatted it to "one word per line" in this way, you can
load the words into a bash indexed array, or whatever else you need.
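
For the array-loading step, a minimal sketch, assuming bash 4.0 or later
(for mapfile) and GNU sed. The input and the sed command are the ones
from the example above; the array name "words" is only an illustrative
choice:

```shell
#!/usr/bin/env bash
# Reformat to one word per line with the same GNU sed command as above,
# then load each line into an element of an indexed array.
# "words" is an illustrative name, not something from the original report.
mapfile -t words < <(printf 'a b c\nd\t\t\te\nf\n' | sed 's/[ \t]\{1,\}/\n/g')

# Show how many words were read, and one of them by index.
printf 'count=%s fourth=%s\n' "${#words[@]}" "${words[3]}"
```

The -t option strips the trailing newline from each line, and the
process substitution keeps mapfile in the current shell, so the array
is still there after the command finishes.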

If this isn't what you want, but if you can express the underlying
original goal more clearly, I'm sure someone can offer an answer.
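
If the goal turns out to be exactly what the original expansion does
(collapse every run of spaces, tabs and newlines to a single space),
bash's own word splitting can do it without any pattern matching. This
is only a sketch under that assumption; the sample string stands in for
the real input, and the variable names are made up for illustration:

```shell
#!/usr/bin/env bash
# Sample input standing in for the real text (an assumption, not from the report).
help=$'usage:  gcc\t[options]\n\n  file...'

# With IFS set to space, tab and newline, read -a splits on runs of those
# characters. -d '' makes read consume the whole string, newlines included.
# read returns nonzero when it hits end of input, hence the "|| true".
IFS=$' \t\n' read -r -d '' -a words <<< "$help" || true

# "${words[*]}" joins the elements with the first character of IFS,
# so set IFS to a space inside the command substitution for the join.
collapsed=$(IFS=' '; printf '%s' "${words[*]}")
printf '%s\n' "$collapsed"
```

One difference from the extglob version: leading and trailing
whitespace is dropped entirely instead of being turned into a single
space.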


