bug-grep
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#44754: Extreme performance degradation in GNU grep 3.4 / 3.6


From: Frank Heckenbach
Subject: bug#44754: Extreme performance degradation in GNU grep 3.4 / 3.6
Date: Sun, 06 Dec 2020 06:34:11 +0100

Jim Meyering <jim@meyering.net> wrote:

> On Thu, Dec 3, 2020 at 12:26 AM Norihiro Tanaka <noritnk@kcn.ne.jp> wrote:
> > On Thu, 26 Nov 2020 21:41:20 -1000
> > Jim Meyering <jim@meyering.net> wrote:
> >
> > > On Thu, Nov 26, 2020 at 9:03 AM Jim Meyering <jim@meyering.net> wrote:
> > > >
> > > > On Wed, Nov 25, 2020 at 3:12 PM Jim Meyering <jim@meyering.net> wrote:
> > > > > Thank you for the fine bug report.
> > > > > The grep-3.6 bug you've exposed is due to the fact that your input
> > > > > triggers excessive hash collisions when using the code modeled after
> > > > > gnulib/lib/hash-pjw.c. That made the new pattern-preprocessing phase
> > > > > take O(N^2) time for N patterns. In the attached, I've switched grep
> > > > > to use the djb2 hash function, and that resolves the problem. I'll
> > > > > also add a NEWS entry and a test before pushing this.
> > > >
> > > > Timings suggest that grep-3.6's preprocessing came closer to O(N^3).
> > > > Here's an example that would take 2-3 days with grep-3.6 and only
> > > > seconds with this fix:
> > > >
> > > >   : | grep -Ff <(seq 6400000 | tr 0-9 A-J)
> > > >
> > > > Here's a complete patch.
> > > > I'll push it later today.
> > >
> > > Pushed along with two gnulib-related changes.
> >
> > The fix has improved some performance.  However, it's still quite slow
> > compared to version 3.3, and that can be remedied.
> >
> > It converts to grep only if the potential match does not match the word
> > frequently.
> 
> Thank you for that patch. Can you say a little more about the domain
> of the problem?
> I.e., is it specific to invocations with "-w"?
> Can you provide an example that exhibits the performance improvement,
> with timings?

I can confirm that it seems to solve my problem (my actual use
case):

grep-3.3:
real    0m0.861s
user    0m0.784s
sys     0m0.076s

grep-3.6 with this patch:
real    0m1.052s
user    0m0.943s
sys     0m0.108s

grep-3.6 with both your patch and this one:

real    0m1.097s
user    0m1.037s
sys     0m0.060s

Apparently there were at least two issues here. While your patch
fixes the simplified case I reported and had no effect on my actual
use case, Norihiro Tanaka's patch fixes the latter (and has no big
effect on the former).

So it looks like both patches are indeed needed.

Thanks,
Frank





reply via email to

[Prev in Thread] Current Thread [Next in Thread]