bug-grep
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#44754: Extreme performance degradation in GNU grep 3.4 / 3.6


From: Jim Meyering
Subject: bug#44754: Extreme performance degradation in GNU grep 3.4 / 3.6
Date: Sat, 5 Dec 2020 10:06:27 -0800

On Thu, Dec 3, 2020 at 12:26 AM Norihiro Tanaka <noritnk@kcn.ne.jp> wrote:
> On Thu, 26 Nov 2020 21:41:20 -1000
> Jim Meyering <jim@meyering.net> wrote:
>
> > On Thu, Nov 26, 2020 at 9:03 AM Jim Meyering <jim@meyering.net> wrote:
> > >
> > > On Wed, Nov 25, 2020 at 3:12 PM Jim Meyering <jim@meyering.net> wrote:
> > > > Thank you for the fine bug report.
> > > > The grep-3.6 bug you've exposed is due to the fact that your input
> > > > triggers excessive hash collisions when using the code modeled after
> > > > gnulib/lib/hash-pjw.c. That made the new pattern-preprocessing phase
> > > > take O(N^2) time for N patterns. In the attached, I've switched grep
> > > > to use the djb2 hash function, and that resolves the problem. I'll
> > > > also add a NEWS entry and a test before pushing this.
> > >
> > > Timings suggest that grep-3.6's preprocessing came closer to O(N^3).
> > > Here's an example that would take 2-3 days with grep-3.6 and only
> > > seconds with this fix:
> > >
> > >   : | grep -Ff <(seq 6400000 | tr 0-9 A-J)
> > >
> > > Here's a complete patch.
> > > I'll push it later today.
> >
> > Pushed along with two gnulib-related changes.
>
> The fix has improved some performance.  However, it's still quite slow
> compared to version 3.3, and that can be remedied.
>
> It converts to grep only if the potential match does not match the word
> frequently.

Thank you for that patch. Can you say a little more about the domain
of the problem?
I.e., is it specific to invocations with "-w"?
Can you provide an example that exhibits the performance improvement,
with timings?





reply via email to

[Prev in Thread] Current Thread [Next in Thread]