bug-grep
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#38690: grep -P quadratic on long lines


From: Martin Raszyk
Subject: bug#38690: grep -P quadratic on long lines
Date: Fri, 20 Dec 2019 10:54:14 +0100

Dear grep maintainers,

I've realized that grep -P takes quadratic time on long lines with
many short matches. For instance, executing './src/grep -P -o "a"
a.txt > a.out' on a file 'a.txt' consisting of N characters 'a' takes
time quadratic in N. I've used grep-3.3 and pcre-8.43 for the
benchmarks.

The root causes for this behavior are as follows:
1. in src/pcresearch.c on l. 222 (at commit
cf09252295c554dd3eba4cdb8eb53911fb495f40), the end of the line is
searched each time a new match is searched; this already results in
quadratic runtime in the above mentioned case
2. the function 'pcre_exec' from pcre-8.43 called in src/pcresearch.c
on l. 71 for each match checks if the provided string is valid UTF-8
(code implemented in pcre_valid_utf8.c); this also results in
quadratic runtime

On your side, it is possible to fix the first root cause. I'll post an
e-mail to PCRE mailing list about the second root cause.

Best regards,
Martin





reply via email to

[Prev in Thread] Current Thread [Next in Thread]