bug-grep

bug#20526: grep BUG: text file is detected as binary


From: Paul Eggert
Subject: bug#20526: grep BUG: text file is detected as binary
Date: Thu, 31 Dec 2015 01:29:35 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.4.0

Jim Meyering wrote:
> The combination of this and the grep -oP infloop fix make this look
> like a good time for a bug-fix release. If there are any other pending
> bug fixes or small+safe changes people would like to see included,
> please let us know.

I have one major qualm about this: since 'grep' no longer checks whether the input is correctly encoded, I expect this may hurt -P performance significantly (though it may help non-'-P' performance). This is because PCRE is slow at checking whether input data are valid UTF-8. I just now did a brief check and found one major performance issue:

grep -rP 'fed.*cba' .

On my machine the above command is 125x slower with the new grep than the old one, which suggests some tuning is in order before releasing. (It's bogged down inside libpcre somewhere.)
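For readers unfamiliar with what "checking whether input data are valid UTF-8" involves, here is a minimal sketch in C of such a check, assuming the RFC 3629 definition (no overlong forms, no UTF-16 surrogates, nothing above U+10FFFF). It is illustrative only and is not PCRE's or grep's actual code; the function name valid_utf8 is hypothetical. It shows that the check must visit every byte of the data it is given, so its cost grows with the amount of data each call validates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: return true if the N bytes at S form valid UTF-8
   per RFC 3629.  Illustrative sketch, not PCRE's implementation.  */
static bool
valid_utf8 (unsigned char const *s, size_t n)
{
  assert (s != NULL || n == 0);
  size_t i = 0;
  while (i < n)
    {
      unsigned char c = s[i];
      size_t len;
      unsigned int cp;
      if (c < 0x80)
        {
          i++;                  /* ASCII byte: always valid.  */
          continue;
        }
      else if ((c & 0xE0) == 0xC0)
        { len = 2; cp = c & 0x1F; }
      else if ((c & 0xF0) == 0xE0)
        { len = 3; cp = c & 0x0F; }
      else if ((c & 0xF8) == 0xF0)
        { len = 4; cp = c & 0x07; }
      else
        return false;           /* Stray continuation byte or bad lead byte.  */

      if (n - i < len)
        return false;           /* Sequence truncated by end of buffer.  */

      /* Every following byte must be a continuation byte 10xxxxxx.  */
      for (size_t k = 1; k < len; k++)
        {
          if ((s[i + k] & 0xC0) != 0x80)
            return false;
          cp = (cp << 6) | (s[i + k] & 0x3F);
        }

      /* Reject overlong encodings, surrogates, and out-of-range values.  */
      if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800)
          || (len == 4 && cp < 0x10000)
          || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        return false;

      i += len;
    }
  return true;
}
```

For example, valid_utf8 accepts "caf\xc3\xa9" (length 5) and rejects the overlong "\xc0\x80" and the surrogate "\xed\xa0\x80". PCRE's pcre_exec accepts a PCRE_NO_UTF8_CHECK option that skips its own check when the caller has already validated the subject; how often that check runs inside grep is not established here.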

Since you wrote your email, I have triaged the outstanding bugs, except for those that already have patches available. Those are mostly performance-related, and I expect some of them will be relevant to the -P slowdown.
