bug#22028: grep -Pc / grep -P | wc -l inconsistent results
From: Jaroslav Skarvada
Subject: bug#22028: grep -Pc / grep -P | wc -l inconsistent results
Date: Fri, 27 Nov 2015 06:29:31 -0500 (EST)
Hi,
it seems that for long files which start with non-binary data, when the PCRE
matcher is used, grep works in TEXTBIN_UNKNOWN mode until it finds binary data
and then switches to TEXTBIN_BINARY. But with -Pc in TEXTBIN_BINARY mode it
exits on the next match, which causes bogus -Pc results.
Reproducer:
$ grep -P -c 'Blocked by (SpamAssassin|Spamfilter)' ./filtered.txt
1
$ grep -P 'Blocked by (SpamAssassin|Spamfilter)' ./filtered.txt | wc -l
2
./filtered.txt is a sufficiently long text file that contains some NUL bytes
after the first 32kB of text, e.g. https://bugzilla.redhat.com/attachment.cgi?id=1080646
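For anyone without the attachment, a file of a similar shape can be built
roughly as follows. This is only a sketch: the name filtered-sketch.txt, the
filler text, and the placement of the matching lines are assumptions, not the
contents of the real filtered.txt, and it may not trigger the bug on every
grep version.

```shell
#!/bin/sh
# Sketch only: filtered-sketch.txt is a made-up stand-in for the real filtered.txt.
f=filtered-sketch.txt
pat='Blocked by (SpamAssassin|Spamfilter)'

{
  # One match inside the leading text.
  echo 'Blocked by SpamAssassin'
  # 2000 lines of 23 bytes each: 46000 bytes of ordinary text (more than 32 KiB).
  awk 'BEGIN { for (i = 0; i < 2000; i++) print "plain text filler line" }'
  # NUL bytes after the first 32 KiB of text.
  printf '\000\000\000\000\n'
  # Two more matches after the binary data.
  echo 'Blocked by Spamfilter'
  echo 'Blocked by SpamAssassin'
} > "$f"

# Reference count with the file forced to text (-a) and no PCRE involved;
# three matching lines were written above.
grep -a -E -c "$pat" "$f"

# The two commands from the report; they should agree with each other.
# Skipped if this grep was built without PCRE support.
if echo x | grep -P x >/dev/null 2>&1; then
  grep -P -c "$pat" "$f"
  grep -P "$pat" "$f" | wc -l
fi
```

The -a -E count is only there as a baseline that does not depend on the PCRE
matcher or on binary-file detection.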
Original downstream bugzilla:
https://bugzilla.redhat.com/attachment.cgi?id=1080646
Attached is my attempt at a fix, but it may not be the right way to fix it.
In particular, the question is whether grep should stop when it finds binary
data or not. But at least grep -Pc and grep -P | wc -l should behave the
same.
thanks & regards
Jaroslav
0001-grep-do-not-stop-on-binary-data-if-counting-in-PCRE.patch
Description: Text Data