bug-grep
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#37754: wish for grep --and -eX -eY -eZ (X∩Y∩Z intersection, not X∪Y∪


From: Trent W. Buck
Subject: bug#37754: wish for grep --and -eX -eY -eZ (X∩Y∩Z intersection, not X∪Y∪Z union)
Date: Fri, 18 Oct 2019 22:49:23 +1100
User-agent: Mutt/1.10.1 (2018-07-13)

Paul Eggert wrote:
> On 10/16/19 5:19 PM, Trent W. Buck wrote:
> > I would expect "grep -Fw -e 4GB -e DDR4 --and" to print the same thing as
> >
> >      grep -Fw 4GB | grep -Fw DDR4 | grep -Fw -e 4GB -e DDR4 -o
>
> You're right, it's not obvious. :-)
>
> It may be better to just pipe greps together, as you do now. That's simple
> and fast enough for this relatively-uncommon case, and it's portable to all
> greps.

I admit that most of the time, I want "grep --and" for a small dataset
(<1MB computer_parts.txt), so it's merely a convenience.

Sometimes I grep audit logs (~1TB uncompressed), which takes anywhere
from 15 minutes to 3 days, depending on how I tweak my grep calls.

In that case, each grep in the pipeline has to pay the costs to
de-serialize input from the previous grep, and re-serialize output to
the next grep.  If the first grep matches (say) 200GB of the 1TB,
that's can be a lot of overhead (I assume).

I was basically hoping that if it was all in a single grep process,
the de/serialization steps could be skipped completely.
I think the buzzword for that is "zero-copy"?

I've noticed "grep" is about 30% slower than either "grep -F" or
"LC_COLLATE=C grep", because (I think) it avoids the costs of decoding
from UTF-8 to Unicode and back.  So I was basically expecting a
similar saving from --and.

I'm only speaking as an end user - I haven't dug through the grep
source, so those expectations might be unrealistic, and implementing
it might be painful/impossible.  I figured I should at least ask :-)

If your expert opinion is that it's a pain to implement (and
maintain!) and there's not enough demand, then I'm OK with that.
This is NOT something that's burning me every day.

Regardless, I appreciate you taking the time to discuss it. :-)


PS: Regarding portability, I'm personally not worried because when I
need a GNUism badly enough (e.g. du --threshold), I can usually get
permission to install the relevant GNU software, even if it's only
into %APPDATA% or $HOME.

PS: I noticed on bugs.gnu.org something about grep being
single-threaded, which might mean "grep --and" would end up being
SLOWER than the existing pipelines, since the kernel can distribute
a pipeline's elements across multiple CPUs/cores.





reply via email to

[Prev in Thread] Current Thread [Next in Thread]