help-gnu-utils

Re: help with regular expression formation


From: Bob Proulx
Subject: Re: help with regular expression formation
Date: Mon, 31 Mar 2008 21:24:51 -0600
User-agent: Mutt/1.5.13 (2006-08-11)

Mickey Ferguson wrote:
> I'm generating output from a grep command, which I then want to process in 
> grep again, filtering out my unwanted text.  In this specific example, I 
> want to filter out all lines that start with zero or more white space, 
> followed by the comment characters "//".  Here is what I thought I would 
> use:
> 
> grep StopProductServices *.rul *.h | grep ^\s*[^/]

Unless you have a grep that is using PCRE (Perl-compatible regular
expressions), the above has three problems.  One is that \s is a
PCRE whitespace escape and not part of POSIX regular expressions.  Two
is that the * and brackets are shell metacharacters.  Those need to be
quoted to protect them from shell expansion.  Three is that the
pattern doesn't do what you want.  It selects lines instead of
filtering them out, and "[^/]" is happy to match a leading space, so
indented comment lines would still get through.  Try this:

  grep -v "^[[:space:]]*//"

The "[[:space:]]*" is a little long but it is POSIX standard, which
makes it preferred these days.  The old way was " *", which matches
only spaces and not tabs.
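
As a rough illustration of that filter, here is a minimal sketch run
against three made-up lines (the sample text is hypothetical and not
taken from the real .rul files).  Only the non-comment line should
survive:

```shell
# Hypothetical sample input: two comment lines and one real call.
sample=$(printf '%s\n' \
  '// StopProductServices(sProduct) is declared here' \
  '    // indented comment mentioning StopProductServices' \
  '    StopProductServices(sProduct);')

# -v inverts the match: print only the lines that do NOT start with
# optional whitespace followed by "//".
result=$(printf '%s\n' "$sample" | grep -v "^[[:space:]]*//")
printf '%s\n' "$result"
```

The double quotes keep the shell from expanding the * and the brackets
before grep sees them.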

> The first grep obviously finds all occurrences of StopProductServices within 
> all *.rul and *.h files.  Then that output is piped into grep, with the 

Having two greps in a pipeline works, but the character I/O between
them usually makes it slower than a single combined command.  This is
especially true on MS Windows, where spawning multiple processes is
exceptionally slow.  Plus what you are doing is more suitable for sed
than grep because sed's exit status reports whether an error occurred,
while grep's reports whether there was a match.  So in this case I
would use sed and combine the operations.  I would also attack the
comment problem differently: simply remove the comments from the
pattern space, along with any whitespace ahead of them.  Try this:

  sed -n "s|[[:space:]]*//.*||;/StopProductServices/p" *.rul *.h
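
To see what that sed script does step by step, here is a small sketch
using a throwaway file.  The file name sample.rul and its contents are
invented for the example; only the sed expression is the one above.

```shell
# Hypothetical sample file, created in a temporary directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.rul" <<'EOF'
// comment that mentions StopProductServices()
    // indented comment: StopProductServices(sProduct)
    StopProductServices(sProduct);  // trailing comment
    StartProductServices(sProduct);
EOF

# 1. s|...||  deletes "//" through end of line, plus any whitespace
#             immediately in front of it.
# 2. /.../p   prints the line only if the name is still present after
#             that deletion (-n suppresses all other output).
result=$(sed -n "s|[[:space:]]*//.*||;/StopProductServices/p" "$tmpdir/sample.rul")
printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Two things to keep in mind with this approach: the printed line has
its trailing comment stripped as well, and sed does not prefix the
output with the file name the way grep does for multiple files.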

> To break it down a little, I first produced the output from the first grep, 
> which is used for the pipe:

You have to be extra careful about greps into greps.  Let me point
out why:

> ->grep StopProductServices *.rul *.h
> NTService.rul://    FUNCTION:  StopProductServices(sProduct)
> NTService.rul:// 03/24/08 MSF - Make StopProductServices() take an 

So far so good.  But...

> Then I ran the full command, and you can see that the output is not at all 
> what I expected:
> 
> [11:06:39]: *** C:\WIP\VESTA\Installer\Script Files ***
> ->grep StopProductServices *.rul *.h | grep ^\s*[^/]
> NTService.rul://    FUNCTION:  StopProductServices(sProduct)
> NTService.rul:// 03/24/08 MSF - Make StopProductServices() take an 

Here we see the problem.  The first grep read multiple input files.
Therefore it printed the name of the input file as a prefix to each
matching line.  The second grep's "^" anchors on the filename and not
on the original line.  You would need to add -h to the first grep's
option list to suppress the filename.  You would also need to fix the
pattern and quote it.

  grep -h StopProductServices *.rul *.h | grep -v "^[[:space:]]*//"
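
The effect of the filename prefix can be reproduced with a small
sketch.  The files a.rul and b.h and their contents are hypothetical,
made up only to have two inputs so that grep adds the prefix.

```shell
# Two hypothetical input files in a temporary directory.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
printf '%s\n' '// StopProductServices is documented here' \
              '    StopProductServices(sProduct);' > a.rul
printf '%s\n' '  // prototype StopProductServices' > b.h

# Without -h every line starts with "a.rul:" or "b.h:", so "^" never
# lines up with the "//" and nothing is filtered out.
with_names=$(grep StopProductServices *.rul *.h | grep -v "^[[:space:]]*//")

# With -h the prefix is gone and the comment lines are dropped.
without_names=$(grep -h StopProductServices *.rul *.h | grep -v "^[[:space:]]*//")

printf 'without -h:\n%s\n\nwith -h:\n%s\n' "$with_names" "$without_names"

# Leave the temporary directory before removing it.
cd - >/dev/null && rm -rf "$tmpdir"
```

In the first pipeline all three matching lines come through, comments
included.  In the second only the real call remains.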

But I still recommend using sed.  I just wanted to comment about grep
into grep including the filename.

Hope that helps,
Bob



