help-gnu-utils

Re: problem with GAWK/gsub when substitute new lines


From: Ralf Wildenhues
Subject: Re: problem with GAWK/gsub when substitute new lines
Date: Thu, 18 Jun 2009 07:49:05 +0200
User-agent: Mutt/1.5.20 (2009-06-15)

Hello Rita,

* shaledova wrote on Thu, Jun 18, 2009 at 05:26:44AM CEST:
> 
> I tried to use gawk to perform some text conversions. But I could not
> substitute new lines (\n) using gsub such as:
> gsub(/\[[\n]*\]/, "");
> 
> For example, if I have a file containing:
> <"Week Report" [
> 
> 
> 
> ]>
> 
> I want to convert these lines to:
> <"Week Report">
> 
> What is wrong with the expression?

The expression is ok, but gawk operates on each line in turn by default;
more specifically, the implicit loop is over records, with RS being the
record separator, which is a newline by default.  So no record ever
contains a newline, and your pattern never sees the [ and the ] in the
same record.  With something like
  awk 'BEGIN { RS="X" }
       { gsub(/\[[\n]*\]/, ""); print }'

you can get the above input to turn into
  <"Week Report" >

(note also the space before the closing > that was not matched).

Of course, this is a kludge and requires your input to not contain X;
and you might have to adjust the output record separator ORS as well.
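
A slightly tidier variant of the same kludge might look like the sketch
below.  It is only an illustration under assumptions of mine: the
separator is the control character \001 (assumed never to occur in the
input), ORS is set to the empty string so that print does not add a
second newline after the record, and the pattern also swallows the
blanks in front of the [.  The function name strip_brackets is made up.

```shell
# Sketch only: read the whole input as a single record by picking a
# separator byte (\001) that is assumed not to occur in the data.
strip_brackets() {
  awk 'BEGIN { RS = "\001"; ORS = "" }   # ORS="" avoids an extra newline
       { gsub(/ *\[\n*\]/, ""); print }' "$@"
}
```

Run as "strip_brackets file" or at the end of a pipe.  The whole input
is held in memory as one record, so this is only reasonable for small
files.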

However, when parsing nested structures, regular expressions are
generally not the right tool.  You might be better off writing a small
state machine that reads the file line by line and just skips printing
output when inside unwanted [ ] brackets.
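
Such a state machine could be sketched roughly as follows.  This is a
minimal illustration, not a complete parser: the names skip_brackets,
inside and held are my own, it assumes at most one [ ] pair is open at
a time (no nesting), and it drops everything between the brackets, not
only newlines.

```shell
# Sketch only: line-by-line state machine that skips text inside [ ].
skip_brackets() {
  awk '
    # State "outside": pass lines through until a "[" shows up.
    !inside {
      i = index($0, "[")
      if (i == 0) { print; next }
      held = substr($0, 1, i - 1)   # keep the text before "["
      sub(/ +$/, "", held)          # drop blanks in front of "["
      $0 = substr($0, i + 1)        # continue with the rest of the line
      inside = 1
    }
    # State "inside": discard input until the closing "]".
    inside {
      j = index($0, "]")
      if (j == 0) next
      print held substr($0, j + 1)  # rejoin the text around the brackets
      inside = 0
    }
    # Unterminated "[": still emit the text that preceded it.
    END { if (inside) print held }
  ' "$@"
}
```

With the example input from above this prints <"Week Report"> on a
single line.  Handling nested brackets would need a depth counter
instead of the simple inside flag.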

Hope that helps.

Cheers,
Ralf



