bug-gawk
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: succinct awk code to check for paired parantheses?


From: Tom Gray
Subject: RE: succinct awk code to check for paired parantheses?
Date: Thu, 25 Mar 2021 19:25:47 +0000

Here is how I do such things in gawk.
It uses the stack approach to keep track of the nesting levels.
patsplit() is effective for tokenizing.
Associative arrays are used for data structures.
All functionality is in functions so you can stay out of the global namespace.
No, it's not a one liner, but someone who's never seen awk could read it and 
understand it.
Bonus, you get some reusable modules.

Tom

# gawk -v debug=0 -f pcheck.awk

BEGIN {
   if(debug=="") debug=0;
   # do as little as possible in BEGIN{} . This is the global namespace.
   pcheck_test();
}

function pcheck_test(){
   pcheck("the {quick (brown|blue) [fox]}.");
   pcheck("bad parens( ( )");
   pcheck("bad parens())");
   pcheck("ignore quoted strings: x[i]=\"(((((]]]]]\"");
   pcheck("func foo{x[i]=((1+2)*6)/pi;}");
}

function pcheck(str           , stk, toks, groupers, closers, i){
   # declare some arrays
   stk[""];
   toks[""];

   # define toks that open a nesting level
   # and the corresponding tok that closes it
   groupers["("] = ")";
   groupers["["] = "]";
   groupers["{"] = "}";
   for(i in groupers) closers[groupers[i]] = i;

   tokenize(toks, str);

   # Use sorted_in to iterate the toks in order.
   # You could also store the number of toks in the toks[] structure
   # and use a more familiar, faster, for(;;) loop. 
   PROCINFO["sorted_in"] = "@ind_num_asc";
   for(i in toks){
      if(toks[i] in groupers) stack_push(stk, toks[i]);
      if(toks[i] in closers)  stack_pop(stk, closers[toks[i]]);
      if("error" in stk ) break;
     }

   if(debug) walk_array(stk, "stk");

   if("error" in stk){
      print "ERROR: grouping:", stk["error"], "\n   String was:", str;
      return 1;
     }

   # do not assume array members exist. make sure we end up with a number on 
the left side
   if(stk["depth"] * 1 > 0) {
      print "ERROR: grouping: unmatched", stk[stk["depth"]], "\n   String 
was:", str;
      return 1;
     }
   
   print "OK: String was:", str;
   return 0;
}

function tokenize(tok, str        ,sep, n, i){
   # grab quoted strings and grouping characters: ()[]{}
   # everything else goes into sep[]
   n=patsplit(str, tok, @/("[^""]+")|([[\]{}()])/, sep);

   if(debug) walk_array(tok, "tok");
   if(debug) walk_array(sep, "sep");
}

function stack_push(s, what            , d){
   d = s["depth"] * 1;
   d = d + 1;
   s[d] = what;
   s["depth"] = d;
}

function stack_pop(s,    expect           , d){
   d = s["depth"] * 1;
   if(d == 0) {
      s["error"] = "tried to pop " expect " but stack is empty";
      return "";
     }
   if(expect != ""){
      if(s[d] != expect) {
         s["error"] = "tried to pop " expect " from stack: Got " s[d];
        }
     }
   s["depth"] = d - 1;
   return s[d];
}

function walk_array(arr, name, outfile, i) {
   if(!(isarray(arr))) {
      if(arr=="") {
         if(name) name = name " = ";
         print name "null";
         return;
        }
      if(arr in mem) {
         if(name=="") return walk_array(mem[arr], "mem["arr"]", outfile);
         return walk_array(mem[arr], name, outfile);
        }
      print "arr="arr;
      return;
     }

   for(i in arr) {
      if(isarray(arr[i])) {
         walk_array(arr[i], (name "[ "i" ]"), outfile)
        }
      else {
         if(outfile!="") {
            printf("%s[ %s ] = %s\n", name, i, arr[i]) > outfile
           }
         else {
            printf("%s[ %s ] = %s\n", name, i, arr[i])
           }
        }
     }
}



-----Original Message-----
From: bug-gawk <bug-gawk-bounces+tom_gray=keysight.com@gnu.org> On Behalf Of 
Jakub Martisko
Sent: Thursday, March 25, 2021 6:47 AM
To: david kerns <david.t.kerns@gmail.com>
Cc: bug-gawk <bug-gawk@gnu.org>
Subject: Re: succinct awk code to check for paired parantheses?

CAUTION: This message originates from an external sender.

Indeed, you need context free grammers to parse parentheses properly.
Basic regexps are weaker than that. The easiest way to parse them (I am not 
providing awk code, because it would be a mess imho) is to use stack structure 
and then parse the input string symbol after symbol. If the current symbol is 
'(' push it to the stack, if it is ')', check that the stack contains matching 
'(' and pop it from the stack, if you get to the end of the string and the 
stack is empty at that point, the string passed.

For this, yacc would be even better option than lex.

Alternative approach is using a counter, increase it when the input is '(', 
decrease is when the input is ')', if you ever get to negatives, or parse the 
whole string and the counter ends with non-zero value, the string fails the 
test.


On Thu, Mar 25, 2021 at 06:26:23AM -0700, david kerns wrote:
> I submit the following test cases that fail the proposed awk solutions:
>
> #1
>    if ((a>b) && (b>c) && // )
>      ((c>d)) {
>       a = 0;
>    }
>
> #2
>    if ) 5 ( b = 0;
>
> I stand by my original post, this is a job for lex
>





reply via email to

[Prev in Thread] Current Thread [Next in Thread]