bug-gawk

FPAT is not working as expected


From: Arthur Schwarz
Subject: FPAT is not working as expected
Date: Sun, 13 Dec 2020 17:11:50 -0800
User-agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.5.1

I am trying to separate CSV fields using the FPAT given in the GNU Awk manual, a.k.a. "Gawk: Effective AWK Programming", Section 4.7.1 FPAT. I'm new to gawk, but I have seen the following issues:

1:    FPAT = /<pattern>/ does not seem to work. Only FPAT = "pattern" seems to sort-of work.

2:    I have an HTTP field. Input field splitting seems to fail on it sometimes.

3:    Commas embedded in quoted fields are recognized as field separators.

4:    Each output field is delimited by '<' and '>'. This works as expected except on the last field. All output fields should look like "   <#: field>", but sometimes the last field is printed as ">   <#:field". A field with an embedded line break would look like:
       "     <#:field"
      ">"

5:  Using 'split($0, array)' does not detect the same fields as 'normal' field processing.

I have tried this with and without changing FS, with no change. I have tried this using the FPAT given in Section 4.7 of the GNU Awk manual, with no change. I have tried 'patsplit()' and get results similar to split(). I don't know what else to try.

Now I'm the first to say I have no idea what's going on. Could you please tell me what I'm missing?

The sample code, test case and output are below:

--------------------------------- CODE ---------------------------------

#! /bin/gawk  -f

BEGIN {                                         # program constants
       FS           = "~"
       FPAT         = "([^,]*)|(\"([^\"]|\"\")\")" # CSV field separator
#       FPAT          = /([^,]*)|("([^"]|"")")/      # CSV field separator
       print "FPAT = ", FPAT;
} # BEGIN
{
      print "------------------------------------------------\n"
      print $0;
      printf("%3d:   \n", NF);
      for (i = 1; i <= NF; i++) {
         if (substr($i, 1, 1) == "\"") {
            len = length($1)
            $i = substr($i, 2, len - 2);
         }
         printf("      <%d: %s>\n", i, $i);
      }
      print " ";
      print  "------------------- split ----------------------\n"
      split($0, array);
      printf(" NF ndx            array\n");
      for (ndx = 1; ndx <= length(array); ndx++) {
         printf("%3d %3d %27s\n", NF, ndx, array[ndx]);
      }

      print "\n";
}

------------------------------ TEST CASE -------------------------------

"PDQ",,,
"line 1",,,"http://file.a/A%20Guide%20";
"line 2",,,"https://www.whitgt.pdf";
"line 3, and xyz",,,"http://www.c/main.pdf";
"line 4 "" and abc",,,http://file.a/A%20Guide%20
line 5,,,https://www.whitgt.pdf
line 6,,,http://file.a/A%20Guide%20

-------------------------------- OUTPUT --------------------------------

FPAT =  ([^,]*)|("([^"]|"")")
------------------------------------------------

"PDQ",,,
  4:
      <1: PDQ>
      <2: >
      <3: >
      <4:
>

------------------- split ----------------------

 NF ndx            array
  4   1                     PDQ


------------------------------------------------

"line 1",,,"http://file.a/A%20Guide%20";
  4:
      <1: line 1>
      <2: >
      <3: >
      <4: http>

------------------- split ----------------------

 NF ndx            array
  4   1               line 1   http


------------------------------------------------

"line 2",,,"https://www.whitgt.pdf";
  4:
      <1: line 2>
      <2: >
      <3: >
      <4: http>

------------------- split ----------------------

 NF ndx            array
  4   1               line 2   http


------------------------------------------------

"line 3, and xyz",,,"http://www.c/main.pdf";
  5:
      <1: line >
      <2:  and xyz">
      <3: >
      <4: >
      <5: htt>

------------------- split ----------------------

 NF ndx            array
  5   1       line   and xyz"   htt


------------------------------------------------

"line 4 "" and abc",,,http://file.a/A%20Guide%20
  4:
      <1: line 4 "" and abc>
      <2: >
      <3: >
      <4: http://file.a/A%20Guide%20
>

------------------- split ----------------------

 NF ndx            array
  4   1 line 4 "" and abc   http://file.a/A%20Guide%20


------------------------------------------------

line 5,,,https://www.whitgt.pdf
  4:
      <1: line 5>
      <2: >
      <3: >
      <4: https://www.whitgt.pdf
>

------------------- split ----------------------

 NF ndx            array
  4   1 line 5,,,https://www.whitgt.pdf


------------------------------------------------

line 6,,,http://file.a/A%20Guide%20
  4:
      <1: line 6>
      <2: >
      <3: >
      <4: http://file.a/A%20Guide%20
>

------------------- split ----------------------

 NF ndx            array
  4   1 line 6,,,http://file.a/A%20Guide%20





