bug-gawk

RE: FPAT is not working as expected


From: Jannick
Subject: RE: FPAT is not working as expected
Date: Mon, 14 Dec 2020 18:18:14 +0100

On Mon, 14 Dec 2020 08:40:52 -0800, Arthur Schwarz wrote:
> It does a lot better than the previous version but there are still issues.
> 
> 1:    "line 1",,,"http://file.a/A%20Guide%20";
>          <4: http>       http is wrong
> 
> 2:    "line 2",,,"https://www.whitgt.pdf";
>          same issue as 1:
> 
> 3:    "line 3, and xyz",,,"http://www.c/main.pdf";
>          same issue as 1: but note that the embedded ',' is treated correctly
> 
> 4:    "line 4 "" and abc",,,http://file.a/A%20Guide%20
>          embedded "" treated correctly and http: recognized correctly
> 
> 5:    line 5,,,https://www.whitgt.pdf
>          all recognized correctly
> 
> 6:    line 6,,,http://file.a/A%20Guide%20
>          all recognized correctly
> 
> errata:
> 1:    All lines with a quoted string recognize an extra field
> 2:    The last output of all lines is incorrectly formatted:
>          ">  <5:" instead of "<5: >"  this may be a programming
>          error but I can't seem to locate it.
> 5:    split($0, array) is uniformly incorrect.
>          From the Gnu Awk manual FPAT is used as the regular expression
>          and there are words to the effect that the resultant split will be
>          the same as in normal input processing. This seems not to be
>          the case.

The following changes to your original version work for me, or am I missing
something? My output is further below.

diff --git a/code.awk b/code.awk
--- a/code.awk
+++ b/code.awk
@@ -2,7 +2,7 @@
 
 BEGIN {                                         # program constants
         FS           = "~"
-        FPAT         = "([^,]*)|(\"([^\"]|\"\")\")" # CSV field separator #      FPAT          = /([^,]*)|("([^"]|"")")/      # CSV field separator
+        FPAT         = "(\"([^\"]|\"\")+\"|[^,\"]*)" # CSV field separator #       FPAT          = /([^,]*)|("([^"]|"")")/      # CSV field separator
         print "FPAT = ", FPAT;
 } # BEGIN
 {
@@ -10,17 +10,15 @@ BEGIN {                                         # program constants
        print $0;
        printf("%3d:   \n", NF);
        for (i = 1; i <= NF; i++) {
-          if (substr($i, 1, 1) == "\"") {
-             len = length($1)
-             $i = substr($i, 2, len - 2);
-          }
+          gsub(/(^"|"$)/,"",$i) # feasible given the knowledge of tokens matching FPAT
+          gsub(/""/,"\"",$i) # same
           printf("      <%d: %s>\n", i, $i);
        }
        print " ";
        print  "------------------- split ----------------------\n"
-       split($0, array);
+       narray=split($0, array); # cosmetic change
        printf(" NF ndx            array\n");
-       for (ndx = 1; ndx <= length(array); ndx++) {
+       for (ndx = 1; ndx <= narray; ndx++) {
           printf("%3d %3d %27s\n", NF, ndx, array[ndx]);
        }
 
Hoping this is not some kind of homework. For a newbie to gawk, not bad at all. ;)

HTH. 



OUTPUT:

FPAT =  ("([^"]|"")+"|[^,"]*)
------------------------------------------------

"PDQ",,,
  4:   
      <1: PDQ>
      <2: >
      <3: >
      <4: >
 
------------------- split ----------------------

 NF ndx            array
  4   1                      PDQ   


------------------------------------------------

"line 1",,,"http://file.a/A%20Guide%20"
  4:   
      <1: line 1>
      <2: >
      <3: >
      <4: http://file.a/A%20Guide%20>
 
------------------- split ----------------------

 NF ndx            array
  4   1 line 1   http://file.a/A%20Guide%20


------------------------------------------------

"line 2",,,"https://www.whitgt.pdf"
  4:   
      <1: line 2>
      <2: >
      <3: >
      <4: https://www.whitgt.pdf>
 
------------------- split ----------------------

 NF ndx            array
  4   1 line 2   https://www.whitgt.pdf


------------------------------------------------

"line 3, and xyz",,,"http://www.c/main.pdf"
  4:   
      <1: line 3, and xyz>
      <2: >
      <3: >
      <4: http://www.c/main.pdf>
 
------------------- split ----------------------

 NF ndx            array
  4   1 line 3, and xyz   http://www.c/main.pdf


------------------------------------------------

"line 4 "" and abc",,,http://file.a/A%20Guide%20 line 5,,,https://www.whitgt.pdf line 6,,,http://file.a/A%20Guide%20
 10:   
      <1: line 4 " and abc>
      <2: >
      <3: >
      <4: http://file.a/A%20Guide%20 line 5>
      <5: >
      <6: >
      <7: https://www.whitgt.pdf line 6>
      <8: >
      <9: >
      <10: http://file.a/A%20Guide%20>
 
------------------- split ----------------------

 NF ndx            array
 10   1 line 4 " and abc   http://file.a/A%20Guide%20 line 5   https://www.whitgt.pdf line 6   http://file.a/A%20Guide%20

    



