bug-gawk

Re: FPAT is not working as expected


From: Manuel Collado
Subject: Re: FPAT is not working as expected
Date: Sun, 20 Dec 2020 23:05:10 +0100

Sorry for the late response. See below.

On 14/12/2020 at 2:11, Arthur Schwarz wrote:
I am trying to separate CSV fields using the FPAT given in the GNU Awk
manual, a.k.a. "Gawk: Effective AWK Programming", Section 4.7.1 FPAT.
I'm new to gawk but have seen the following issues:

1:    FPAT = /<pattern>/ does not seem to work. Only FPAT = "pattern"
seems to sort-of work.

2:    I have an HTTP field. Input field splitting seems to fail on it
sometimes.

3:    Embedded commas are recognized as field separators.

4:    Each field output is demarcated by '<' and '>'. This works as
expected except on the last field.
        All output fields should look like: "   <#: field>", but sometimes
for the last field the output looks like ">   <#:field". An embedded
line field would look like:
        "     <#:field"
       ">"

5:  Using 'split($0, array)' does not detect the same fields as 'normal'
field processing.

I have tried this with and without changing FS, with no change. I have
tried this using the FPAT given in Section 4.7 of the GNU Awk manual,
with no change. I have tried 'patsplit()' and get similar results to
split(). I don't know what else to try.

Now I'm the first to say I have no idea what's going on. Could you
please tell me what I'm missing?

The sample code, test case and output are below:
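
Regarding item 1: in awk, a regexp constant on the right-hand side of an assignment is not stored as a pattern. It is evaluated as a match against $0, so the variable receives 0 or 1. That is why only the string form of FPAT has any chance of working. A minimal sketch of the difference, relying only on standard awk behaviour (the shell variable names as_regex and as_string are made up for illustration):

```shell
# A regexp constant used as a value means ($0 ~ /re/), so the
# assignment stores 0 or 1. In BEGIN, $0 is empty, the pattern
# needs at least one character, and the result is therefore 0.
as_regex=$(awk 'BEGIN { FPAT = /[^,]+/; print FPAT }')

# A string constant stores the pattern text itself.
as_string=$(awk 'BEGIN { FPAT = "[^,]+"; print FPAT }')

echo "regexp constant: $as_regex"
echo "string constant: $as_string"
```

With gawk, the first assignment would leave FPAT set to the string "0", which is presumably not the intended field pattern.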

You can avoid the burden of composing a working FPAT and removing the quoting of the fields by using the CSVMODE gawk library available at

    http://mcollado.z15.es/xgawk/

Your example can be rewritten as follows.

==================================================
$ cat testcsv.awk
@include "csvmode"

BEGIN {
    CSVMODE = -1
}

FNR==1 {
    print "------------------------------------------------"
}

{
    print CSVRECORD
    print "NF = " NF
    for (i = 1; i <= NF; i++) {
        printf("      <%d: %s>\n", i, $i)
    }
    print "------------------------------------------------"
}

====================================================
$ gawk -f testcsv.awk testfpat.csv
------------------------------------------------
"PDQ",,,
NF = 4
      <1: PDQ>
      <2: >
      <3: >
      <4: >
------------------------------------------------
"line 1",,,"http://file.a/A%20Guide%20"
NF = 4
      <1: line 1>
      <2: >
      <3: >
      <4: http://file.a/A%20Guide%20>
------------------------------------------------
"line 2",,,"https://www.whitgt.pdf"
NF = 4
      <1: line 2>
      <2: >
      <3: >
      <4: https://www.whitgt.pdf>
------------------------------------------------
"line 3, and xyz",,,"http://www.c/main.pdf"
NF = 4
      <1: line 3, and xyz>
      <2: >
      <3: >
      <4: http://www.c/main.pdf>
------------------------------------------------
"line 4 "" and abc",,,http://file.a/A%20Guide%20
NF = 4
      <1: line 4 " and abc>
      <2: >
      <3: >
      <4: http://file.a/A%20Guide%20>
------------------------------------------------
line 5,,,https://www.whitgt.pdf
NF = 4
      <1: line 5>
      <2: >
      <3: >
      <4: https://www.whitgt.pdf>
------------------------------------------------
line 6,,,http://file.a/A%20Guide%20
NF = 4
      <1: line 6>
      <2: >
      <3: >
      <4: http://file.a/A%20Guide%20>
------------------------------------------------

=========================================================

HTH. Regards.

--
Manuel Collado - http://mcollado.z15.es


