Re: FPAT is not working as expected
From: Manuel Collado
Subject: Re: FPAT is not working as expected
Date: Mon, 14 Dec 2020 17:03:57 +0100
User-agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.5.0
On 14/12/2020 at 2:11, Arthur Schwarz wrote:
I am trying to separate CSV fields using the FPAT given in the GNU Awk
manual, a.k.a. "Gawk: Effective AWK Programming", Section 4.7.1 FPAT.
I'm new to gawk
Welcome to the gawk community.
but have seen the following issues:
1: FPAT = /<pattern>/ does not seem to work. Only FPAT = "pattern"
   seems to sort-of work.
2: I have an HTTP field. Input field splitting seems to fail on it
   sometimes.
3: Embedded commas are recognized as field separators.
4: Each output field is demarcated by '<' and '>'. This works as
   expected except on the last field.
   All output fields should look like " <#: field>", but sometimes
   the output for the last field looks like "> <#:field". A field
   with an embedded line would look like:
      " <#:field"
      ">"
5: Using 'split($0, array)' does not detect the same fields as 'normal'
   field processing.
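On point 1: in awk, a bare regexp constant used as a value stands for the match expression ($0 ~ /regexp/), so the assignment stores 0 or 1 rather than the pattern itself. FPAT has to be given a string. A small sketch to illustrate (the sample record "a,b" is made up for the example; run it with gawk -f):

```awk
BEGIN {
    $0 = "a,b"                    # made-up sample record, for illustration only

    # A regexp constant on the right-hand side is evaluated as ($0 ~ /[^,]+/),
    # so FPAT receives the result of the match (0 or 1), not the pattern.
    FPAT = /[^,]+/
    print "after FPAT = /[^,]+/  : FPAT is \"" FPAT "\""

    # A string holding the regexp is what FPAT expects.
    FPAT = "[^,]+"
    print "after FPAT = \"[^,]+\" : FPAT is \"" FPAT "\""
}
```

Printing FPAT right after the assignment, as the posted program already does, is a quick way to check which of the two cases applies.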
I have tried this with and without changing FS, with no change. I have
tried this using the FPAT given in Section 4.7 of the GNU Awk manual, with
no change. I have tried 'patsplit()' and get similar results to split(). I
don't know what else to try.
Now I'm the first to say I have no idea what's going on. Could you
please tell me what I'm missing?
Please see what follows.
The sample code, test case and output are below:
--------------------------------- CODE ---------------------------------
#! /bin/gawk -f
BEGIN { # program constants
FS = "~"
FPAT = "([^,]*)|(\"([^\"]|\"\")\")" # CSV field separator
Section 4.7.1 suggests either of the following (note the "+" quantifier, which is missing above):
"([^,]*)|(\"[^\"]+\")"
or
"([^,]*)|(\"([^\"]|\"\")+\")"
# FPAT = /([^,]*)|("([^"]|"")")/ # CSV field separator
print "FPAT = ", FPAT;
} # BEGIN
{
print "------------------------------------------------\n"
print $0;
printf("%3d: \n", NF);
for (i = 1; i <= NF; i++) {
if (substr($i, 1, 1) == "\"") {
len = length($1)
Should be len = length($i), so that the length is taken from the field being processed.
$i = substr($i, 2, len - 2);
}
printf(" <%d: %s>\n", i, $i);
}
print " ";
print "------------------- split ----------------------\n"
split($0, array);
Should be the following, because split() separates on FS, while patsplit() uses FPAT by default:
patsplit($0, array);
printf(" NF ndx array\n");
for (ndx = 1; ndx <= length(array); ndx++) {
printf("%3d %3d %27s\n", NF, ndx, array[ndx]);
}
print "\n";
}
With those changes the test output is
FPAT = ([^,]*)|("([^"]|"")+")
------------------------------------------------
"PDQ",,,
4:
<1: PDQ>
<2: >
<3: >
<4: >
------------------- split ----------------------
NF ndx array
4 1 PDQ
------------------------------------------------
"line 1",,,"http://file.a/A%20Guide%20"
4:
<1: line 1>
<2: >
<3: >
<4: http://file.a/A%20Guide%20>
------------------- split ----------------------
NF ndx array
4 1 line 1 http://file.a/A%20Guide%20
------------------------------------------------
"line 2",,,"https://www.whitgt.pdf"
4:
<1: line 2>
<2: >
<3: >
<4: https://www.whitgt.pdf>
------------------- split ----------------------
NF ndx array
4 1 line 2 https://www.whitgt.pdf
------------------------------------------------
"line 3, and xyz",,,"http://www.c/main.pdf"
4:
<1: line 3, and xyz>
<2: >
<3: >
<4: http://www.c/main.pdf>
------------------- split ----------------------
NF ndx array
4 1 line 3
4 2 and xyz http://www.c/main.pdf
------------------------------------------------
"line 4 "" and abc",,,http://file.a/A%20Guide%20
4:
<1: line 4 "" and abc>
<2: >
<3: >
<4: http://file.a/A%20Guide%20>
------------------- split ----------------------
NF ndx array
4 1 line 4 "" and abc http://file.a/A%20Guide%20
------------------------------------------------
line 5,,,https://www.whitgt.pdf
4:
<1: line 5>
<2: >
<3: >
<4: https://www.whitgt.pdf>
------------------- split ----------------------
NF ndx array
4 1 line 5
4 2
4 3
4 4 https://www.whitgt.pdf
------------------------------------------------
line 6,,,http://file.a/A%20Guide%20
4:
<1: line 6>
<2: >
<3: >
<4: http://file.a/A%20Guide%20>
------------------- split ----------------------
NF ndx array
4 1 line 6
4 2
4 3
4 4 http://file.a/A%20Guide%20
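A side note on the output above: for the records whose fields had their quotes stripped, the patsplit() section shows fewer elements than NF. Assigning to $i makes gawk rebuild $0 with the fields joined by OFS (a space by default), so the commas are gone by the time patsplit() runs. A minimal sketch of one way around this, assuming a helper function named unquote() (hypothetical, not part of the original program) and a hard-coded sample record taken from the test data:

```awk
# Sketch only: strip the quotes into a return value instead of assigning to $i,
# so that $0 is still the original record when patsplit() is called.
# unquote() is a hypothetical helper introduced for this example.
function unquote(s) {
    if (substr(s, 1, 1) == "\"")
        s = substr(s, 2, length(s) - 2)   # drop the first and last character
    return s
}

BEGIN {
    FPAT = "([^,]*)|(\"([^\"]|\"\")+\")"  # pattern from Section 4.7.1 of the manual

    # Sample record from the test case; assigning $0 splits it using FPAT.
    $0 = "\"line 3, and xyz\",,,\"http://www.c/main.pdf\""

    printf("%3d: \n", NF)
    for (i = 1; i <= NF; i++)
        printf(" <%d: %s>\n", i, unquote($i))   # $i itself is not modified

    n = patsplit($0, array)               # no third argument: FPAT is used
    printf(" NF ndx array\n")
    for (ndx = 1; ndx <= n; ndx++)
        printf("%3d %3d %27s\n", NF, ndx, unquote(array[ndx]))
}
```

Since $0 is left untouched here, patsplit() should see the same record that field splitting saw. As in the original program, doubled quotes ("") inside a field are not collapsed.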
HTH. Regards.
--
Manuel Collado - http://mcollado.z15.es