RE: FPAT is not working as expected
From: Jannick
Subject: RE: FPAT is not working as expected
Date: Mon, 14 Dec 2020 18:18:14 +0100
On Mon, 14 Dec 2020 08:40:52 -0800, Arthur Schwarz wrote:
> It does a lot better than the previous version but there are still issues.
>
> 1: "line 1",,,"http://file.a/A%20Guide%20"
> <4: http> http is wrong
>
> 2: "line 2",,,"https://www.whitgt.pdf"
> same issue as 1:
>
> 3: "line 3, and xyz",,,"http://www.c/main.pdf"
> same issue as 1: but note that the embedded ',' is treated correctly
>
> 4: "line 4 "" and abc",,,http://file.a/A%20Guide%20
> embedded "" treated correctly and http: recognized correctly
>
> 5: line 5,,,https://www.whitgt.pdf
> all recognized correctly
>
> 6: line 6,,,http://file.a/A%20Guide%20
> all recognized correctly
>
> errata:
> 1: All lines with a quoted string recognize an extra field
> 2: The last output of all lines is incorrectly formatted:
> "> <5:" instead of "<5: >" this may be a programming
> error but I can't seem to locate it.
> 5: split($0, array) is uniformly incorrect.
> From the Gnu Awk manual FPAT is used as the regular expression
> and there are words to the effect that the resultant split will be
> the same as in normal input processing. This seems not to be
> the case.
The following changes against your original version work for me, or am I
missing something? My output is further below.
diff --git a/code.awk b/code.awk
--- a/code.awk
+++ b/code.awk
@@ -2,7 +2,7 @@
BEGIN { # program constants
FS = "~"
- FPAT = "([^,]*)|(\"([^\"]|\"\")\")" # CSV field separator # FPAT = /([^,]*)|("([^"]|"")")/ # CSV field separator
+ FPAT = "(\"([^\"]|\"\")+\"|[^,\"]*)" # CSV field separator # FPAT = /([^,]*)|("([^"]|"")")/ # CSV field separator
print "FPAT = ", FPAT;
} # BEGIN
{
@@ -10,17 +10,15 @@ BEGIN { # program constants
print $0;
printf("%3d: \n", NF);
for (i = 1; i <= NF; i++) {
- if (substr($i, 1, 1) == "\"") {
- len = length($1)
- $i = substr($i, 2, len - 2);
- }
+ gsub(/(^"|"$)/,"",$i) # feasible given the knowledge of tokens matching FPAT
+ gsub(/""/,"\"",$i) # same
printf(" <%d: %s>\n", i, $i);
}
print " ";
print "------------------- split ----------------------\n"
- split($0, array);
+ narray=split($0, array); # cosmetic change
printf(" NF ndx array\n");
- for (ndx = 1; ndx <= length(array); ndx++) {
+ for (ndx = 1; ndx <= narray; ndx++) {
printf("%3d %3d %27s\n", NF, ndx, array[ndx]);
}
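On erratum 5: as far as I can tell from the gawk manual, split() with two
arguments splits on FS, even while FPAT is in effect for input records. The
function that defaults to FPAT is patsplit(). The following is only a sketch,
assuming gawk 4.0 or later; the sample record and the names count_with_split,
count_with_patsplit, rec, bypat and npat are made up for illustration.

```awk
# Sketch: contrast split() and patsplit() on one CSV-like record.
# Requires gawk (FPAT and patsplit() are gawk extensions).
# All names below are illustrative, not taken from the posted program.

function count_with_split(s,    a)    { return split(s, a) }     # separator defaults to FS
function count_with_patsplit(s,    a) { return patsplit(s, a) }  # pattern defaults to FPAT

BEGIN {
    FS   = "~"                                  # as in the posted program
    FPAT = "(\"([^\"]|\"\")+\"|[^,\"]*)"        # pattern from the diff
    rec  = "\"line 3, and xyz\",,,\"http://www.c/main.pdf\""

    # rec contains no "~", so split() on FS yields a single element
    printf("split():    %d element(s)\n", count_with_split(rec))

    # patsplit() extracts the pieces of rec that match FPAT
    npat = patsplit(rec, bypat)
    printf("patsplit(): %d element(s)\n", npat)
    for (i = 1; i <= npat; i++)
        printf("  <%d: %s>\n", i, bypat[i])
}
```

If the array is meant to mirror the fields of the record, replacing
split($0, array) with patsplit($0, array) is probably what is wanted here.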
I hope this is not some kind of homework. For a newcomer to gawk, not bad at all. ;)
HTH.
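A side note on the two gsub() calls in the diff: whenever gsub() changes $i,
gawk rebuilds $0 with OFS between the fields, so a later split($0, array)
sees the modified record rather than the original line. If that is not
wanted, the unquoting can be done on a copy. A minimal sketch, assuming gawk;
unquote() is a hypothetical helper and the sample record is your line 4.

```awk
# Sketch: unquote a CSV field on a copy, leaving $0 and $i untouched.
# unquote() is a hypothetical helper, not part of the posted program.

function unquote(s) {
    if (s ~ /^".*"$/) {                      # only fields enclosed in quotes
        s = substr(s, 2, length(s) - 2)      # drop the enclosing quotes
        gsub(/""/, "\"", s)                  # collapse doubled quotes
    }
    return s
}

BEGIN {
    FPAT = "(\"([^\"]|\"\")+\"|[^,\"]*)"     # pattern from the diff
    # Assigning $0 in BEGIN splits the sample record according to FPAT
    $0 = "\"line 4 \"\" and abc\",,,http://file.a/A%20Guide%20"

    for (i = 1; i <= NF; i++)
        printf(" <%d: %s>\n", i, unquote($i))

    print $0                                 # no field was assigned, so $0 is unchanged
}
```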
OUTPUT:
FPAT = ("([^"]|"")+"|[^,"]*)
------------------------------------------------
"PDQ",,,
4:
<1: PDQ>
<2: >
<3: >
<4: >
------------------- split ----------------------
NF ndx array
4 1 PDQ
------------------------------------------------
"line 1",,,"http://file.a/A%20Guide%20"
4:
<1: line 1>
<2: >
<3: >
<4: http://file.a/A%20Guide%20>
------------------- split ----------------------
NF ndx array
4 1 line 1 http://file.a/A%20Guide%20
------------------------------------------------
"line 2",,,"https://www.whitgt.pdf"
4:
<1: line 2>
<2: >
<3: >
<4: https://www.whitgt.pdf>
------------------- split ----------------------
NF ndx array
4 1 line 2 https://www.whitgt.pdf
------------------------------------------------
"line 3, and xyz",,,"http://www.c/main.pdf"
4:
<1: line 3, and xyz>
<2: >
<3: >
<4: http://www.c/main.pdf>
------------------- split ----------------------
NF ndx array
4 1 line 3, and xyz http://www.c/main.pdf
------------------------------------------------
"line 4 "" and abc",,,http://file.a/A%20Guide%20 line
5,,,https://www.whitgt.pdf line 6,,,http://file.a/A%20Guide%20
10:
<1: line 4 " and abc>
<2: >
<3: >
<4: http://file.a/A%20Guide%20 line 5>
<5: >
<6: >
<7: https://www.whitgt.pdf line 6>
<8: >
<9: >
<10: http://file.a/A%20Guide%20>
------------------- split ----------------------
NF ndx array
10 1 line 4 " and abc http://file.a/A%20Guide%20 line 5 https://www.whitgt.pdf line 6 http://file.a/A%20Guide%20