RE: FPAT is not working as expected
From: Jannick
Subject: RE: FPAT is not working as expected
Date: Mon, 14 Dec 2020 21:24:15 +0100
On Mon, 14 Dec 2020 10:39:50 -0800, Arthur Schwarz wrote:
> Thanks Jannick;
>
> Well no. This isn't a homework assignment. Not by a long, long shot (time). It
> is a newbie issue though.
Ok - glad to hear that it is not homework.
> Rather than create a new, more perfect world, I decided to copy the program
> in the manual as exactly as I could. What you see is basically what was
> written.
The regular expression in the manual is less than ideal, I would say. The FPAT
given in my last email should work a lot better.
> And now to your code:
>
> 1: Removing $i = substr($i, 2, len - 2); seems to have fixed the issue
> of not correctly identifying the http:... URL. I don't understand
>      why this should happen, but all the URLs are correct.
As far as I understand gawk, this is due to its internals: once you assign to a
field $i, $0 is rebuilt with the changed $i, and then $0 gets split again using
FPAT. That is why things get scrambled.
> 2: In all cases in which there is a quoted string, an extra (empty) field
> is found. This is consistent with previous results. I think that
> this only applies to the last quoted string, not the first.
This is expected, because the FPAT allows empty tokens. After the last comma on
the line has been seen, a new, empty token is identified. This needs to be
handled manually.
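A rough Python analogue of the behaviour described above (Python standing in for gawk; this is not gawk code). The field pattern and the helper names scan_fields() and drop_trailing_empty() are hypothetical and only illustrate how a pattern that may match the empty string yields an extra token after a trailing comma, and how it can be discarded afterwards. In gawk itself such a fix-up could be something like `if ($NF == "") NF--`.

```python
import re

# Hypothetical field pattern that, like an FPAT of the form ([^,]*)|("..."),
# also accepts an empty field: a quoted field ("" = escaped quote) or a
# possibly empty run of non-comma characters.
FIELD = re.compile(r'"(?:[^"]|"")*"|[^,]*')


def scan_fields(line):
    """Split one line into fields; empty fields are kept."""
    fields = []
    pos = 0
    while True:
        m = FIELD.match(line, pos)  # never None: [^,]* can match ""
        fields.append(m.group(0))
        pos = m.end()
        if pos >= len(line):
            return fields
        if line[pos] != ",":
            raise ValueError(f"expected ',' at offset {pos}: {line!r}")
        pos += 1  # step over the separating comma


def drop_trailing_empty(fields):
    """The manual fix-up: discard an empty token after the last comma."""
    if len(fields) > 1 and fields[-1] == "":
        return fields[:-1]
    return fields


print(scan_fields('a,,"b,c",'))                       # ['a', '', '"b,c"', '']
print(drop_trailing_empty(scan_fields('a,,"b,c",')))  # ['a', '', '"b,c"']
```

Note that the genuinely empty second field is kept; only the token after the final comma is dropped, and whether that is the right call depends on the data.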
> 3: Embedded quotes ("") and commas (,) are correctly handled.
> This is consistent with my last email.
Agreed.
> 4: Split does not work correctly (thanks for the narray=...).
>      It looks like a space, " ", is treated as a delimiter.
The delimiter (a regex) can be given as the third parameter to split(), but you
are using the default, which is FS; its default value " " splits on runs of
blanks.
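To see why split() gives these results, here is a rough Python analogue (Python's str.split() and re.split() stand in for gawk's split(); this is not gawk code, and the sample line is made up):

```python
import re

line = 'x,"say hi, ok",y'

# Roughly what split(line, a) does with the default FS = " ":
# split on runs of whitespace (leading/trailing whitespace ignored).
by_space = line.split()

# Roughly what split(line, a, ",") does: cut at every comma,
# including the one inside the quoted field.
by_comma = re.split(",", line)

print(by_space)  # ['x,"say', 'hi,', 'ok",y']
print(by_comma)  # ['x', '"say hi', ' ok"', 'y']
```

Neither result keeps the quoted field intact, because a separator-based split has no notion of quotes.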
> 5: Substituting "patsplit" for "split" yields uniformly incorrect results.
> No line with a quoted string is output correctly. "" and , are
> treated as delimiters. All lines without "" and , are output
> as a single field. Lines with "" or , are output as two fields.
> All lines without a quoted string are output correctly.
Call patsplit() with a regular expression that describes the fields themselves
(not the separators) as its third argument.
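A rough Python analogue of the patsplit() idea, where the regex describes what a field looks like (Python's re.findall() stands in for gawk's patsplit(); this is not gawk code). The pattern and the helpers pat_fields() and unquote() are hypothetical:

```python
import re

# Hypothetical field pattern of the kind one might pass to patsplit():
# a double-quoted field ("" = escaped quote) or a non-empty run of
# non-comma characters. Python alternation is leftmost-first, whereas gawk
# picks the leftmost-longest match, so the quoted branch is listed first.
FIELD = re.compile(r'"(?:[^"]|"")*"|[^,]+')


def pat_fields(line):
    """Every non-overlapping match of FIELD, in order (cf. patsplit)."""
    return FIELD.findall(line)


def unquote(field):
    """Remove the surrounding quotes and turn "" back into "."""
    if len(field) >= 2 and field[0] == field[-1] == '"':
        return field[1:-1].replace('""', '"')
    return field


fields = pat_fields('x,"say ""hi"", ok",y')
print(fields)                        # ['x', '"say ""hi"", ok"', 'y']
print([unquote(f) for f in fields])  # ['x', 'say "hi", ok', 'y']
```

Because this pattern cannot match the empty string, a genuinely empty field (the middle one in a,,b) is skipped rather than returned; that is the trade-off against a pattern that allows empty tokens.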
> In summary (thanks again for your narray), your changes fixed all the line
> processing output issues except for the addition of a null extra field and
> that damnable formatting issue on the last output ("> <5:"). split() and
> patsplit() do not work as expected.
Well, they do exactly what they are documented to do. For your purpose, I
believe split() will not get you anywhere.
> What I don't understand:
>
> My environment: Win 7-64
> cygcheck (cygwin) 3.1.7
>
> Your output does not show an extra field when the last field in the input is
> quoted. My output does show an extra field.
This might be a friendly greeting from Windows end-of-line (CRLF) hell. If you
set RS = "\r?\n", this should go away.
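A rough Python analogue of the line-ending theory (Python's re.split() stands in for gawk's record splitting; this is not gawk code). The helper records(), the field pattern and the sample data are hypothetical. In this analogue, with the quoted-first pattern, the stray \r becomes an extra field exactly when the last field is quoted and otherwise stays glued to the last field; whether gawk shows the same symptom depends on the actual FPAT, so treat this as an illustration of the idea only.

```python
import re

# Hypothetical field pattern: quoted field or non-empty run of non-commas.
FIELD = re.compile(r'"(?:[^"]|"")*"|[^,]+')


def records(data, rs):
    """Split text into records on the regex rs, roughly like awk's RS."""
    parts = re.split(rs, data)
    if parts and parts[-1] == "":
        parts.pop()  # no empty record after the final terminator
    return parts


crlf_data = 'a,"b"\r\nc,d\r\n'  # made-up input with Windows line endings

naive = [FIELD.findall(r) for r in records(crlf_data, r"\n")]
fixed = [FIELD.findall(r) for r in records(crlf_data, r"\r?\n")]
print(naive)  # [['a', '"b"', '\r'], ['c', 'd\r']]
print(fixed)  # [['a', '"b"'], ['c', 'd']]
```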
> Your output of the last field in record splitting is correct; my version
> shows that damnable "> <5:".
>
> Both your output of split() and my output of split() are the same which
> indicates a complete lack of understanding on my part or that split() does not
> work as advertised.
Well, the latter is very unlikely to be true.
> Thanks. I assure you that this is not a class project (and incidentally) I am
> not a student.
Again, good to hear, because I am not up to doing other people's HOMEwork.
> art
HTH,
J.