bug-gawk

RE: FPAT is not working as expected


From: Jannick
Subject: RE: FPAT is not working as expected
Date: Mon, 14 Dec 2020 21:24:15 +0100

On Mon, 14 Dec 2020 10:39:50 -0800, Arthur Schwarz wrote:
> Thanks Jannack;
> 
> Well no. This isn't a homework assignment. Not by a long, long shot (time). It
> is a newbie issue though.

Ok - glad to hear that it is not homework.

> Rather than create a new, more perfect world, I decided to copy the program
> in the manual as exactly as I could. What you see is basically what was
> written.

The regular expression in the manual is less than ideal, I would say. The FPAT
given in my last email should work a lot better.
 
> And now to your code:
> 
> 1:    Removing  $i = substr($i, 2, len - 2); seems to have fixed the issue
>          of not correctly identifying the http:... URL. I don't understand
>          why this should happen, but all the URLs are correct.

As far as I understand gawk, this is due to some gawk internals: once you
change a field $i, $0 is recomposed with the changed $i, and then $0 gets
decomposed again using FPAT. This is the reason why things get scrambled.

> 2:    In all cases in which there is a quoted string, an extra (empty) field
>          is found. This is consistent with previous results. I think that
>          this only applies to the last quoted string, not the first.
 
This is expected because FPAT allows empty tokens. So after seeing the last
comma on the line, a new empty token is identified. This needs to be handled
manually.
 
> 3:    Embedded quotes ("") and commas (,) are correctly handled.
>          This is consistent with my last email.

Agree.
 
> 4:    Split does not work correctly (thanks for the narray=...).
>          It looks like a space, " ", is treated as a delimiter.

The delimiter (regex) can be given as a parameter to split(), but you are
using the default.
 
> 5:    Substituting "patsplit" for "split" yields uniformly incorrect results.
>          No line with a quoted string is output correctly. "" and , are
>              treated as delimiters. All lines without "" and , are output
>              as a single field. Lines with "" or , are output as two fields.
>          All lines without a quoted string are output correctly.

Pass a regular expression (the field pattern) as the third argument when
calling patsplit().
 
> In summary (thanks again for your narray): your changes fixed all the line
> processing output issues except for the addition of a null extra field and
> that damnable formatting issue on the last output ("> <5:"). split() and
> patsplit() do not work as expected.

Well, they do exactly what they are expected to do. For your purpose, split()
does not lead anywhere, I believe.
 
> What I don't understand:
> 
> My environment: Win 7-64
>                                  cygcheck (cygwin) 3.1.7
> 
> Your output does not show an extra field when the last field in the input is
> quoted. My output does show an extra field.

This might be a greeting from EOL hell on Windows. If you set
RS = "\r?\n", this should go away.
 
> Your output of the last field in record splitting shows correct output my
> version shows that damnable "> <5:".
> 
> Both your output of split() and my output of split() are the same which
> indicates a complete lack of understanding on my part or that split() does not
> work as advertised.

Well, the latter is very unlikely to be true.
 
> Thanks. I assure you that this is not a class project and (incidentally) I am
> not a student.

Again, good to hear, because I am not up for doing other people's homework.
 
> art

HTH,
J.



