bug-gawk

Re: -F fs_val handles backslash-newline differently, compared to -v FS=val and FS=val


From: arnold
Subject: Re: -F fs_val handles backslash-newline differently, compared to -v FS=val and FS=val
Date: Thu, 08 Jun 2023 13:26:30 -0600
User-agent: Heirloom mailx 12.5 7/5/10

Thanks for the analysis. In my testing, different awks give different
results, but gawk is the worst in that it's not consistent whereas
the others I've tested at least give the same answer in all three
cases.

My inclination right now is that FS should have "\\\na" (backslash,
newline, "a") in it in all three cases.

I will eventually fix this, but as I said earlier, I have much other
stuff on my plate at the moment.

Thanks,

Arnold

"Andrew J. Schorr" <aschorr@telemetry-investments.com> wrote:

> It's an interesting case. Inside main.c:parse_args, the -F
> option sets a preassign of type PRE_ASSIGN_FS, whereas
> -v results in a generic PRE_ASSIGN. Then in main(), the
> PRE_ASSIGN_FS case results in a call to cmdline_fs instead
> of arg_assign. And cmdline_fs does minimal processing of the
> value. It checks for '\t', and then just calls
>
>         *tmp = make_str_node(str, strlen(str), SCAN); /* do process escapes */
>
> The arg_assign function is much more complicated. It has logic
> for disallowing newline in posix mode, and then it calls
>
>       it = make_str_node(cp, strlen(cp), SCAN | ELIDE_BACK_NL);
>
> Using the master branch:
>
> bash-5.1$ gawk -F '\
> a' 'BEGIN { print "FS1=" FS }'
> FS1=\a
> bash-5.1$ gawk --lint -F '\
> a' 'BEGIN { print "FS1=" FS }'
> gawk: warning: backslash string continuation is not portable
> FS1=\a
> bash-5.1$ gawk --lint --posix -F '\
> a' 'BEGIN { print "FS1=" FS }'
> gawk: warning: backslash string continuation is not portable
> FS1=\a
> bash-5.1$ gawk -v FS='\
> a' 'BEGIN { print "FS2=" FS }'
> FS2=a
> bash-5.1$ gawk --lint -v FS='\
> a' 'BEGIN { print "FS2=" FS }'
> gawk: warning: backslash string continuation is not portable
> FS2=a
> bash-5.1$ gawk --posix --lint -v FS='\
> a' 'BEGIN { print "FS2=" FS }'
> gawk: fatal: POSIX does not allow physical newlines in string values
>
> If one patches cmdline_fs to add ELIDE_BACK_NL, then all 3 examples
> give the same result, but I have no idea whether that's the desirable
> outcome. The actual string argument processed contains backslash followed
> by newline followed by 'a'. Should that get mapped to 'a'?
>
> I also don't know if a -F arg containing a newline in posix mode
> should trigger that same fatal error.
>
> Regards,
> Andy
>
> On Thu, Jun 08, 2023 at 08:43:57AM -0600, arnold@skeeve.com wrote:
> > That is an interesting report. I will (eventually)
> > investigate; I don't have a lot of free time at the moment.
> > 
> > Thanks,
> > 
> > Arnold
> > 
> > Denys Vlasenko <dvlasenk@redhat.com> wrote:
> > 
> > > GNU awk 5.1.1
> > >
> > > gawk -F '\
> > > a' 'BEGIN { print "FS1=" FS }'
> > >
> > > gawk -v FS='\
> > > a' 'BEGIN { print "FS2=" FS }'
> > >
> > > echo | gawk '{ print "FS3=" FS }' FS='\
> > > a'
> > >
> > > The first command treats "backslash+newline" as backslash:
> > >
> > > FS1=\a
> > >
> > > The second and third commands treat it as an empty string:
> > >
> > > FS2=a
> > > FS3=a
> > >
> > > I think it would be better if all forms have the same rules.


