bug-gawk

Re: -F fs_val handles backslash-newline differently, compared to -v FS=val and FS=val


From: arnold
Subject: Re: -F fs_val handles backslash-newline differently, compared to -v FS=val and FS=val
Date: Sun, 11 Jun 2023 02:46:25 -0600
User-agent: Heirloom mailx 12.5 7/5/10

Hi.

I have investigated some more.  This has to do with a wish to use common
code to process escape sequences, and it's falling afoul of gawk's
ability to use C-style backslash continuation inside strings.

If we think of -Fxxx as `FS = "xxx"' (and same with -v), then

BEGIN {
        FS = "\
a"
}

assigns "a" to FS.  Thus my earlier conclusion to assign "\\\na" was
not really correct.  Here is the fix. I will be updating the manual
about this as well, and adding some tests to the test suite.

As mentioned, different awks treat this differently; it's a real
dark corner. And I did find one other awk that is also inconsistent;
I reported it to the author privately.

Thanks,

Arnold
---------------
diff --git a/main.c b/main.c
index c48feafa..e660da17 100644
--- a/main.c
+++ b/main.c
@@ -791,7 +791,7 @@ cmdline_fs(char *str)
                        str[0] = '\t';
        }
 
-       *tmp = make_str_node(str, strlen(str), SCAN); /* do process escapes */
+       *tmp = make_str_node(str, strlen(str), SCAN | ELIDE_BACK_NL); /* do process escapes */
        set_FS();
 }
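
For readers who do not have the gawk sources at hand, the effect of the
extra flag can be sketched roughly as follows. This is only an
illustration of the idea, not gawk's implementation: the function name
elide_backslash_newline is hypothetical, the other escape processing
that SCAN performs is not modeled, and the caller is assumed to supply
a destination buffer of at least strlen(src) + 1 bytes.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, NOT taken from gawk: copy src to dst, dropping
 * every backslash that is immediately followed by a newline, together
 * with that newline. Any other backslash is copied along with the
 * character after it, so an escaped backslash followed by a newline
 * is left alone. Returns the length of the result.
 */
size_t elide_backslash_newline(const char *src, char *dst)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);

    while (*src != '\0') {
        if (src[0] == '\\' && src[1] == '\n') {
            src += 2;               /* drop the continuation pair */
            continue;
        }
        if (src[0] == '\\' && src[1] != '\0') {
            dst[n++] = *src++;      /* keep the backslash ... */
            dst[n++] = *src++;      /* ... and the character it escapes */
            continue;
        }
        dst[n++] = *src++;          /* ordinary character */
    }
    dst[n] = '\0';
    return n;
}
```

Given the three-character argument from the report (backslash, newline,
"a"), this sketch leaves just "a", which matches what the -v FS=val and
FS=val forms print in the examples quoted below.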

arnold@skeeve.com wrote:

> Thanks for the analysis. In my testing, different awks give different
> results, but gawk is the worst in that it's not consistent whereas
> the others I've tested at least give the same answer in all three
> cases.
>
> My inclination right now is that FS should have "\\\na" (backslash,
> newline, "a") in it in all three cases.
>
> I will eventually fix this, but as I said earlier, I have much other
> stuff on my plate at the moment.
>
> Thanks,
>
> Arnold
>
> "Andrew J. Schorr" <aschorr@telemetry-investments.com> wrote:
>
> > It's an interesting case. Inside main.c:parse_args, the -F
> > option sets a preassign of type PRE_ASSIGN_FS, whereas
> > -v results in a generic PRE_ASSIGN. Then in main(), the
> > PRE_ASSIGN_FS case results in a call to cmdline_fs instead
> > of arg_assign. And cmdline_fs does minimal processing of the
> > value. It checks for '\t', and then just calls
> >
> >         *tmp = make_str_node(str, strlen(str), SCAN); /* do process escapes */
> >
> > The arg_assign function is much more complicated. It has logic
> > for disallowing newline in posix mode, and then it calls
> >
> >     it = make_str_node(cp, strlen(cp), SCAN | ELIDE_BACK_NL);
> >
> > Using the master branch:
> >
> > bash-5.1$ gawk -F '\
> > a' 'BEGIN { print "FS1=" FS }'
> > FS1=\a
> > bash-5.1$ gawk --lint -F '\
> > a' 'BEGIN { print "FS1=" FS }'
> > gawk: warning: backslash string continuation is not portable
> > FS1=\a
> > bash-5.1$ gawk --lint --posix -F '\
> > a' 'BEGIN { print "FS1=" FS }'
> > gawk: warning: backslash string continuation is not portable
> > FS1=\a
> > bash-5.1$ gawk -v FS='\
> > a' 'BEGIN { print "FS2=" FS }'
> > FS2=a
> > bash-5.1$ gawk --lint -v FS='\
> > a' 'BEGIN { print "FS2=" FS }'
> > gawk: warning: backslash string continuation is not portable
> > FS2=a
> > bash-5.1$ gawk --posix --lint -v FS='\
> > a' 'BEGIN { print "FS2=" FS }'
> > gawk: fatal: POSIX does not allow physical newlines in string values
> >
> > If one patches cmdline_fs to add ELIDE_BACK_NL, then all 3 examples
> > give the same result, but I have no idea whether that's the desirable
> > outcome. The actual string argument processed contains backslash followed
> > by newline followed by 'a'. Should that get mapped to 'a'?
> >
> > I also don't know if a -F arg containing a newline in posix mode
> > should trigger that same fatal error.
> >
> > Regards,
> > Andy
> >
> > On Thu, Jun 08, 2023 at 08:43:57AM -0600, arnold@skeeve.com wrote:
> > > That is an interesting report. I will (eventually)
> > > investigate; I don't have a lot of free time at the moment.
> > > 
> > > Thanks,
> > > 
> > > Arnold
> > > 
> > > Denys Vlasenko <dvlasenk@redhat.com> wrote:
> > > 
> > > > GNU awk 5.1.1
> > > >
> > > > gawk -F '\
> > > > a' 'BEGIN { print "FS1=" FS }'
> > > >
> > > > gawk -v FS='\
> > > > a' 'BEGIN { print "FS2=" FS }'
> > > >
> > > > echo | gawk '{ print "FS3=" FS }' FS='\
> > > > a'
> > > >
> > > > The first command treats "backslash+newline" as backslash:
> > > >
> > > > FS1=\a
> > > >
> > > > The second and third commands treat the same sequence as an empty string:
> > > >
> > > > FS2=a
> > > > FS3=a
> > > >
> > > > I think it would be better if all forms have the same rules.


