bug-gawk


From: arnold
Subject: Re: gensub() treats trailing backslash in replacement string inconsistently / surprisingly
Date: Sun, 04 Jun 2023 01:50:49 -0600
User-agent: Heirloom mailx 12.5 7/5/10

Hi.

Thanks for the bug report.  This bug still exists in 5.2.2,
the current version.

The bug is a simple logic error. The fix is below.

Thanks,

Arnold

Denys Vlasenko <dvlasenk@redhat.com> wrote:

> Awk 5.1.1
>
> In gensub(), backslash can be used to escape special char '&',
> and is used to denote \0 - \9 "replace by Nth substring" operations.
>
> It is not specified what would happen if backslash is followed
> by some other char, such as \k. Experimentally, backslash gets
> removed - \k acts the same as k. Good.
>
> It is also not specified what would happen if backslash is the last char.
> And here, it's inconsistent. The replacement string which is
> just one backslash uses that string verbatim:
>
> awk 'BEGIN { s="\\";print "s=" s;g=gensub("a",s,1,"a");print g }'
> s=\
> \
>
> The replacement string which has something non-empty and then ends
> in backslash, at first glance, seems to drop it:
>
> awk 'BEGIN { s="b\\";print "s=" s;g=gensub("a",s,1,"a");print g }'
> s=b\
> b
>
> but in fact, it uses a NUL char (!) there:
>
> awk 'BEGIN { s="b\\";print "s=" s;g=gensub("a",s,1,"a");print g; print length(g) }'
> s=b\
> b
> 2  <============== HUH??
>
> awk 'BEGIN { s="b\\";g=gensub("a",s,1,"a");print g }' | hexdump -vC
> 00000000  62 00 0a                                          |b..|
> 00000003     ^^------------- AHA!!!
>
> I think it would be better to do something consistent.
> Insertion of NUL char is particularly odd.
>

------------------------
diff --git a/builtin.c b/builtin.c
index 0e609220..e394cc34 100644
--- a/builtin.c
+++ b/builtin.c
@@ -3218,6 +3218,8 @@ do_sub(int nargs, unsigned int flags)
                                                                        *bp++ = *cp;
                                                        }
                                                        scan++;
+                                               } else if (scan+1 == replend) {
+                                                       *bp++ = *scan;
                                                } else  /* \q for any q --> q */
                                                        *bp++ = *++scan;
                                        } else if (do_posix) {


