From: arnold
Subject: Re: gensub() treats trailing backslash in replacement string inconsistently / surprisingly
Date: Sun, 04 Jun 2023 01:50:49 -0600
User-agent: Heirloom mailx 12.5 7/5/10
Hi.
Thanks for the bug report. This bug still exists in 5.2.2,
the current version.
The bug is a simple logic error. The fix is below.
Thanks,
Arnold
Denys Vlasenko <dvlasenk@redhat.com> wrote:
> Awk 5.1.1
>
> In gensub(), backslash can be used to escape special char '&',
> and is used to denote \0 - \9 "replace by Nth substring" operations.
>
> It is not specified what would happen if backslash is followed
> by some other char, such as \k. Experimentally, the backslash gets
> removed: \k acts the same as k. Good.
>
> It is also not specified what would happen if backslash is the last char.
> And here, it's inconsistent. The replacement string which is
> just one backslash uses that string verbatim:
>
> awk 'BEGIN { s="\\";print "s=" s;g=gensub("a",s,1,"a");print g }'
> s=\
> \
>
> The replacement string which has something non-empty and then ends
> in backslash, at first glance, seems to drop it:
>
> awk 'BEGIN { s="b\\";print "s=" s;g=gensub("a",s,1,"a");print g }'
> s=b\
> b
>
> but in fact, it uses a NUL char (!) there:
>
> awk 'BEGIN { s="b\\";print "s=" s;g=gensub("a",s,1,"a");print g; print length(g) }'
> s=b\
> b
> 2 <============== HUH??
>
> awk 'BEGIN { s="b\\";g=gensub("a",s,1,"a");print g }' | hexdump -vC
> 00000000 62 00 0a |b..|
> 00000003 ^^------------- AHA!!!
>
> I think it would be better to do something consistent.
> Insertion of NUL char is particularly odd.
>
------------------------
diff --git a/builtin.c b/builtin.c
index 0e609220..e394cc34 100644
--- a/builtin.c
+++ b/builtin.c
@@ -3218,6 +3218,8 @@ do_sub(int nargs, unsigned int flags)
 					*bp++ = *cp;
 			}
 			scan++;
+		} else if (scan+1 == replend) {
+			*bp++ = *scan;
 		} else	/* \q for any q --> q */
 			*bp++ = *++scan;
 	} else if (do_posix) {