bug-gawk


From: Wolfgang Laun
Subject: Re: gensub() treats trailing backslash in replacement string inconsistently / surprisingly
Date: Sat, 3 Jun 2023 19:52:36 +0200

There is a general problem with backslashes in the gensub replacement
string.

$ awk 'BEGIN { s="b\\b\\";g=gensub("a",s,1,"a");print g; }' | od -tx1
0000000 62 62 00 0a

GNU Awk 5.1.0, API: 3.0 (GNU MPFR 4.1.0, GNU MP 6.2.1)
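One plausible mechanism for the stray NUL is sketched here. It is a hypothesis only and has not been checked against gawk's source. The assumption is that the replacement is scanned by a loop bounded by the string's length, that the underlying C buffer carries a terminating NUL just past the end, and that on a backslash the loop steps forward and copies the next byte unconditionally. The Python function `scan_like_c` (a hypothetical name) simulates exactly that. It models only the "\X becomes X" rule, not `&` or \0 to \9. It reproduces both byte dumps in this thread. It does not explain the single-backslash case in the quoted report, which gawk prints verbatim and which apparently takes a different path.

```python
def scan_like_c(repl: str) -> str:
    """Hypothetical reconstruction of the trailing-backslash behaviour.

    Assumption (not verified against gawk): a length-bounded scan over a
    NUL-terminated buffer that, on a backslash, copies the following byte.
    """
    buf = repl + "\0"           # C strings carry a terminator just past the end
    out = []
    i = 0
    while i < len(repl):        # bounded by the length, not by the NUL
        if buf[i] == "\\":
            i += 1              # skip the backslash...
            out.append(buf[i])  # ...and copy the next byte, even if it is the NUL
        else:
            out.append(buf[i])
        i += 1
    return "".join(out)


# Hex bytes of the simulated result (print's trailing newline, 0a, not included):
print(scan_like_c("b\\").encode().hex(" "))     # 62 00
print(scan_like_c("b\\b\\").encode().hex(" "))  # 62 62 00
```

Under this hypothesis, the output length of 2 in the report is simply "b" plus the copied terminator.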

Wolfgang

On Sat, 3 Jun 2023 at 19:25, Denys Vlasenko <dvlasenk@redhat.com> wrote:

> Awk 5.1.1
>
> In gensub(), backslash can be used to escape special char '&',
> and is used to denote \0 - \9 "replace by Nth substring" operations.
>
> It is not specified what would happen if backslash is followed
> by some other char, such as \k. Experimentally, backslash gets
> removed - \k acts the same as k. Good.
>
> It is also not specified what would happen if backslash is the last char.
> And here, it's inconsistent. The replacement string which is
> just one backslash uses that string verbatim:
>
> awk 'BEGIN { s="\\";print "s=" s;g=gensub("a",s,1,"a");print g }'
> s=\
> \
>
> The replacement string which has something non-empty and then ends
> in backslash, at first glance, seems to drop it:
>
> awk 'BEGIN { s="b\\";print "s=" s;g=gensub("a",s,1,"a");print g }'
> s=b\
> b
>
> but in fact, it uses a NUL char (!) there:
>
> awk 'BEGIN { s="b\\";print "s=" s;g=gensub("a",s,1,"a");print g; print length(g) }'
> s=b\
> b
> 2  <============== HUH??
>
> awk 'BEGIN { s="b\\";g=gensub("a",s,1,"a");print g }' | hexdump -vC
> 00000000  62 00 0a                                          |b..|
> 00000003     ^^------------- AHA!!!
>
> I think it would be better to do something consistent.
> Insertion of NUL char is particularly odd.
>
>
>
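The escape rules Denys lists in the quoted report can be written down as a small model, which makes the unspecified trailing-backslash case explicit. This is a Python sketch of gensub()-style replacement expansion, not gawk's implementation. The function name `expand_replacement` is hypothetical. Expanding a missing or unmatched group to an empty string is an assumption. Because the trailing lone backslash is the case under dispute, the sketch rejects it instead of guessing.

```python
import re


def expand_replacement(repl: str, m: re.Match) -> str:
    """Sketch of gensub()-style replacement expansion (a model, not gawk's code).

    Rules modelled:
      &           -> the whole matched text
      \\0         -> the whole matched text
      \\1 .. \\9  -> text of the Nth parenthesized group
      \\&         -> a literal '&'
      \\X         -> X, for any other character X (as observed in the report)
    A trailing lone backslash is the disputed case, so this sketch rejects it.
    """
    out = []
    i = 0
    while i < len(repl):
        c = repl[i]
        if c == "&":
            out.append(m.group(0))
        elif c == "\\":
            if i + 1 == len(repl):
                raise ValueError("trailing backslash in replacement")
            nxt = repl[i + 1]
            if nxt in "0123456789":
                n = int(nxt)
                # Assumption: a missing or unmatched group expands to "".
                group = m.group(n) if n <= m.re.groups else None
                out.append(group or "")
            else:
                # Covers "\&" -> "&", a doubled backslash -> "\", "\k" -> "k".
                out.append(nxt)
            i += 1  # consume the escaped character too
        else:
            out.append(c)
        i += 1
    return "".join(out)


m = re.search(r"(a)(b)", "xaby")
print(expand_replacement(r"<\2\1>[&]\&\k", m))  # <ba>[ab]&k
```

Under these rules, a replacement meant to end in a literal backslash carries two backslash characters at run time (`"b\\\\"` as an awk string constant), which this model turns into `b\`. Whether that sidesteps the NUL in a given gawk version should be checked directly.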

-- 
Wolfgang Laun

