From: Wolfgang Laun
Subject: Re: gensub() treats trailing backslash in replacement string inconsistently / surprisingly
Date: Sat, 3 Jun 2023 19:52:36 +0200
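There is a general problem with backslashes in the gensub replacement
string. With the replacement string b\b\, the inner backslash is dropped
and the trailing one turns into a NUL byte: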
$ awk 'BEGIN { s="b\\b\\";g=gensub("a",s,1,"a");print g; }' | od -tx1
0000000 62 62 00 0a
GNU Awk 5.1.0, API: 3.0 (GNU MPFR 4.1.0, GNU MP 6.2.1)
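
Until this is sorted out, one way to sidestep the issue is to not hand
the string to gensub() at all and splice it in with match() and substr(),
which do no backslash processing. The following is only a sketch:
literal_sub is a made-up helper name, it handles the first match only,
and it passes the strings through ENVIRON because -v assignments would
themselves interpret backslash escapes.

```shell
# Hypothetical helper (not part of gawk): replace the first match of
# regex $2 in text $1 with $3, inserting $3 byte for byte.
literal_sub() {
    TEXT=$1 PAT=$2 REPL=$3 awk 'BEGIN {
        text = ENVIRON["TEXT"]   # ENVIRON values are taken as-is
        pat  = ENVIRON["PAT"]    # used as a dynamic regex by match()
        repl = ENVIRON["REPL"]   # never parsed for \0-\9, & or backslash
        if (match(text, pat))
            text = substr(text, 1, RSTART - 1) repl substr(text, RSTART + RLENGTH)
        print text
    }'
}

# Same input as above; the bytes should come out as 62 5c 62 5c 0a,
# with both backslashes kept and no 00.
literal_sub a a 'b\b\' | od -An -tx1
```

This gives up \0 - \9 and & in the replacement, so it only fits cases
where the replacement is meant literally.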
Wolfgang
On Sat, 3 Jun 2023 at 19:25, Denys Vlasenko <dvlasenk@redhat.com> wrote:
> Awk 5.1.1
>
> In gensub(), backslash can be used to escape special char '&',
> and is used to denote \0 - \9 "replace by Nth substring" operations.
>
> It is not specified what would happen if backslash is followed
> by some other char, such as \k. Experimentally, backslash gets
> removed - \k acts the same as k. Good.
>
> It is also not specified what would happen if backslash is the last char.
> And here, it's inconsistent. The replacement string which is
> just one backslash uses that string verbatim:
>
> awk 'BEGIN { s="\\";print "s=" s;g=gensub("a",s,1,"a");print g }'
> s=\
> \
>
> The replacement string which has something non-empty and then ends
> in backslash, at first glance, seems to drop it:
>
> awk 'BEGIN { s="b\\";print "s=" s;g=gensub("a",s,1,"a");print g }'
> s=b\
> b
>
> but in fact, it uses a NUL char (!) there:
>
> awk 'BEGIN { s="b\\";print "s=" s;g=gensub("a",s,1,"a");print g; print length(g) }'
> s=b\
> b
> 2 <============== HUH??
>
> awk 'BEGIN { s="b\\";g=gensub("a",s,1,"a");print g }' | hexdump -vC
> 00000000 62 00 0a |b..|
> 00000003 ^^------------- AHA!!!
>
> I think it would be better to do something consistent.
> Insertion of NUL char is particularly odd.
>
>
>
--
Wolfgang Laun