autoconf-patches

From: Zack Weinberg
Subject: Re: [bug#67841] [PATCH] Clarify error messages for misuse of m4_warn and --help for -W.
Date: Mon, 18 Dec 2023 13:58:27 -0500
User-agent: Cyrus-JMAP/3.9.0-alpha0-1283-g327e3ec917-fm-20231207.002-g327e3ec9

On Fri, Dec 15, 2023, at 7:08 PM, Jacob Bachmeyer wrote:
> Zack Weinberg wrote:
>> [...]
>> Also, there’s a perl 5.14ism in one place (s///a) which I need
>> to figure out how to make 5.6-compatible before it can land.
...
>> +  $q_channel =~ s/([^\x20-\x7e])/"\\x".sprintf("%02x", ord($1))/aeg;
...
> If I am reading perlre correctly, you should be able to simply drop the 
> /a modifier because it has no effect on the pattern you have written, 
> since you are using an explicit character class and are *not* using the 
> /i modifier.

Thanks, you've made me realize that /a wasn't even what I wanted in the
first place.  What I thought /a would do is force s/// to act byte by
byte -- or, in the terms of perlunitut, force the target string to be
treated as a binary string.  That might be clearer with a concrete example:

$ perl -e '$_ = "\xE2\x88\x85"; s/([^\x20-\x7e])/sprintf("\\x%02x", ord($1))/eg; print "$_\n";'
\xe2\x88\x85
$ perl -e '$_ = "\N{EMPTY SET}"; s/([^\x20-\x7e])/sprintf("\\x%02x", ord($1))/eg; print "$_\n";'
\x2205

What change do I need to make to the second one-liner to make it also
print \xe2\x88\x85?  How do I express that in a way that is backward
compatible all the way to 5.6.0?  And finally, how do I ensure that
there is absolutely nothing I can put in the initial assignment to
$_ that will cause the rest of the one-liner to crash?  For example,
over in the Python universe, it's very easy to get Unicode conversion
to crash:

$ python3 -c 'print("\uDC00".encode("utf-8"))'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\udc00' in position 0: surrogates not allowed
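
For comparison, here is a sketch of the byte-by-byte escaping described
above, written in Python so that the encoding step cannot raise on any
str input.  It is an illustration only, not part of the patch and not
an answer to the Perl 5.6.0 question: the function name
escape_non_printable is hypothetical, and it assumes that encoding lone
surrogates through the "surrogatepass" error handler is acceptable.

```python
def escape_non_printable(text: str) -> str:
    """Return text with each UTF-8 byte outside 0x20-0x7e written as \\xNN."""
    # "surrogatepass" encodes lone surrogates instead of raising
    # UnicodeEncodeError, so this step accepts every possible str.
    raw = text.encode("utf-8", errors="surrogatepass")
    # Work on bytes, not code points: printable ASCII passes through,
    # every other byte becomes a two-digit lowercase hex escape.
    return "".join(
        chr(b) if 0x20 <= b <= 0x7E else "\\x%02x" % b
        for b in raw
    )


print(escape_non_printable("\u2205"))  # EMPTY SET, three UTF-8 bytes
print(escape_non_printable("\udc00"))  # lone surrogate, still no exception
```

The first call prints \xe2\x88\x85, matching the first Perl one-liner.
Like the Perl substitution, this sketch leaves a literal backslash in
the input unescaped.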

Given that having non-ASCII here in the first place is pretty unlikely,
I am going to go ahead and land the patch with your suggested changes,
but I'd still appreciate any further advice you have.

zw


