bug-gnu-emacs

bug#64128: regexp parser zero-width assertion bugs


From: Paul Eggert
Subject: bug#64128: regexp parser zero-width assertion bugs
Date: Mon, 19 Jun 2023 12:21:50 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.11.0

On 2023-06-19 11:34, Mattias Engdegård wrote:
> Here is a reduced patch that only fixes the really silly behaviour reported
> earlier, by making sure that `laststart` is reset correctly for all group A
> assertions. This should be uncontroversial.
> Maybe we should change group B assertions so that they work in the same way.
>
> -     operand.  Reset at the beginning of groups and alternatives.  */
> +     operand.  Reset at the beginning of groups and alternatives,
> +     and after zero-width assertions which should not be the target
> +     of any postfix repetition operators.  */

If I understand things correctly, this would cause "\b*c" to be treated like "\b\*c". If so, it's headed in the wrong direction.

It's long been documented that the only reason "*" is ordinary at the start of a regular expression or subexpression is "historical compatibility", and it's also long been documented that you shouldn't rely on this and should backslash-escape the "*" anyway. In contrast, constructs like \b* have no historical-compatibility excuse, so there's no good argument for treating "*" as an ordinary character after "\b".

Instead, \b should not be a special case before "*": \b* should be equivalent to \(\b\)*, which matches only the empty string. The same goes for the other zero-width backslash escapes. This is what the longstanding documentation leads me to expect from these constructs.

If we instead added a rule saying that a construct that can only match the empty string causes a following "*" to be ordinary, then \b* and \(\b\)* would both be equivalent to \*. Although consistent, this would be confusing: it would compound the historical-compatibility mistake. Let's keep things simple instead.
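To make the three readings of "*" discussed above concrete, here is a toy sketch in Python. It is not taken from Emacs's regex-emacs.c; it assumes a tiny subset of Emacs regexp syntax (ordinary characters, "*", and the escapes \( \) \| \b \B \*), and the names StarPolicy and normalize_stars are hypothetical. It rewrites a pattern so that every "*" the toy parser would treat as an ordinary character is spelled \*, under the reduced patch as described above, under the proposal that \b* mean \(\b\)*, and under the alternative empty-string-only rule.

```python
from enum import Enum


class StarPolicy(Enum):
    """Hypothetical labels for the three readings of '*' discussed here."""
    LITERAL_AFTER_ASSERTION = 1   # reduced patch: \b*c is read as \b\*c
    REPEATS_ASSERTION = 2         # proposal: \b* is read as \(\b\)*
    LITERAL_AFTER_EMPTY_ONLY = 3  # alternative: '*' after \b or \(\b\) is ordinary


def normalize_stars(pattern: str, policy: StarPolicy) -> str:
    r"""Return PATTERN with each '*' that this toy parser treats as an
    ordinary character spelled as '\*'.

    Only ordinary characters, '*', and the escapes \( \) \| \b \B \* are
    modelled; this is an illustration, not Emacs's regex-emacs.c.
    """
    out = []
    no_operand = True  # True when a following '*' has nothing to repeat
    empty_only = []    # one flag per open group: True while it can only match ""
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "\\" and i + 1 < len(pattern):
            esc = pattern[i + 1]
            out.append("\\" + esc)
            i += 2
            if esc == "(":
                empty_only.append(True)
                no_operand = True  # start of a group
            elif esc == "|":
                no_operand = True  # start of an alternative
            elif esc == ")":
                was_empty_only = empty_only.pop() if empty_only else False
                no_operand = (policy is StarPolicy.LITERAL_AFTER_EMPTY_ONLY
                              and was_empty_only)
            elif esc in "bB":
                # Zero-width assertions: only the proposal treats them as
                # ordinary operands of a following '*'.
                no_operand = policy is not StarPolicy.REPEATS_ASSERTION
            else:
                # \* and other escapes consume text, so no open group is
                # empty-only any more.
                empty_only = [False] * len(empty_only)
                no_operand = False
            continue
        if c == "*" and no_operand:
            out.append("\\*")  # '*' read as an ordinary character
            empty_only = [False] * len(empty_only)
            no_operand = False  # the literal '*' is itself an operand
        elif c == "*":
            out.append("*")  # postfix operator; its operand is unchanged
        else:
            out.append(c)  # an ordinary character consumes text
            empty_only = [False] * len(empty_only)
            no_operand = False
        i += 1
    return "".join(out)
```

In this model, r"\b*c" comes out as r"\b\*c" under the first and third readings and unchanged under the second, while r"\(\b\)*" is unchanged under the first two and becomes r"\(\b\)\*" under the third. Only the second reading lets "*" after \b behave like "*" after any other operand, which is the simplicity argued for above.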

Also, whatever change we make to the behavior should be documented in the manual and in etc/NEWS.




