From: | Paul Eggert |
Subject: | bug#64128: regexp parser zero-width assertion bugs |
Date: | Mon, 19 Jun 2023 12:21:50 -0700 |
User-agent: | Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.11.0 |
On 2023-06-19 11:34, Mattias Engdegård wrote:
Here is a reduced patch that only fixes the really silly behaviour reported earlier, by making sure that `laststart` is reset correctly for all group A assertions. This should be uncontroversial. Maybe we should change group B assertions so that they work in the same way.
- operand. Reset at the beginning of groups and alternatives. */
+ operand. Reset at the beginning of groups and alternatives,
+ and after zero-width assertions which should not be the target
+ of any postfix repetition operators. */
If I understand things correctly, this would cause "\b*c" to be treated like "\b\*c". If so, it's headed in the wrong direction.
It's long been documented that the only reason "*" is ordinary at the start of a regular expression or subexpression is "historical compatibility", and it's also long been documented that you shouldn't take advantage of this and you should backslash-escape the "*" anyway. In contrast, for constructs like \b* there is not a historical compatibility reason, so there's not a good argument for treating "*" as an ordinary character after "\b".
Instead, \b should not be a special case before "*", and \b* should be equivalent to \(\b\)* and should match only the empty string. Similarly for the other zero-width backslash escapes. This is what I would expect of these constructs, given the longstanding documentation.
If we instead added a rule to say that a construct that can only match the empty string causes a following "*" to be ordinary, then \b* and \(\b\)* would both be equivalent to \*. Although consistent, this would be confusing: it would compound the historical-compatibility mistake. Let's keep things simple instead.
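To make the two readings of "\b*" concrete, here is a toy model in Python. It is only a sketch, not the Emacs regexp parser: the function name classify_stars, the flag assertion_is_operand, and the ZERO_WIDTH set are all made up for illustration, and the variable have_operand merely stands in for a non-null `laststart`. It handles only a small subset of the syntax (no bracket expressions, no "^" or "$", no \_< or \_>). With the flag false it follows the patch as I read it above; with the flag true it follows the behavior I am arguing for.

```python
# Toy model only: NOT the Emacs regexp parser. All names are hypothetical.
# Zero-width backslash escapes covered by this sketch (\_< and \_> are not).
ZERO_WIDTH = {"b", "B", "<", ">", "`", "'", "="}


def classify_stars(pattern, assertion_is_operand):
    """Say whether each unescaped '*' in PATTERN is an operator or a literal.

    assertion_is_operand=False: a zero-width assertion resets the operand,
    so a following '*' is ordinary (the patch, as read above).
    assertion_is_operand=True: the assertion is an operand like any other,
    so a following '*' repeats it.
    """
    result = []
    have_operand = False  # stands in for a non-null `laststart`
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt in "(|":
                # Start of a group or alternative: nothing to repeat yet.
                have_operand = False
            elif nxt in ZERO_WIDTH:
                have_operand = assertion_is_operand
            else:
                # "\)" closes a group, "\*" is a literal star, and so on:
                # each leaves something a postfix operator can apply to.
                have_operand = True
            i += 2
            continue
        if ch == "*":
            result.append("operator" if have_operand else "literal")
        # An ordinary character, a literal star, or a repeated operand
        # can itself be followed by a postfix operator.
        have_operand = True
        i += 1
    return result


if __name__ == "__main__":
    for pat in (r"\b*c", r"\(\b\)*c", r"*c"):
        for flag in (False, True):
            print(pat, flag, classify_stars(pat, flag))
```

Note that in this model the leading "*" in "*c" stays ordinary under either setting, which is the historical-compatibility case; the two settings disagree only about a "*" that directly follows a zero-width escape.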
Also, whatever change we make to the behavior should be documented in the manual and in etc/NEWS.