bug#64128: regexp parser zero-width assertion bugs


From: Mattias Engdegård
Subject: bug#64128: regexp parser zero-width assertion bugs
Date: Sun, 18 Jun 2023 22:26:28 +0200

18 juni 2023 kl. 06.55 skrev Eli Zaretskii <eliz@gnu.org>:

> My comment is that since this was a documented feature, I'm not
> interested in making it an error.

Yes, it would be unwise to raise an error for "^*" or the like; it's in active 
use.
The manual is a bit hazy about what we actually promise, though.

As Paul notes, we must be able to document it, and that might not be easy, so 
perhaps we shouldn't even try to change or document it?

To make everything clear, we have two groups of zero-width assertions:

Group A: ^ $ \` \' \b \B
Group B: \< \> \_< \_> \=

Group B assertions work like ordinary elements, syntactically and semantically. 
Simple, predictable, but also useless.

Group A assertions are more interesting. Either there is nothing before a train 
of such assertions, as in

   "^\\`\\b\\`*?"

in which case the first character of the operator (here `*`) becomes a literal, 
and a second character, if present (here `?`), becomes an operator acting on 
that literal.
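
The first case can be probed interactively. The following is only a sketch:
`my-regexp-probe` is a hypothetical helper made up for illustration, not part
of Emacs, and no results are claimed here; evaluate the forms (for instance in
*scratch*) and compare them with the description above.

```elisp
;; Hypothetical helper for illustration only; not part of Emacs.
(defun my-regexp-probe (regexp string)
  "Return the text that REGEXP matches in STRING, or nil if no match."
  (when (string-match regexp string)
    (match-string 0 string)))

;; If `*' is taken as a literal after `^', the match should consume the
;; asterisk; if it were a repetition of `^', the match would be empty.
(my-regexp-probe "^*" "*abc")

;; The train from this message.  Under the reading given above, `*' is a
;; literal and `?' makes it optional, so both inputs should match and the
;; first should consume the asterisk.
(my-regexp-probe "^\\`\\b\\`*?" "*abc")
(my-regexp-probe "^\\`\\b\\`*?" "abc")
```

Returning the matched text rather than just the match position makes it
possible to tell a literal `*` from an operator that matched nothing.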
Or there is something before the train, and the operator acts on the last 
element preceding the assertions, except that multiple literal characters 
coalesce into a single element. A further exception: if one of the literal 
characters is an out-of-place `^`, it splits the sequence of literals into 
separate segments, but not exactly where you would expect.
For example,

  "abc^def\\B\\B+?"

means, I think,

  (seq "ab" (+? "c^def" not-word-boundary not-word-boundary))
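
One way to check that reading is to build it with `rx` and compare it with the
raw string on a few inputs. This is a sketch under stated assumptions:
`my-regexp-same-match-p` is a hypothetical helper, the sample strings are
arbitrary, and agreement on samples does not prove the two regexps equivalent.

```elisp
;; Hypothetical helper for illustration only; not part of Emacs.
(defun my-regexp-same-match-p (regexp-a regexp-b string)
  "Return non-nil if REGEXP-A and REGEXP-B match the same span of STRING.
Also returns non-nil when neither regexp matches."
  (let ((span-a (and (string-match regexp-a string)
                     (list (match-beginning 0) (match-end 0))))
        (span-b (and (string-match regexp-b string)
                     (list (match-beginning 0) (match-end 0)))))
    (equal span-a span-b)))

(let ((raw "abc^def\\B\\B+?")
      ;; The reading proposed in this message, written in rx notation.
      (reading (rx (seq "ab"
                        (+? "c^def" not-word-boundary not-word-boundary)))))
  (list reading                 ; the regexp string that rx generates
        (my-regexp-same-match-p raw reading "abc^defc^defx")
        (my-regexp-same-match-p raw reading "abc^defx")))
```

A nil in the resulting list would mean the raw string and the proposed reading
disagree on that input, which would be a reason to revisit the interpretation.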







