bug#64128: regexp parser zero-width assertion bugs
From: Mattias Engdegård
Subject: bug#64128: regexp parser zero-width assertion bugs
Date: Sat, 17 Jun 2023 14:20:27 +0200
In Emacs regexps, some but not all zero-width assertions have the special
property that they are not treated as an element by an immediately
following ?, * or +. For example,
    \b*
matches a literal asterisk at a word boundary: the `*` is taken literally
because the parser treats it as if there were nothing before it to act upon.
Even stranger:
    xy\b*
is parsed as (* "xy" word-boundary) in rx syntax, which is remarkable: the
repetition operator spans several elements even though no brackets were
given. Demo:
    (and (string-match "quack,\\b*" "quack,quack,quack,quaaaack!")
         (match-data))
    => (0 18)
Zero-width assertions that have the property:
    ^ (bol), $ (eol), \` (bos), \' (eos), \b (word-boundary), \B (not-word-boundary)
Zero-width assertions that do not have the property (and are treated like any
other element):
    \< (bow), \> (eow), \_< (symbol-start), \_> (symbol-end), \= (point)
Such patterns should be very rare in practice, and are presumably always
mistakes, but it would be nice if they behaved in a way that made some kind
of sense.
A modest improvement would be to make these operators literal after any
zero-width assertion, so that
    \<*
becomes (: word-start "*") instead of (* word-start), and
    xy\b*
becomes (: "xy" word-boundary "*") instead of (* "xy" word-boundary).
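Whatever the parser ends up doing, a pattern can sidestep the question by
stating its intended grouping explicitly. A minimal sketch, assuming Emacs 27
or later for the rx forms shown; it spells out both possible readings of
xy\b* without relying on how a postfix operator after \b is parsed:

```elisp
(require 'rx)

;; Reading 1: "xy", a word boundary, then a literal asterisk.
;; rx quotes the string "*", so no repetition operator is involved.
(rx "xy" word-boundary "*")      ; => "xy\\b\\*"
(string-match "xy\\b\\*" "xy*")  ; => 0

;; Reading 2: repeat the whole sequence "xy" followed by a word boundary.
;; rx wraps the sequence in a shy group so that * applies to all of it.
(rx (* "xy" word-boundary))      ; => "\\(?:xy\\b\\)*"
```

The second form is what the current parser silently produces for xy\b*;
writing it out makes the grouping visible to anyone reading the pattern.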
Suggested patch attached.
regexp-zero-width-assertion-bug.diff