help-gnu-emacs

Re: How are regexen implemented in Emacs?


From: tomas
Subject: Re: How are regexen implemented in Emacs?
Date: Tue, 13 Dec 2022 06:01:22 +0100

On Mon, Dec 12, 2022 at 04:19:19PM -0500, Stefan Monnier wrote:
> > The linked article sounded a bit like "only idiots implement regexen
> > in the naive way", and I was pretty sure Emacs devs are not idiots - but
> > now I understand the reasons better.

Here comes my favourite motto: "all generalizations suck" (Michael
Heerdegen enjoyed it a couple of threads back).

Whoever uses a library these days uses PCRE, and this is, AFAIK, a
backtracking matcher itself (it also ships a separate DFA-style
matching function). Note that I haven't read the code, so I might
well be wrong.

> I fully agree that it doesn't make sense to *start* with
> a backtracking implementation, yes.
> But once you've invested in one, it's harder to move to
> something better.
> 
> This said, it *would* be better.  Not only in terms of eliminating the
> pathological blow ups, but it also offers opportunity to get new
> functionality, such as the ability to capture the state of a regexp
> match at a specific buffer position (so you can perform a multiline
> regexp match one line at a time).  It could also make it much more
> reasonable to add the possibility to run ELisp code from within the
> regexp match engine (e.g. add a \p(NAME) entry which calls the NAME
> ELisp function).

Perl does the latter: its (?{ ... }) construct runs Perl code from
inside a match. That has saved my bacon from time to time. So that
seems possible with a backtracker, too. Don't ask me how, though :-)
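
As for the "how": one plausible way, sketched here in Python, is to
treat the callback as a zero-width step that either lets the match
continue or fails and so triggers ordinary backtracking. This is only
a toy under stated assumptions, not how Perl or Emacs do it. The
pattern format (a list of tuples) and the names match_here and
at_even are made up for the illustration.

```python
def match_here(pattern, text, pos=0):
    """Return the end position of a match starting at pos, or None.

    pattern is a list of hypothetical tokens:
      ("lit", c)   match the single character c
      ("star", c)  match zero or more c, greedily
      ("call", f)  zero-width step: succeed iff f(text, pos) is true
    """
    if not pattern:
        return pos
    head, rest = pattern[0], pattern[1:]
    kind = head[0]
    if kind == "lit":
        if pos < len(text) and text[pos] == head[1]:
            return match_here(rest, text, pos + 1)
        return None
    if kind == "star":
        # Greedy: consume the longest run first, then back off one
        # character at a time until the rest of the pattern matches.
        end = pos
        while end < len(text) and text[end] == head[1]:
            end += 1
        while end >= pos:
            result = match_here(rest, text, end)
            if result is not None:
                return result
            end -= 1
        return None
    if kind == "call":
        # User code runs in the middle of the match. Returning false
        # is just another failure, so the caller backtracks as usual.
        if head[1](text, pos):
            return match_here(rest, text, pos)
        return None
    raise ValueError("unknown token kind: %r" % (kind,))


visited = []

def at_even(text, pos):
    """Example callback: record the position, accept only even ones."""
    visited.append(pos)
    return pos % 2 == 0


# a* followed by the callback, followed by a literal "a"
pattern = [("star", "a"), ("call", at_even), ("lit", "a")]
end = match_here(pattern, "aaaa")
```

The callback is invoked once per position the backtracker tries, so
any side effects in it happen on failed attempts as well. That is the
price of doing this in a backtracker.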

Saving state at any point would be cool -- you could easily invert
control (feeding the regexp machine a spoonful at a time). Think
network or an abstract buffer.
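
To make the "spoonful at a time" idea concrete, here is a small sketch
in Python of a matcher that tracks a set of automaton states instead
of backtracking. Its entire state is that set, so it can be saved,
restored, and fed input in arbitrary chunks. Everything in it is an
assumption made for the illustration: the class name
IncrementalMatcher, the tiny pattern syntax (literals, "." and a
postfix "*" on single characters), and the anchored whole-input
semantics. It has nothing to do with the actual Emacs regexp engine.

```python
class IncrementalMatcher:
    """Anchored matcher for a toy syntax: literals, '.', and 'x*'."""

    def __init__(self, pattern):
        # Parse into (char, starred) items; state i means "about to
        # match item i", and state len(items) means "accepted".
        self.items = []
        i = 0
        while i < len(pattern):
            starred = i + 1 < len(pattern) and pattern[i + 1] == "*"
            self.items.append((pattern[i], starred))
            i += 2 if starred else 1
        self.states = self._closure({0})

    def _closure(self, states):
        # A starred item may match zero times, so being in state i
        # also implies being in state i + 1.
        result = set()
        for s in states:
            while True:
                result.add(s)
                if s < len(self.items) and self.items[s][1]:
                    s += 1
                else:
                    break
        return frozenset(result)

    def feed(self, chunk):
        """Consume one chunk of input; chunk boundaries are arbitrary."""
        for c in chunk:
            nxt = set()
            for s in self.states:
                if s == len(self.items):
                    continue  # accept state has no outgoing edges
                ch, starred = self.items[s]
                if ch == "." or ch == c:
                    nxt.add(s if starred else s + 1)
            self.states = self._closure(nxt)
        return self

    def accepted(self):
        """True if the input fed so far matches the whole pattern."""
        return len(self.items) in self.states

    def alive(self):
        """False once no continuation of the input can ever match."""
        return bool(self.states)


m = IncrementalMatcher("ab*c")
m.feed("ab")            # first spoonful, e.g. one network packet
saved = m.states        # the whole match state is this frozenset
m.feed("bb").feed("c")  # later spoonfuls
```

Assigning `saved` back to `m.states` resumes the match from the point
where the snapshot was taken. The caller stays in control of the loop
and decides when, and from where, the next chunk arrives.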

Cheers
-- 
t
