emacs-devel

Re: [PATCH] add compiled regexp primitive lisp object


From: Stefan Monnier
Subject: Re: [PATCH] add compiled regexp primitive lisp object
Date: Tue, 24 Dec 2024 00:37:15 -0500
User-agent: Gnus/5.13 (Gnus v5.13)

> Having compiler regexp object exposed to Elisp would open the
> following extra opportunities:

Nitpick: there are various ways to "Add compiled regexp primitive lisp
object", and not all of them would really expose those objects in ways
that would allow the following points.

> 1. They could be inspected from Elisp, and hopefully optimized
>    better. For now, there is simply no way to detect which parts of
>    regexps are slow and which are not.

You can now/already use `re--describe-compiled` to inspect the compiled
regexps used by Emacs's regexp engine.
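
For illustration, a minimal sketch of calling it defensively.  The
helper name `my/describe-regexp` is hypothetical, and the sketch assumes
only that `re--describe-compiled` takes a regexp string and returns
a string; being an internal function, it may be missing or may signal an
error in some builds, so both cases are guarded:

```elisp
;; `my/describe-regexp' is a hypothetical helper name, not part of Emacs.
(defun my/describe-regexp (regexp)
  "Return a description of the compiled form of REGEXP, or nil.
Return nil when `re--describe-compiled' is missing or signals an
error in this build."
  (when (fboundp 're--describe-compiled)
    (condition-case nil
        (re--describe-compiled regexp)
      (error nil))))

;; Dump the compiled form of a couple of sample regexps.
(dolist (re '("\\(?:foo\\|bar\\)+" "[a-z]*x"))
  (message "%s\n%s\n" re (or (my/describe-regexp re) "<unavailable>")))
```

The output format is internal and may change between Emacs versions, so
it is better suited to interactive inspection than to programmatic use.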

> 2. They could maybe even be constructed from Elisp, opening
>    opportunities for custom regexp compilers that can be tailored to
>    specific application needs rather than having to stick to hard-coded
>    generic tradeoffs Emacs has to do without knowing the purpose of a
>    regexp.

Definitely.  And since it would be done in ELisp, there could be dozens
of specialized compilers in ELPA packages to choose from.  🙂

Eli wrote:
> If we can optimize them from Lisp, we should be able to do the same
> in C.

`re_compile_pattern` is not terribly easy to improve, among other
things because it needs to be fast, single-pass, etc.

If regexp objects are exposed in such a way that we can build them from
ELisp and write them to `.elc` files, then anyone can write their own
regexp compiler for their own regexp flavor, which can be as slow as
they like because it won't slow down anyone else.
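
As a rough analogy with what is possible today (and not the mechanism
proposed here), the regexp *string* can already be computed at
byte-compile time, so that the cost of generating it is paid once and
only the result ends up in the `.elc` file.  A minimal sketch, where
`my-keywords-regexp` is a hypothetical name and only `regexp-opt` and
`eval-when-compile` are existing Emacs facilities:

```elisp
;; `my-keywords-regexp' is a hypothetical name used for illustration.
;; When this file is byte-compiled, `regexp-opt' runs at compile time
;; and only the resulting string is stored in the .elc file.
(defconst my-keywords-regexp
  (eval-when-compile
    (regexp-opt '("defun" "defmacro" "defvar") 'symbols))
  "Regexp matching a few definition keywords as whole symbols.")

;; At run time the precomputed string is used like any other regexp.
;; It still goes through `re_compile_pattern' on first use.
(string-match-p my-keywords-regexp "(defun foo ())")
```

Exposing compiled regexp objects that can be printed and read would move
the second step (compiling the string to bytecode) out of run time in
the same way.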

> If you explain what kind of optimization opportunities you had in
> mind, we could discuss how to implement that.  In any case, adding
> APIs for regexp optimizations doesn't require to have compiled regexp
> objects.

If we want to be able to compile regexps to a DFA-style compiled form,
we'll need to extend the regexp bytecode, of course.  But that would
require either implementing the compilation in C (which could be a fair
bit of work) or changing `re_compile_pattern` to recognize new elements
in the regexp string corresponding to those new bytecode instructions.

I have the impression it would be simpler to expose the bytecode vectors
to ELisp (so as to be able to build them and print+read them) and do the
rest in ELisp.  Of course, we'd still want to keep `re_compile_pattern`
to handle the cases where we can't (or can't be bothered to) pre-compile
the regexp and hence where the compilation to bytecode needs to be as
fast as possible.


        Stefan
