emacs-devel

Re: prior work on non-backtracking regex engine?


From: Po Lu
Subject: Re: prior work on non-backtracking regex engine?
Date: Mon, 08 Apr 2024 22:00:27 +0800
User-agent: Gnus/5.13 (Gnus v5.13)

Danny McClanahan <dmcc2@hypnicjerk.ai> writes:

>> P.S. Better regexp engine would be very welcome.
>
> ^_^ !!! This makes me so happy to hear! I took the time to learn how
> regex-emacs.c as well as other emacs C code interacts with the gap
> buffer last week, and was super glad to see how simple the gap buffer
> data structure is. I am not an expert on string search performance
> yet, but I believe the gap buffer (which is always allocated within a
> single block, and has at most two non-adjacent data sections) is
> likely much more amenable to high-performance search techniques than a
> more complex data structure such as a rope. And I was also *super*
> pleased to see that regex-emacs.h itself doesn't expose any dependency
> on the gap buffer or other internal emacs representations (except
> regarding multibyte encoding). So in my amateur evaluation, emacs
> actually seems very well-placed to take advantage of high-performance
> regex engine techniques without any big structural changes.
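To make the gap-buffer point concrete, here is a minimal C sketch. It assumes a simplified, hypothetical `struct gap_buffer` (not Emacs's actual buffer text structure), and it shows that a matcher only needs the buffer's two data sections as two plain strings. This is the same two-string convention that GNU regex's `re_search_2` uses for its subject text. `seg_at`, `naive_search_2` and `gb_search` are illustrative names, and the search is a naive substring scan, not a regex engine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified gap buffer (NOT Emacs's real structure):
   one allocation; text occupies [0, gap_start) and [gap_end, size).  */
struct gap_buffer
{
  char *data;
  size_t size;       /* bytes allocated, gap included */
  size_t gap_start;  /* offset of the first gap byte */
  size_t gap_end;    /* offset of the first byte after the gap */
};

/* Byte at logical offset POS of the concatenation S1 ++ S2.  */
static unsigned char
seg_at (const char *s1, size_t len1, const char *s2, size_t pos)
{
  return (unsigned char) (pos < len1 ? s1[pos] : s2[pos - len1]);
}

/* Naive substring search over text split into two segments, as a gap
   buffer presents it.  Returns the logical offset of the first match
   of NEEDLE, or -1.  A match may straddle the boundary (the gap).  */
static ptrdiff_t
naive_search_2 (const char *s1, size_t len1,
                const char *s2, size_t len2, const char *needle)
{
  size_t n = strlen (needle), total = len1 + len2;
  for (size_t i = 0; i + n <= total; i++)
    {
      size_t j = 0;
      while (j < n
             && seg_at (s1, len1, s2, i + j) == (unsigned char) needle[j])
        j++;
      if (j == n)
        return (ptrdiff_t) i;
    }
  return -1;
}

/* Search a gap buffer by handing the matcher its two data sections;
   the matcher never sees the gap bytes themselves.  */
static ptrdiff_t
gb_search (const struct gap_buffer *b, const char *needle)
{
  assert (b->gap_start <= b->gap_end && b->gap_end <= b->size);
  return naive_search_2 (b->data, b->gap_start,
                         b->data + b->gap_end, b->size - b->gap_end,
                         needle);
}
```

For example, with `data` holding `"hello____world"`, `gap_start = 5` and `gap_end = 9`, searching for `"lowo"` returns 3, which is a match that straddles the gap. Searching for `"___"` (the gap contents) returns -1.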

[...]

> I would *really like* to eventually have emacs depend on an existing
> regex engine like re2 or rust regex to take advantage of their
> bugfixes and optimizations, but both of those engines (and most
> others) require utf-8 input, and (I'm pretty sure) can't easily be
> made to support emacs's multibyte functionality. So I think there is a
> strong case for a new engine here, especially one licensed under the
> GPLv3 (or any later version) as opposed to the LGPL or other more
> permissive license.
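The C sketch below illustrates why a strict UTF-8 engine cannot simply be pointed at Emacs's multibyte text. It rests on an assumption about Emacs's internal representation (our understanding of the `BYTE8_STRING` macro in Emacs's `character.h`): a raw 8-bit byte in multibyte text is stored as a two-byte sequence whose lead byte is 0xC0 or 0xC1. Strict UTF-8 as specified in RFC 3629 forbids those lead bytes because they can only form overlong encodings. `raw_byte_encode` is a hypothetical helper written for this illustration. `utf8_valid` is a generic validator and is not code from re2 or rust regex.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: encode raw byte B (0x80..0xFF) as two bytes the
   way we ASSUME Emacs's multibyte representation does (lead 0xC0/0xC1). */
static size_t
raw_byte_encode (unsigned char b, unsigned char out[2])
{
  assert (b >= 0x80);
  out[0] = 0xC0 | ((b >> 6) & 0x01);
  out[1] = 0x80 | (b & 0x3F);
  return 2;
}

/* Strict UTF-8 validation per RFC 3629: rejects overlong forms,
   UTF-16 surrogates, and code points above U+10FFFF.  */
static bool
utf8_valid (const unsigned char *s, size_t len)
{
  size_t i = 0;
  while (i < len)
    {
      unsigned char c = s[i];
      unsigned char lo = 0x80, hi = 0xBF;  /* allowed 2nd-byte range */
      size_t need;                         /* continuation bytes */
      if (c < 0x80)
        {
          i++;
          continue;
        }
      else if (c >= 0xC2 && c <= 0xDF)
        need = 1;
      else if (c >= 0xE0 && c <= 0xEF)
        {
          need = 2;
          if (c == 0xE0)
            lo = 0xA0;          /* no overlong 3-byte forms */
          else if (c == 0xED)
            hi = 0x9F;          /* no surrogates */
        }
      else if (c >= 0xF0 && c <= 0xF4)
        {
          need = 3;
          if (c == 0xF0)
            lo = 0x90;          /* no overlong 4-byte forms */
          else if (c == 0xF4)
            hi = 0x8F;          /* nothing above U+10FFFF */
        }
      else
        return false;           /* 0x80..0xC1, 0xF5..0xFF never lead */
      if (len - i - 1 < need)
        return false;           /* truncated sequence */
      if (s[i + 1] < lo || s[i + 1] > hi)
        return false;
      for (size_t k = 2; k <= need; k++)
        if ((s[i + k] & 0xC0) != 0x80)
          return false;
      i += need + 1;
    }
  return true;
}
```

Under that assumption, `raw_byte_encode (0xFF, buf)` produces the bytes `C1 BF`, which `utf8_valid` rejects. Ordinary UTF-8 such as `"caf\xC3\xA9"` passes.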

I hate to rain on people's parades, but from where I stand, introducing
one more mandatory dependency, not least for so fundamental a component
of Emacs as the regexp matcher, is not acceptable, even if the library
is maintained under the auspices of the GNU project or distributed under
the GPL; license and stewardship are really immaterial issues here.
The options being tendered are even less acceptable: the one is written
in C++ and relies on a library that gives an awful impression of being a
boost-in-the-making, and the other is written in an immature language
that is not portable, especially to the older systems we support for
users with antiquarian interests.

Let's not delude ourselves that a solution is waiting in the wings in
the shape of some mystical library, and pin our hopes on that library
alone, to the detriment of improving the regex engine we already have.
It's a recurring trap that we should have learned to avoid by now.  If
the sabotage of xz, for example, has taught us anything, it's that
software is generally improved by a reduction in dependencies.

Thanks.

