emacs-devel
Re: Tree-sitter maturity


From: Daniel Colascione
Subject: Re: Tree-sitter maturity
Date: Sun, 29 Dec 2024 15:36:59 -0500
User-agent: mu4e 1.12.8; emacs 31.0.50

Lynn Winebarger <owinebar@gmail.com> writes:

> On Fri, Dec 27, 2024, 9:25 AM Daniel Colascione <dancol@dancol.org> wrote:
>
>>
>>
>> It's a shame there's no way to write TS grammars in plain elisp. I figure
>> vendoring both the source and the generated code would be best, as it'd
>> allow building Emacs anywhere but still make it convenient on systems with
>> needed tools (JS runtime, Rust, etc.) to update and modify the grammar. As
>> with any scheme involving checking in generated outputs, the source and
>> output can get out of sync, but I think there are build time guardrails we
>> can build to make sure it doesn't happen.
>>
>
> I looked into this last year.  The tree-sitter library provides a parsing
> engine that references a fairly standard LR type parsing table in binary
> form.  I got stuck in adding a generic primitive functionality for reading
> and writing arbitrary binary data structures based on a data description
> DSL, since I wouldn't want to tie the interpreter core to the data
> structures of an external, dynamically-loadable library.  But, I wasn't
> sure such an extension would be accepted into emacs, as I am not an expert
> on the possible security implications.
>
> Other than that, emacs already has the code for calculating (LA)LR parsing
> tables in the semantic packages.  The tree-sitter grammar compiler may have
> additional logic for providing multiple starting symbols, but the parsing
> engine should still function with a classic parsing table.

Thanks.  Such an approach would let us treat tree-sitter grammars a lot
more like font-lock-keywords, and I think for some modes, that'd be a
good option.  (Of course, SHTDI.)
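
To make the "grammar as plain data" idea concrete, here is a minimal
sketch, written in Python rather than elisp. It assumes that a generic
engine only needs ACTION/GOTO tables to run. The toy grammar, the table
layout, and the name parse are all hypothetical illustrations; this is
not tree-sitter's binary table format or API.

```python
# Hypothetical hand-built SLR table for a toy grammar:
#   rule 1: E -> E "+" "n"
#   rule 2: E -> "n"
# A real table would be computed from the grammar, not typed in.
RULES = {1: ("E", 3), 2: ("E", 1)}  # rule number -> (lhs, rhs length)

ACTION = {
    (0, "n"): ("shift", 1),
    (1, "+"): ("reduce", 2),
    (1, "$"): ("reduce", 2),
    (2, "+"): ("shift", 3),
    (2, "$"): ("accept", None),
    (3, "n"): ("shift", 4),
    (4, "+"): ("reduce", 1),
    (4, "$"): ("reduce", 1),
}
GOTO = {(0, "E"): 2}


def parse(tokens):
    """Return the rule numbers applied, in order, or raise SyntaxError."""
    stack = [0]                      # stack of parser states
    reductions = []
    stream = list(tokens) + ["$"]    # "$" marks end of input
    pos = 0
    while True:
        key = (stack[-1], stream[pos])
        if key not in ACTION:
            raise SyntaxError(f"unexpected {stream[pos]!r} at token {pos}")
        kind, arg = ACTION[key]
        if kind == "shift":
            stack.append(arg)
            pos += 1
        elif kind == "reduce":
            lhs, length = RULES[arg]
            del stack[len(stack) - length:]       # pop the rule's right-hand side
            stack.append(GOTO[(stack[-1], lhs)])  # then follow the goto edge
            reductions.append(arg)
        else:  # accept
            return reductions
```

The engine never mentions the grammar: swapping in different RULES,
ACTION, and GOTO data changes the language it accepts, which is the
property that would let a mode carry its grammar as a plain value.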

Tree sitter, as wonderful as it is, strikes me as a bit of a Rube
Goldberg machine architecturally: JS *and* Rust *and* C? Really? :-)

Do you happen to know whether the subset of Rust that gccrs recognizes
is sufficient to compile the tree sitter grammar compiler?  If so, we
could in principle combine gccrs with a bare-bones embedded JS
interpreter like https://duckjs.org/ to produce a mechanism that would
let us customize and rebuild tree sitter grammars as easily as we do
elisp files, even on obscure platforms like DJGPP.

Some Emacs modes could ship with .js grammars sourced from upstream
editor-neutral projects.  Other modes might just build tree sitter parse
tables in elisp using something vaguely like SMIE syntax.  Both styles
of mode would be customizable by end users, and (because, I'm a
broken record, vendor vendor vendor) we'd maintain compatibility without
mysterious AST-change-related breakages.
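
One possible shape for the build-time guardrail on vendored grammars,
sketched in Python under stated assumptions: the generated parser is
checked in next to a small stamp file holding the SHA-256 of the grammar
source it was generated from. The stamp-file scheme and the function
names are hypothetical, not an existing Emacs or tree-sitter mechanism.

```python
import hashlib
from pathlib import Path


def digest(path):
    """Return the SHA-256 of a file's bytes as a hex string."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def write_stamp(grammar, stamp):
    """Run after regenerating the parser: record which source produced it."""
    Path(stamp).write_text(digest(grammar) + "\n")


def in_sync(grammar, stamp):
    """Run at build time: True if the recorded digest matches the source."""
    stamp = Path(stamp)
    return stamp.exists() and stamp.read_text().strip() == digest(grammar)
```

A build on a machine without the generator tools would only call
in_sync and fail loudly on a mismatch; a machine with the tools would
regenerate the parser and then call write_stamp.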


