emacs-devel

Re: Tree-sitter maturity


From: Philip Kaludercic
Subject: Re: Tree-sitter maturity
Date: Fri, 27 Dec 2024 14:57:17 +0000

Daniel Colascione <dancol@dancol.org> writes:

> On December 27, 2024 9:19:12 AM EST, Philip Kaludercic <philipk@posteo.net> 
> wrote:
>>Daniel Colascione <dancol@dancol.org> writes:
>>
>>> On December 27, 2024 7:40:19 AM EST, Eli Zaretskii <eliz@gnu.org> wrote:
>>>>> From: Philip Kaludercic <philipk@posteo.net>
>>>>> Cc: Xiyue Deng <manphiz@gmail.com>,  emacs-devel@gnu.org
>>>>> Date: Fri, 27 Dec 2024 10:54:29 +0000
>>>>> 
>>>>> Richard Stallman <rms@gnu.org> writes:
>>>>> 
>>>>> > If we add something like this to Emacs, there is an issue we need to
>>>>> > take care about: to make carefully sure that it does not install
>>>>> > any nonfree grammars.  I don't know how those grammars are released,
>>>>> > or by whom, or how much they care about free software.  We can't
>>>>> > take for granted that they do.
>>>>> >
>>>>> > Perhaps we could check automatically that the grammar found is properly
>>>>> > licensed, and disregard any grammars that are not free.
>>>>> >
>>>>> > By contrast, if grammars are going to be packaged and released for
>>>>> > distros, and chosen for installation by users, then it is the user's
>>>>> > responsibility, not Emacs's responsibility, to reject the nonfree ones
>>>>> > (and the GNU/Linux distro might insist on that).
>>>>> 
>>>>> It might take a while for that to happen, which is why I still believe
>>>>> it would be better if tree-sitter major modes would populate
>>>>> `treesit-language-source-alist' on their own, and point to the specific
>>>>> checkouts that the major mode developer tested their implementation
>>>>> against.
>>>>
>>>>We could have done that, but there's no way we could keep the value of
>>>>treesit-language-source-alist up-to-date, because the grammar
>>>>libraries put out new versions much more frequently than Emacs
>>>>releases, especially if you consider libraries that have no official
>>>>versions at all (in which case we can only point to some revision in
>>>>their repository).
>>>>
>>>>The question that bothers me is how useful is it to have
>>>>treesit-language-source-alist that is outdated?  What do we expect the
>>>>users to do with such an outdated value?
>>>>
>>>
>>> Why not just vendor all the grammars with the Emacs modes that use them?
>>
>>I am guessing part of the reason is that TS grammars are not fun to
>>build.  IIRC they are specified in a JavaScript DSL (that used to
>>require node.js but AFAIU works with other implementations as well),
>>which a program written in Rust translates to C code.  So do we vendor
>>the DSL and depend on the TreeSitter toolchain, or do we vendor the
>>generated code?
>
> It's a shame there's no way to write TS grammars in plain elisp. I
> figure vendoring both the source and the generated code would be best,
> as it'd allow building Emacs anywhere but still make it convenient on
> systems with needed tools (JS runtime, Rust, etc.) to update and
> modify the grammar. As with any scheme involving checking in generated
> outputs, the source and output can get out of sync, but I think there
> are build time guardrails we can build to make sure it doesn't happen.

Writing the grammar in Elisp would require both a new toolchain and the
effort of rewriting all the existing grammars in Elisp.  My
understanding is that the main benefit TS intends to provide is the
manpower already invested into writing grammars that deal with all the
edge cases that traditional regexp/heuristic parsing had difficulty
with.

There is also the general point of helping to realise software freedom:
compared with a plain Elisp major mode, a -ts-mode makes it much more
difficult (though of course not impossible) to adjust a grammar.  Wasn't
there some complication when trying to reload a grammar?  The additional
dependencies, and the indirect effect of changes compared with Elisp, are
things we should be concerned about when trying to maintain "the spirit
of Emacs" (which of course means different things to different people).

Vendoring might help to reproduce builds if that turns out to be a big
issue, but I am not a fan of the additional hurdles it puts in the way of
making use of the source code.  Does anyone know of an alternative, less
involved build chain that reuses the libtree-sitter.so library?
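For comparison: if only the generated C code were vendored, turning it into a loadable grammar needs nothing more than a C compiler; the JavaScript/Rust toolchain is only required to regenerate parser.c from grammar.js.  A rough sketch, assuming a `cc` on PATH, a GNU/Linux-style `.so` suffix, and a directory that already holds the generated parser.c (plus an optional C scanner.c).  The helper name is hypothetical and not part of treesit.el:

```elisp
;; Hypothetical helper, not part of treesit.el.  Assumes "cc" is on
;; PATH, a ".so" shared-library suffix, and that SRC-DIR contains the
;; already-generated parser.c (and optionally a C scanner.c).  Grammars
;; whose external scanner is C++ (scanner.cc) are not handled.
(defun my/treesit-compile-vendored-grammar (lang src-dir out-dir)
  "Compile the generated grammar for LANG in SRC-DIR into OUT-DIR.
Return the absolute file name of the resulting shared library."
  (let* (;; Resolve OUT-DIR before `default-directory' is rebound below.
         (out (file-name-as-directory (expand-file-name out-dir)))
         ;; Emacs looks for grammar libraries named libtree-sitter-LANG.
         (lib (expand-file-name (format "libtree-sitter-%s.so" lang) out))
         ;; Run the compiler inside SRC-DIR so "-I ." finds
         ;; tree_sitter/parser.h next to parser.c.
         (default-directory (file-name-as-directory
                             (expand-file-name src-dir)))
         ;; Include the external scanner only if the grammar has one.
         (sources (if (file-exists-p "scanner.c")
                      '("parser.c" "scanner.c")
                    '("parser.c"))))
    (make-directory out t)
    (with-temp-buffer
      ;; Compiler output is collected in this buffer for error reporting.
      (let ((status (apply #'call-process "cc" nil t nil
                           "-shared" "-fPIC" "-O2" "-I" "."
                           (append sources (list "-o" lib)))))
        (unless (eq status 0)
          (error "Compiling %s grammar failed: %s" lang (buffer-string)))))
    lib))
```

The output directory could then be added to `treesit-extra-load-path`, after which `treesit-language-available-p` should tell whether Emacs can load the result.  Whether this scales to every grammar Emacs ships modes for is an open question.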


