guix-patches
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[bug#49946] [PATCH v7 01/32] gnu: tree-sitter: Move to its own module.


From: Andrew Tropin
Subject: [bug#49946] [PATCH v7 01/32] gnu: tree-sitter: Move to its own module.
Date: Tue, 14 Feb 2023 17:24:02 +0400

On 2023-02-12 12:07, Pierre Langlois wrote:

> Hi,
>
> Andrew Tropin <andrew@trop.in> writes:
>
>> [[PGP Signed Part:Undecided]]
>> On 2023-02-10 15:48, Pierre Langlois wrote:
>>
>>> Hi Andrew, thanks for pushing this along! It's great to see things
>>> getting merged.
>>>
>>> Andrew Tropin <andrew@trop.in> writes:
>>>
>>>> [[PGP Signed Part:Undecided]]
>>>> On 2023-02-09 18:04, Andrew Tropin wrote:
>>>>
>>>>> On 2023-02-09 13:39, zimoun wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> On Thu, 09 Feb 2023 at 14:11, Andrew Tropin <andrew@trop.in> wrote:
>>>>>>
>>>>>>> I applied tree-sitter and tree-sitter-cli patches,
>>>>>>
>>>>>> Just to be sure to understand, you have only applied 02/32 and 05/32,
>>>>>> right?
>>>>>>
>>>>>>
>>>>>> [bug#49946] [PATCH v7 02/32] gnu: tree-sitter: Update to 0.20.7.
>>>>>> id:20221125012142.22579-3-pierre.langlois@gmx.com
>>>>>> http://issues.guix.gnu.org/msgid/20221125012142.22579-3-pierre.langlois@gmx.com
>>>>>>
>>>>>> [bug#49946] [PATCH v7 05/32] gnu: Add tree-sitter-cli.
>>>>>> id:20221125012142.22579-6-pierre.langlois@gmx.com
>>>>>> http://issues.guix.gnu.org/msgid/20221125012142.22579-6-pierre.langlois@gmx.com
>>>>>>
>>>>>> Leaving out all the others, right?
>>>>>
>>>>> Merged first 5 patches from 01 to 05, also added one more commit, which
>>>>> addresses some things from reviews and one commit, which adds html
>>>>> grammar.
>>>>>
>>>>> The html grammar is added for the testing purposes.  It relies on
>>>>> generated parser.c and scanner.c and we will need to repackage it using
>>>>> grammar.js instead.  I'm not sure if a separate build system is needed
>>>>> for this, I guess we can just rewrite tree-sitter-grammar function,
>>>>> which generates packages as in example with tree-sitter-grammar-html:
>>>>> https://git.savannah.gnu.org/cgit/guix.git/tree/gnu/packages/tree-sitter.scm?h=53b00b91b73bd60412d5bd057e22e6d63194a7f7#n158
>>>>>
>>>>> Anyway, I only skimmed tree-sitter-build-system source code, and plan to
>>>>> read it carefully, evaluate and either introduce new build system or
>>>>> just move all needed parts to tree-sitter-grammar function.  WDYT?
>>>>> After we done with it we can package all other grammars.
>>>>
>>>> Ok, I realized that the proper build process for tree-sitter grammars is
>>>> a little harder than I expected, tree-sitter-build system make sense.  I
>>>> reviewed it, made a small change:
>>>
>>> Ah great, I was going to comment to try and push for us to keep the
>>> build system. I originally went with a template package and inheritance,
>>> but Maxime suggested moving to a build-system which ended up making the
>>> package definitions a *lot* nicer IMO (see previous discussion here
>>> https://issues.guix.gnu.org/49946#144). It also allows us to deal with
>>> grammars that depend on each other more nicely I think.
>>>
>>>>
>>>> @@ -29,7 +29,7 @@ (define-module (guix build tree-sitter-build-system)
>>>>  ;; Commentary:
>>>>  ;;
>>>>  ;; Build procedures for tree-sitter grammar packages.  This is the
>>>> -;; builder-side code, which builds on top fo the node build-system.
>>>> +;; builder-side code, which builds on top of the node build-system.
>>>>  ;;
>>>>  ;; Tree-sitter grammars are written in JavaScript and compiled to a native
>>>>  ;; shared object.  The `tree-sitter generate' command invokes `node' in 
>>>> order
>>>> @@ -114,7 +114,7 @@ (define (compile-language dir)
>>>>                     "-fno-exceptions"
>>>>                     "-O2"
>>>>                     "-g"
>>>> -                   "-o" ,(string-append lib "/" lang ".so")
>>>> +                   "-o" ,(string-append lib "/libtree-sitter-" lang ".so")
>>>>                     ;; An additional `scanner.{c,cc}' file is sometimes
>>>>                     ;; provided.
>>>>                     ,@(cond
>>>>
>>>>
>>>> rewrote html grammar to use this build system and made it work with
>>>> built-in treesit package.  Also, tried examples of c and cpp grammars
>>>> from patches in this thread.
>>>>
>>>> If you ok with it, I'll push the build system to master and update the
>>>> html grammar accordingly.
>>>>
>>>> The final result will look like this:
>>>>
>>>> (define tree-sitter-delete-generated-files
>>>>   #~(begin
>>>>       (delete-file "binding.gyp")
>>>>       (delete-file-recursively "bindings")
>>>>       (delete-file "src/grammar.json")
>>>>       (delete-file "src/node-types.json")
>>>>       (delete-file "src/parser.c")
>>>>       (delete-file-recursively "src/tree_sitter")))
>>>>
>>>> (define* (tree-sitter-grammar
>>>>           language language-for-synopsis version commit hash
>>>>           #:key
>>>>           (repository-url
>>>>            (format #f "https://github.com/tree-sitter/tree-sitter-~a"; 
>>>> language))
>>>>           (inputs '()))
>>>>   (let ((synopsis (string-append language-for-synopsis
>>>>                                  " grammar for tree-sitter"))
>>>>         (name (string-append "tree-sitter-grammar-" language)))
>>>>     (package
>>>>       (name name)
>>>>       (version version)
>>>>       (home-page repository-url)
>>>>       (source (origin
>>>>                 (method git-fetch)
>>>>                 (uri (git-reference
>>>>                       (url repository-url)
>>>>                       (commit commit)))
>>>>                 (file-name (git-file-name name version))
>>>>                 (sha256 (base32 hash))
>>>>                 (modules '((guix build utils)))
>>>>                 (snippet tree-sitter-delete-generated-files)))
>>>>       (build-system tree-sitter-build-system)
>>>>       (inputs inputs)
>>>>       (synopsis synopsis)
>>>>       (description (string-append synopsis "."))
>>>>       (license license:expat))))
>>>>
>>>> (define-public tree-sitter-grammar-html
>>>>   (tree-sitter-grammar
>>>>    "html" "HTML"
>>>>    "0.19.0" "v0.19.0"
>>>>    "1hg7vbcy7bir6b8x11v0a4x0glvqnsqc3i2ixiarbxmycbgl3axy"))
>>>>
>>>> After that we can bring the rest of the grammars.
>>>
>>> I would suggest to rmeove the `tree-sitter-grammar' function, and keep
>>> grammars as "regular" package records, even though it's a little bit
>>> more verbose:
>>>
>>> --8<---------------cut here---------------start------------->8---
>>> (define-public tree-sitter-html
>>>   (package
>>>     (name "tree-sitter-html")
>>
>> It seems tree-sitter-html mimics upstream package name and probably make
>> more sense than tree-sitter-grammar-html used by me.
>
> Yeah, at some point I think I had named the packages with "grammar" as
> well, but thought it was a bit of a mouthful.  I'm also thinking one day
> we may build language bindings as part of the build system (Rust and
> NodeJS I think ATM), so those packages could do more than ship the
> grammar in the future (although we don't know if we'll ever really need
> that).
>
>>
>>>     (version "0.19.0")
>>>     (source (origin
>>>               (method git-fetch)
>>>               (uri (git-reference
>>>                     (url "https://github.com/tree-sitter/tree-sitter-html";)
>>>                     (commit (string-append "v" version))))
>>>               (file-name (git-file-name name version))
>>>               (sha256
>>>                (base32
>>>                 "1hg7vbcy7bir6b8x11v0a4x0glvqnsqc3i2ixiarbxmycbgl3axy"))
>>>               (modules '((guix build utils)))
>>>               (snippet tree-sitter-delete-generated-files)))
>>>     (build-system tree-sitter-build-system)
>>>     (home-page "https://github.com/tree-sitter/tree-sitter-html";)
>>>     (synopsis "Tree-sitter HTML grammar")
>>>     (description
>>>      "This package provides a HTML grammar for the Tree-sitter library.")
>>>     (license license:expat)))
>>> --8<---------------cut here---------------end--------------->8---
>>>
>>> This way, they look like any other package in Guix, which makes it
>>> easier for us to apply automatic changes in the future if needed (for
>>> example like how the input format could be automically updated for all
>>> "simple" package definitions, but had to be manual whenever custom code
>>> refactoring was done). Does this make sense?
>>
>> Make sense, but on the other hand we already have hunspell, aspell
>> dictionaries and probably a few more others, which are very similiar in
>> spirit and we already have to keep in mind their existence on such
>> automatic code updates.
>>
>> It looks that the packages differ only in url for the source code, lang
>> name and sometimes in inputs.  Having template package function can make
>> management of shared parts more centralized, reduce possibility of
>> copy-paste mistakes, when the description wasn't updated and so on and
>> can reduce the amount of the code overall (which also reduces the change
>> of introducing an error).
>>
>> I don't have a strong opinion on this topic, but leaning towards the
>> template function slightly more, however I'm completely ok with the
>> standalone package definitions as well.  WDYT?
>
> I can think of both cost/benefits to the template so I don't have a
> strong opinion either :-).
>
> I do like the template to make sure people don't forget to delete
> generated files, that's quite important as it seems upstream packages
> often check-in the generated C code.  Although, we could probably assert
> that with in the build-system phase?  I'll think about that.
>
> On the other hand, I wonder how the template works for packages that
> provide multiple grammars (see ocaml and typescript for example).  I
> guess we could use the template for trivial packages, and standalone
> definitions for more complex ones?  In general, if we keep the template
> interface really simple, then I'm happy with it.

Hi Pierre!

I spend two days trying grammars with and without helper function and
found hepler quite helpful to reduce boilerplate and errors from
copypaste, so I went the way with helper.  The logic inside is quite
trivial, the only downside I found so far is that in cases when
repository url constructed automatically I can't easily open the repo
url in the browser.

I packaged all the grammars from this thread and a few more on top of
it.  Updated them to usually latest versions, added some comments, when
needed.

If I forgot to reply on something or you have any comments/ideas, let me
know! :)

Kudos to Pierre and everyone, who helped with all the tree-sitter stuff.

-- 
Best regards,
Andrew Tropin

Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]