[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[bug#50359] [PATCH 3/3] import: Add 'generic-git' updater.
From: |
Xinglu Chen |
Subject: |
[bug#50359] [PATCH 3/3] import: Add 'generic-git' updater. |
Date: |
Thu, 16 Sep 2021 14:48:23 +0200 |
On Thu, Sep 16 2021, Sarah Morgensen wrote:
> Hi,
>
> Xinglu Chen <public@yoctocell.xyz> writes:
>
>> On Wed, Sep 15 2021, iskarian@mgsn.dev wrote:
>>
>>> Hi,
>>>
>>> September 10, 2021 9:21 AM, "Xinglu Chen" <public@yoctocell.xyz> wrote:
>>>
>>>> * guix/git.scm (ls-remote-refs): New procedure.
>>>> * tests/git.scm ("remote-refs" "remote-refs: only tags"): New tests.
>>>> * guix/import/git.scm: New file.
>>>> * doc/guix.texi (Invoking guix refresh): Document it.
>>>> * tests/import-git.scm: New test file.
>>>> * Makefile.am (MODULES, SCM_TESTS): Register the new files.
>>>>
>>>> Co-authored-by: Sarah Morgensen <iskarian@mgsn.dev>
>>>
>>> Overall this is looking good. Thank you for adding tests (for
>>> remote-refs as well!), much appreciated. It looks like you've done
>>> some good polishing. I see a few nits, which I'll point out in a
>>> separate email when I'm not on mobile. I'll also give it a good test.
>>>
>>> But... I've been thinking about the overall approach for a couple
>>> days, because I'm not very happy with the complexity of my heuristic.
>>>
>>> There can be a lot of weird tags in a repository--look at the one for
>>> xf86-video-intel for example. My heuristic attempts to capture the
>>> assumption that repostories tend to move from using "_" or "-" to "."
>>> but it does fail to account for moving to or from dates (because dates
>>> don't compare with normal versions).
>>
>> But if a repo moved from using versions to tags, or vice-versa, we still
>> wouldn’t know if say “3.0.1” is newer than “2021.03.02”. We would have
>> to know when the “3.0.1” tag was created.
>
> You're right; I thought of that afterwards.
>
>> Maybe we could have a ‘release-tag-date-scheme?’ property, that way we
>> could just try to match dates?
>
> That seems like it might be the only way to handle it in some cases (if
> they have both versions and dates with a "." delimiter).
It doesn’t have to be “.” delimiter though; if they both have the same
delimiter it would be difficult to distinguish a version from a date,
e.g., “1-2-3” vs “2021-03-23”.
> (Though, we are actually interested in the *lack* of a date scheme.
> If they use a date scheme now, other versions will be disregarded, so
> we're fine; but if they use versions now and used a date scheme
> before, the versions will be discarded.)
I am not sure what you are trying to say, could you elaborate?
>>> I also realized that we are not using a very useful piece of
>>> information--the previous version/tag combo. I expect that in the
>>> vast majority of cases, the version delimiter for the newest version
>>> will be the same as the version delimiter for the last known version.
>>> (Perhaps the prefix as well?) Can we use this information to make our
>>> guesses better? What do you think?
>>
>> That sounds like a good idea. What should happen if the delimiter from
>> the previous version/tag combo is different from the one that the
>> ‘guess-delimiter’ procedure returns? Should the one from the previous
>> version/tag combo take precedence.
>
> Consider:
>
> prefix := 'tag-prefix or guess-prefix-from-current-version+tag or default
> delim := 'tag-delim or guess-delim-from-current-version+tag or
> guess-delimiter
> suffix := 'tag-suffix or default
>
> This should cover:
>
> 1. Format stayed the same (including date formats)
> 2. Format changed from (git-version ...) to proper version
>
> This does not otherwise cover a complete change in format, such as "_"
> -> ".", date(-) -> version, or version -> date(.), for which I could
> argue requiring a manual update is reasonable.
Yeah, it’s not really possible to automatically detect those kind of
changes.
> It also does not cover when the tags have both versions and dates with
> the same delimiter.
>
> Though it would be nice to see when such updates are available, is it
> worth some bogus results? Are false positives better or false negatives
> better?
Hmm, good question! If in the future we have some kind of bot that
automatically runs ‘guix refresh -u’, builds the updated package, and
send a patch to the mailing list, not having false positives might be
more important. We could also have a ‘disable-tag-updater?’ property to
disable the updater for packages which gives false positive, or maybe
that will result in to many properties.
> Unless you/we want to pursue one or both of the above changes now, the
> latest patch LGTM (modulo my nits).
I would prefer to wait a bit with the improvements mentioned above. The
current patch has been in the works a week or two already, so it’s
probably a good idea to get it merged, and try to solve the less
important issues later. :-)
> Thanks for your work,
You are welcome! And thanks for taking the time to test and review the
work.
On Thu, Sep 16 2021, Sarah Morgensen wrote (again):
> Hello,
>
> Here are my promised nits.
>
> Xinglu Chen <public@yoctocell.xyz> writes:
>
>> * guix/git.scm (ls-remote-refs): New procedure.
>> * tests/git.scm ("remote-refs" "remote-refs: only tags"): New tests.
>> * guix/import/git.scm: New file.
>> * doc/guix.texi (Invoking guix refresh): Document it.
>> * tests/import-git.scm: New test file.
>> * Makefile.am (MODULES, SCM_TESTS): Register the new files.
>>
>> Co-authored-by: Sarah Morgensen <iskarian@mgsn.dev>
>
> Again, much thanks for writing tests.
>
>> +@item generic-git
>> +a generic updater for packages hosted on Git repositories. It tries to
>> +be smart about parsing Git tag names, but if it is not able to parse the
>> +tag name and compare tags correctly, users can define the following
>> +properties for a package.
>> +
>> +@itemize
>> +@item @code{release-tag-prefix}: a regular expression for matching a prefix
>> of
>> +the tag name.
>> +
>> +@item @code{release-tag-suffix}: a regular expression for matching a suffix
>> of
>> +the tag name.
>> +
>> +@item @code{release-tag-version-delimiter}: a string used as the delimiter
>> in
>> +the tag name for separating the numbers of the version.
>> +@end itemize
>> +
>> +@lisp
>> +(package
>> + (name "foo")
>> + ;; ...
>> + (properties
>> + '((release-tag-prefix . "^release0-")
>> + (release-tag-suffix . "[a-z]?$")
>> + (release-tag-version-delimiter . ":"))))
>> +@end lisp
>> +
>> +By default, the updater will ignore pre-releases; to make it also look
>> +for pre-releases, set the @code{accept-pre-releases?} property to
>> +@code{#t}.
>
> Should this be itemized above?
That’s probably a good idea, since it is related to how the tag will be
parsed.
>> +
>> +;;
>> +;;; Remote operations.
>> +;;;
>> +
>> +(define* (remote-refs url #:key tags?)
>> + "Return the list of references advertised at Git repository URL. If TAGS?
>> +is true, limit to only refs/tags."
>> + (define (ref? ref)
>> + ;; Like `git ls-remote --refs', only show actual references.
>> + (and (string-prefix? "refs/" ref)
>> + (not (string-suffix? "^{}" ref))))
>> +
>> + (define (tag? ref)
>> + (string-prefix? "refs/tags/" ref))
>> +
>> + (define (include? ref)
>> + (and (ref? ref)
>> + (or (not tags?) (tag? ref))))
>> +
>> + (with-libgit2
>> + (call-with-temporary-directory
>> + (lambda (cache-directory)
>> + (let* ((repository (repository-init cache-directory))
>> + ;; Create an in-memory remote so we don't touch disk.
>> + (remote (remote-create-anonymous repository url)))
>> + (remote-connect remote)
>> + (remote-disconnect remote)
>> + (repository-close! repository)
>> +
>> + (filter-map (lambda (remote)
>> + (let ((name (remote-head-name remote)))
>> + (and (include? name)
>> + name)))
>> + (remote-ls remote)))))))
>
> I discovered that this can segfault unless 'remote-disconnect' and
> possibly 'repository-close!' are called *after* copying the data out.
> I've attached a diff for this.
I don’t see a diff attached; maybe you forgot? :-)
>> +
>> +;;; Updater
>> +
>> +(define %pre-release-words
>> + '("alpha" "beta" "rc" "dev" "test"))
>
> I found a few packages that use "pre" as well.
Good catch, I noticed that as well when doing some more testing.
>> +
>> +(define %pre-release-rx
>> + (map (cut make-regexp <> regexp/icase) %pre-release-words))
>> +
>> +(define* (version-mapping tags #:key prefix suffix delim pre-releases?)
>> + "Given a list of Git TAGS, return a association list where the car is the
> ^ an
>
>> +version corresponding to the tag, and the cdr is the name of the tag."
>> + (define (guess-delimiter)
>> + (let ((total (length tags))
>> + (dots (reduce + 0 (map (cut string-count <> #\.) tags)))
>> + (dashes (reduce + 0 (map (cut string-count <> #\-) tags)))
>> + (underscores (reduce + 0 (map (cut string-count <> #\_) tags))))
>> + (cond
>> + ((>= dots (* total 0.35)) ".")
>> + ((>= dashes (* total 0.8)) "-")
>> + ((>= underscores (* total 0.8)) "_")
>> + (else ""))))
>> +
>> + (define delim-rx (regexp-quote (or delim (guess-delimiter))))
>> + (define suffix-rx (string-append (or suffix "") "$"))
>> +
>> +
>> + (define prefix-rx (string-append "^" (or prefix "[^[:digit:]]*")))
>> + (define pre-release-rx
>> + (if pre-releases?
>> + (string-append ".*(" (string-join %pre-release-words "|") ").*")
>> + ""))
>> +
>> + (define tag-rx
>> + (string-append prefix-rx "([[:digit:]][^" delim-rx "[:punct:]]*"
>> + "(" delim-rx "[^[:punct:]" delim-rx "]+)"
>> + ;; If there is are no delimiters, it could mean that the
> ^ no "is"
>
>> + ;; version just contains one number (e.g., "2"), thus,
>> use
>> + ;; "*" instead of "+" to match zero or more numbers.
>> + (if (string=? delim-rx "") "*" "+")
>
> Good catch.
>
>> + pre-release-rx ")" suffix-rx))
>> +
>> + (define (get-version tag)
>> + (let ((tag-match (regexp-exec (make-regexp tag-rx) tag)))
>> + (and tag-match
>> + (regexp-substitute/global
>> + #f delim-rx (match:substring tag-match 1)
>> + ;; Don't insert "." if there aren't any delimiters in the first
>
> Nit: "if there were no delimiters", to be consistent with above comment.
Noted.
>> + ;; place.
>> + 'pre (if (string=? delim-rx "") "" ".") 'post))))
>
> One issue with returning a different delimiter than the package
> currently uses is that the automatic updater won't really work as-is.
Good point;, the tag name would be incorrect in those cases.
> Hmmm. When things are modified so the updater gets both the version and
> the git-reference, it should be able to reverse-engineer things well
> enough there.
Ah, looks like (guix upstream) needs some work. :-)
> I imagine this is really only going to be an issue with dates currently
> written as "2017-01-01", anyway. I'll put my comments on that in reply
> to the other email.
>
>> +
>> + (define (entry<? a b)
>> + (eq? (version-compare (car a) (car b)) '<))
>> +
>> + (stable-sort (filter-map (lambda (tag)
>> + (let ((version (get-version tag)))
>> + (and version (cons version tag))))
>> + tags)
>> + entry<?))
>> +
>> +(define* (latest-tag url #:key prefix suffix delim pre-releases?)
>> + "Return the latest tag available from the Git repository at URL."
>
> This returns two values (in preparation for the above-mentioned switch),
> so maybe something like "Return the latest version and corresponding tag
> available from..."
Good catch.
>> + (define (pre-release? tag)
>> + (any (cut regexp-exec <> tag)
>> + %pre-release-rx))
>> +
>> + (let* ((tags (map (cut string-drop <> (string-length "refs/tags/"))
>
> Should be "cute" so string-length is only evaluated once -- though it's
> probably optimized like that anyway.
Good catch, I keep forgetting that ‘cute’ exists. :-)
>> + (remote-refs url #:tags? #t)))
>> + (versions->tags
>> + (version-mapping (if pre-releases?
>> + tags
>> + (filter (negate pre-release?) tags))
>> + #:prefix prefix
>> + #:suffix suffix
>> + #:delim delim
>> + #:pre-releases? pre-releases?)))
>> + (cond
>> + ((null? tags)
>> + (git-no-tags-error))
>> + ((null? versions->tags)
>> + (git-no-valid-tags-error))
>> + (else
>> + (match (last versions->tags)
>> + ((version . tag)
>> + (values version tag)))))))
>> +
>> +(define (latest-git-tag-version package)
>> + "Given a PACKAGE, return the latest version of it, or #f if the latest
>> version
>> +could not be determined."
>> + (guard (c ((or (git-no-tags-error? c) (git-no-valid-tags-error? c))
>> + (warning (or (package-field-location package 'source)
>> + (package-location package))
>> + (G_ "~a for ~a~%")
>> + (condition-message c)
>> + (package-name package))
>> + #f)
>> + ((eq? (exception-kind c) 'git-error)
>> + (warning (or (package-field-location package 'source)
>> + (package-location package))
>> + (G_ "failed to fetch Git repository for ~a~%")
>> + (package-name package))
>> + #f))
>> + (let* ((source (package-source package))
>> + (url (git-reference-url (origin-uri source)))
>> + (properties (package-properties package))
>> + (tag-prefix (assq-ref properties 'release-tag-prefix))
>> + (tag-suffix (assq-ref properties 'release-tag-suffix))
>> + (tag-version-delimiter (assq-ref properties
>> 'release-tag-version-delimiter))
>> + (refresh-pre-releases? (assq-ref properties
>> 'accept-pre-releases?)))
>> + (latest-tag url
>> + #:prefix tag-prefix
>> + #:suffix tag-suffix
>> + #:delim tag-version-delimiter
>> + #:pre-releases? refresh-pre-releases?))))
>
> This is entirely a style preference, so only take this suggestion if you
> like it :)
>
> (let* ((source (package-source package))
> (url (git-reference-url (origin-uri source)))
> (property (cute assq-ref (package-properties package) <>)))
> (latest-tag url
> #:prefix (property 'release-tag-prefix)
> #:suffix (property 'release-tag-suffix)
> #:delim (property 'release-tag-version-delimiter)
> #:pre-releases? (property 'accept-pre-releases?)))))
That does look cleaner, thanks for the suggestion!
>> +
>> +(define (git-package? package)
>> + "Whether the origin of PACKAGE is a Git repostiory."
>
> "Return true if PACKAGE is..."
“PACKAGE is a Git repository.” doesn’t really sound right, maybe “if
PACKAGE is hosted on a Git repository”?
>> + (match (package-source package)
>> + ((? origin? origin)
>> + (and (eq? (origin-method origin) git-fetch)
>> + (git-reference? (origin-uri origin))))
>> + (_ #f)))
>> +
>> +(define (latest-git-release package)
>> + "Return the latest release of PACKAGE."
>
> "Return an <upstream-source> for the latest...", to match the other
> updaters.
>
>> + (let* ((name (package-name package))
>> + (old-version (package-version package))
>> + (url (git-reference-url (origin-uri (package-source package))))
>> + (new-version (latest-git-tag-version package)))
>> +
>> + (and new-version
>> + (upstream-source
>> + (package name)
>> + (version new-version)
>> + (urls (list url))))))
>> +
>> +(define %generic-git-updater
>> + (upstream-updater
>> + (name 'generic-git)
>> + (description "Updater for packages hosted on Git repositories")
>> + (pred git-package?)
>> + (latest latest-git-release)))
>
> I tested this updater on all packages in .scm files starting with f
> through z, and I found the following packages with possibly bogus
> updates:
>
> --8<---------------cut here---------------start------------->8---
> javaxom
I assume you meant ‘java-xom’ :-)
That’s a weird scheme; setting the delimiter to “.” doesn’t help since
it thinks that “127” is greater than “1.3.7”.
> luakit
> ocproxy
> pitivi
‘pitivi’ has a pretty weird version string to begin with; it may be
better to change it to the date: “0.999.0-2021-05.0” -> “2021-05.0”.
> eid-mw
> libhomfly
> gnuradio
> welle-io
Setting the delimiter to "." fixes the issue.
> racket-minimal
Setting the prefix to "v" fixes this.
> milkytracker
> cl-portal
> kodi-cli
> openjdk
> java-bouncycastle
> hurd
> opencsg
Setting the suffix to "-release" fixes this.
> povray
> gpsbabel
Setting the prefix to "gpsbabel_" fixes this.
> go
> stepmania
> ocaml-mcl
>
> many minetest packages (minetest will have its own updater, though)
>
> ocaml4.07-core-kernel, ocamlbuild and many other ocaml packages
> (they seem to be covered by the github updater)
> --8<---------------cut here---------------end--------------->8---
Hmm, ‘guix refresh’ says that ‘ocamlbuild’ is already the latest
version. But you are right, many of the packages are already taken care
of by the ‘github’ updater.
> The following packages suggest a version -> date update, which may or
> may not be bogus:
>
> --8<---------------cut here---------------start------------->8---
> cataclysm-dda
> autotrace
> lbalgtk
> nheko
> libqalculate
> cl-antik
> cl-antik-base
> cl-hu.dwim.stefil
> cl-stefil
> cl-gsll
> sbcl-cl-gserver
> --8<---------------cut here---------------end--------------->8---
Thanks for taking the time to find these false positive!
signature.asc
Description: PGP signature
- [bug#50359] [PATCH] import: Add 'generic-git' updater., (continued)
[bug#50359] [PATCH] import: Add 'generic-git' updater., Xinglu Chen, 2021/09/05
[bug#50359] [PATCH 0/3] Add 'generic-git' updater., Xinglu Chen, 2021/09/10
[bug#50359] [PATCH 1/3] tests: git: Don't read from the users global Git config file., Xinglu Chen, 2021/09/10
[bug#50359] [PATCH v3 0/3] Add 'generic-git' updater, Xinglu Chen, 2021/09/17
[bug#50359] [PATCH 3/3] import: Add 'generic-git' updater., iskarian, 2021/09/15