[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[bug#39258] [PATCH 4/4] gnu: Use xapian index for package search.
From: |
zimoun |
Subject: |
[bug#39258] [PATCH 4/4] gnu: Use xapian index for package search. |
Date: |
Tue, 3 Mar 2020 20:21:46 +0100 |
Hi Arun,
On Thu, 27 Feb 2020 at 21:42, Arun Isaac <address@hidden> wrote:
>
> * gnu/packages.scm (search-package-index): New function.
> * guix/scripts/package.scm (find-packages-by-description): Search using the
> xapian package index if search patterns are literal strings. Else, search
> using fold-packages.
> ---
> gnu/packages.scm | 17 +++++++++++-
> guix/scripts/package.scm | 57 +++++++++++++++++++++++-----------------
> 2 files changed, 49 insertions(+), 25 deletions(-)
>
> diff --git a/gnu/packages.scm b/gnu/packages.scm
> index e91753e2a8..5b5b29bf84 100644
> --- a/gnu/packages.scm
> +++ b/gnu/packages.scm
> @@ -67,7 +67,8 @@
> specifications->manifest
>
> generate-package-cache
> - generate-package-search-index))
> + generate-package-search-index
> + search-package-index))
>
> ;;; Commentary:
> ;;;
> @@ -453,6 +454,20 @@ reducing the memory footprint."
>
> db-path)
>
> +(define (search-package-index profile querystring)
> + (let ((offset 0)
> + (pagesize 10))
Why this value of 10?
This fix the number of packages returned. Hum?
I have tried to replace by 100 and I got 100 packages. :-)
> + (call-with-database (string-append profile %package-search-index)
> + (lambda (db)
> + (let ((query (parse-query querystring #:stemmer (make-stem "en"))))
> + (mset-fold (lambda (item result)
I do not know what is the convention for the bindings.
But there is 'fold-packages' so I would be inclined to 'fold-msets' or
something in this flavour.
> + (match (find-packages-by-name
> + (document-data (mset-item-document item)))
> + ((package _ ...)
> + (append result `((,package . ,(mset-item-weight
> item)))))))
> + '()
> + (enquire-mset (enquire db query) offset pagesize)))))))
> +
>
> (define %sigint-prompt
> ;; The prompt to jump to upon SIGINT.
> diff --git a/guix/scripts/package.scm b/guix/scripts/package.scm
> index 1cb0d382bf..6a3b9002dd 100644
> --- a/guix/scripts/package.scm
> +++ b/guix/scripts/package.scm
> @@ -7,6 +7,7 @@
> ;;; Copyright © 2016 Benz Schenk <address@hidden>
> ;;; Copyright © 2016 Chris Marusich <address@hidden>
> ;;; Copyright © 2019 Tobias Geerinckx-Rice <address@hidden>
> +;;; Copyright © 2020 Arun Isaac <address@hidden>
> ;;;
> ;;; This file is part of GNU Guix.
> ;;;
> @@ -178,31 +179,40 @@ hooks\" run when building the profile."
> ;;; Package specifications.
> ;;;
>
> -(define (find-packages-by-description regexps)
> +(define (find-packages-by-description patterns)
> "Return a list of pairs: packages whose name, synopsis, description,
> or output matches at least one of REGEXPS sorted by relevance, and its
> non-zero relevance score."
> - (let ((matches (fold-packages (lambda (package result)
> - (if (package-superseded package)
> - result
> - (match (package-relevance package
> - regexps)
> - ((? zero?)
> - result)
> - (score
> - (cons (cons package score)
> - result)))))
> - '())))
> - (sort matches
> - (lambda (m1 m2)
> - (match m1
> - ((package1 . score1)
> - (match m2
> - ((package2 . score2)
> - (if (= score1 score2)
> - (string>? (package-full-name package1)
> - (package-full-name package2))
> - (> score1 score2))))))))))
> + (define (regexp? str)
> + (string-any
> + (char-set #\. #\[ #\{ #\} #\( #\) #\\ #\* #\+ #\? #\| #\^ #\$)
> + str))
Instead of reverting this, I would let the current
'find-packages-by-description' and would add
'find-packages-by-description-indexed' doing just
'(search-package-index (current-profile) (string-join patterns " "))'.
And maybe refactoring the sort of scores. Then I would put the test
branch in 'guix/scripts/packages.scm'...
> + (if (and (current-profile)
> + (not (any regexp? patterns)))
> + (search-package-index (current-profile) (string-join patterns " "))
> + (let* ((regexps (map (cut make-regexp* <> regexp/icase) patterns))
> + (matches (fold-packages (lambda (package result)
> + (if (package-superseded package)
> + result
> + (match (package-relevance package
Note that I am in the process of implementing the BM25 weights as
'package-relevance'; at least really thinking about it! :-)
I have already talked about TF-IDF as relevance, for example here [1].
And reading the Xapian documentation [2], it seems affordable. Or not
;-) because of the regexp... Need some thoughts... I mean "in the
process". ;-)
And in this case, it is almost a drop-in replacement of
'fold-packages' by 'mset-fold'; well it should add some flexibility
and a more unified code.
(Aside the searching, IMHO 'package-relevance' should help too in the
linting process of bad written descriptions, another story. ;-)
[1] https://lists.gnu.org/archive/html/guix-devel/2019-07/msg00252.html
[2] https://xapian.org/docs/bm25.html
> + regexps)
> + ((? zero?)
> + result)
> + (score
> + (cons (cons package score)
> + result)))))
> + '())))
> + (sort matches
> + (lambda (m1 m2)
> + (match m1
> + ((package1 . score1)
> + (match m2
> + ((package2 . score2)
> + (if (= score1 score2)
> + (string>? (package-full-name package1)
> + (package-full-name package2))
> + (> score1 score2)))))))))))
>
> (define (transaction-upgrade-entry store entry transaction)
> "Return a variant of TRANSACTION that accounts for the upgrade of ENTRY, a
> @@ -777,8 +787,7 @@ processed, #f otherwise."
...here.
+ (define (regexp? str)
+ (string-any
+ (char-set #\. #\[ #\{ #\} #\( #\) #\\ #\* #\+ #\? #\| #\^ #\$)
+ str))
> (('query 'search rx) rx)
> (_ #f))
> opts))
>
> - (regexps (map (cut make-regexp* <> regexp/icase) patterns))
> - (matches (find-packages-by-description regexps)))
+ (if (any regexp? patterns)
+ (matches (find-packages-by-description regexps))
+ (matches (find-packages-by-description-indexed patterns))
I mean something like that.
> (leave-on-EPIPE
> (display-search-results matches (current-output-port)))
> #t))
> --
> 2.23.0
All the best,
simon
- [bug#39258] [PATCH 4/4] gnu: Use xapian index for package search.,
zimoun <=