01/13: Use delete-duplicates/sort! in insert-missing-data-and-return-all
From: Christopher Baines
Subject: 01/13: Use delete-duplicates/sort! in insert-missing-data-and-return-all-ids
Date: Fri, 19 Jan 2024 04:57:46 -0500 (EST)
cbaines pushed a commit to branch master
in repository data-service.
commit 49b4841c4e1f6745fd5a148cdeb3cf118164b70c
Author: Christopher Baines <mail@cbaines.net>
AuthorDate: Mon Jan 15 11:18:39 2024 +0000
Use delete-duplicates/sort! in insert-missing-data-and-return-all-ids
As it's faster than delete-duplicates for large amounts of data.
---
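The performance claim in the commit message can be illustrated with a minimal sketch. SRFI-1's delete-duplicates compares each element against the ones already kept, which is quadratic in the worst case. Sorting first makes duplicates adjacent, so a single linear pass can drop them. The helper dedupe-by-sorting below is hypothetical and written only for illustration; it is not the project's delete-duplicates/sort!, whose implementation is not part of this patch.

```scheme
(use-modules (srfi srfi-1))

;; Hypothetical helper (an assumption for illustration, not the
;; project's delete-duplicates/sort!).  Sorts LST with LESS, then
;; drops each element that is not strictly greater than the last
;; element kept.
(define (dedupe-by-sorting lst less)
  (let loop ((sorted (sort lst less))   ; `sort' returns a new list
             (result '()))
    (cond
     ((null? sorted)
      (reverse result))
     ((and (pair? result)
           (not (less (car result) (car sorted))))
      ;; The input is sorted, so "not less" means equivalent: skip it.
      (loop (cdr sorted) result))
     (else
      (loop (cdr sorted) (cons (car sorted) result))))))

(define sample '(3 1 2 3 1))

;; Sort-based: roughly O(n log n); the result comes back sorted.
(define by-sorting (dedupe-by-sorting sample <))

;; SRFI-1: O(n^2) comparisons; the first occurrences keep their order.
(define by-srfi-1 (delete-duplicates sample))
```

In this sketch the two approaches return the same set of elements but in a different order, which only matters to callers that depend on the original ordering.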
guix-data-service/model/utils.scm | 48 +++++++++++++++++++++++++++++++++++----
1 file changed, 43 insertions(+), 5 deletions(-)
diff --git a/guix-data-service/model/utils.scm b/guix-data-service/model/utils.scm
index f40174a..b46e2e4 100644
--- a/guix-data-service/model/utils.scm
+++ b/guix-data-service/model/utils.scm
@@ -178,6 +178,44 @@ WHERE table_name = $1"
(error
(simple-format #f "error: unknown type for value: ~A" v)))))
+  (define (delete-duplicates* data)
+    (delete-duplicates/sort!
+     (list-copy data)
+     (lambda (full-a full-b)
+       (let loop ((a full-a)
+                  (b full-b))
+         (if (null? a)
+             #f
+             (let ((a-val (match (car a)
+                            ((_ . val) val)
+                            ((? symbol? val) (symbol->string val))
+                            (val val)))
+                   (b-val (match (car b)
+                            ((_ . val) val)
+                            ((? symbol? val) (symbol->string val))
+                            (val val))))
+               (cond
+                ((null? a-val)
+                 (if (null? b-val)
+                     (loop (cdr a) (cdr b))
+                     #t))
+                ((null? b-val)
+                 #f)
+                (else
+                 (match a-val
+                   ((? string? v)
+                    (if (string=? a-val b-val)
+                        (loop (cdr a) (cdr b))
+                        (string<? a-val b-val)))
+                   ((? number? v)
+                    (if (= a-val b-val)
+                        (loop (cdr a) (cdr b))
+                        (< a-val b-val)))
+                   ((? boolean? v)
+                    (if (eq? a-val b-val)
+                        (loop (cdr a) (cdr b))
+                        a-val)))))))))))
+
(define schema-details
(table-schema conn table-name))
@@ -312,9 +350,9 @@ WHERE table_name = $1"
(string-append "temp_" table-name))
(data
(if sets-of-data?
- (delete-duplicates (concatenate data))
+ (delete-duplicates* (concatenate data))
(if delete-duplicates?
- (delete-duplicates data)
+ (delete-duplicates* data)
data))))
;; Create a temporary table to store the data
(exec-query
@@ -363,7 +401,7 @@ WHERE table_name = $1"
#:vhash result))
vlist-null
(chunk (if sets-of-data?
- (delete-duplicates
+ (delete-duplicates*
(concatenate data))
data)
3000)))))
@@ -375,9 +413,9 @@ WHERE table_name = $1"
(normalise-values field-values)
existing-entries)))
(if sets-of-data?
- (delete-duplicates (concatenate data))
+ (delete-duplicates* (concatenate data))
(if delete-duplicates?
- (delete-duplicates data)
+ (delete-duplicates* data)
data))))
(new-entries
(if (null? missing-entries)
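
The ordering procedure passed to delete-duplicates/sort! in the first hunk compares two rows column by column. A simplified sketch of that idea follows. It assumes rows are plain lists of equal length, that '() stands in for a NULL value, and that a given column always holds the same type. The names value<? and row<? are hypothetical, and the sketch leaves out the (field . value) pair and symbol handling from the diff.

```scheme
(use-modules (ice-9 match))

;; Hypothetical: strict ordering for two non-NULL values of the same type.
(define (value<? a b)
  (match a
    ((? string?) (string<? a b))
    ((? number?) (< a b))
    ((? boolean?) (and a (not b)))))   ; as in the diff, #t sorts before #f

;; Hypothetical: lexicographic "less than" over rows.
(define (row<? row-a row-b)
  (let loop ((a row-a)
             (b row-b))
    (cond
     ((null? a) #f)                    ; every column matched: not less
     ((equal? (car a) (car b))         ; same value: look at next column
      (loop (cdr a) (cdr b)))
     ((null? (car a)) #t)              ; NULL sorts before non-NULL
     ((null? (car b)) #f)
     (else (value<? (car a) (car b))))))

(define sorted-rows
  (sort '(("b" 2) (() 1) ("a" 3) ("a" 3)) row<?))
;; sorted-rows is ((() 1) ("a" 3) ("a" 3) ("b" 2)); the two identical
;; rows end up adjacent, ready for a single de-duplication pass.
```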