guix-commits
[no subject]


From: Ludovic Courtès
Date: Mon, 8 Jul 2024 11:24:30 -0400 (EDT)

branch: main
commit a6df98c2589200242c8cb658d6b0c0b1950c8051
Author: Ludovic Courtès <ludo@gnu.org>
AuthorDate: Mon Jul 8 10:34:37 2024 +0200

    database: ‘db-remove-unresponsive-workers’ reschedules stale builds.
    
    Previously, builds could remain in ‘submitted’ state indefinitely when
    the worker assigned to them never responded.
    
    * src/cuirass/database.scm (%build-submission-timeout): New variable.
    (db-remove-unresponsive-workers): Reschedule builds that have been in
    ‘submitted’ state for longer than ‘%build-submission-timeout’.
---
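The rescheduling rule described above can be illustrated outside the
database.  What follows is only a minimal in-memory sketch in Guile, not
Cuirass code: the symbols 'submitted and 'scheduled stand in for the
values of (build-status ...), and the procedures 'stale-submission?' and
'reschedule-stale' are hypothetical names introduced for illustration.
Only '%build-submission-timeout' and the strict "greater than" comparison
against the start time are taken from the patch.

```scheme
;; Hypothetical sketch of the timeout rule; the real change is an SQL UPDATE.
(use-modules (srfi srfi-1))

(define %build-submission-timeout
  ;; Same value as in the patch: 30 minutes, expressed in seconds.
  (* 30 60))

(define (stale-submission? status start-time now)
  ;; Mirror the WHERE clause: the build is in "submitted" state and more
  ;; than the timeout has elapsed since START-TIME.  Both times are in
  ;; seconds since the epoch.
  (and (eq? status 'submitted)
       (> (- now start-time) %build-submission-timeout)))

(define (reschedule-stale builds now)
  ;; BUILDS is a list of (ID STATUS START-TIME) lists.  Return a new list
  ;; in which stale submissions are switched back to 'scheduled.
  (map (lambda (build)
         (let ((id (first build))
               (status (second build))
               (start-time (third build)))
           (if (stale-submission? status start-time now)
               (list id 'scheduled start-time)
               build)))
       builds))
```

For instance, (reschedule-stale '((1 submitted 0) (2 submitted 1000)) 2000)
switches only build 1 back to 'scheduled, since 2000 seconds exceeds the
1800-second timeout while 1000 seconds does not.  A build submitted exactly
1800 seconds ago is left alone because the comparison is strict.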
 src/cuirass/database.scm | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/src/cuirass/database.scm b/src/cuirass/database.scm
index 32285d8..2780e7a 100644
--- a/src/cuirass/database.scm
+++ b/src/cuirass/database.scm
@@ -2227,6 +2227,11 @@ Builds.starttime DESC, Builds.id DESC;"))
          (loop rest
                (cons (db-get-build (string->number id)) builds)))))))
 
+(define %build-submission-timeout
+  ;; Maximum number of seconds a build can be in "submitted" state before being
+  ;; switched back to "scheduled".
+  (* 30 60))
+
 (define (db-remove-unresponsive-workers timeout)
   "Remove the workers that are unresponsive since at least TIMEOUT seconds.
 Also restart the builds that are started on those workers."
@@ -2243,6 +2248,15 @@ WHERE status = -1 AND
         (log-info "restarted ~a builds that were on unresponsive workers"
                   restarted)))
 
+    (let ((rescheduled (exec-query/bind db "UPDATE Builds
+SET status = " (build-status scheduled) "
+WHERE status = " (build-status submitted) " AND
+(extract(epoch from now())::int - starttime) > " %build-submission-timeout
+";")))
+      (unless (zero? rescheduled)
+        (log-info "rescheduled ~a builds that were submitted more than ~as ago"
+                  rescheduled %build-submission-timeout)))
+
     (let ((removed (exec-query/bind db "DELETE FROM Workers WHERE
 (extract(epoch from now())::int - last_seen) > " timeout ";")))
       (unless (zero? removed)


