guix-commits
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[shepherd] 06/06: shepherd: Load config asynchronously and gracefully ha


From: Ludovic Courtès
Subject: [shepherd] 06/06: shepherd: Load config asynchronously and gracefully handle errors.
Date: Mon, 12 Jun 2023 09:39:21 -0400 (EDT)

civodul pushed a commit to branch master
in repository shepherd.

commit 24c964021ebd3d63ce6e22808dd09dbe16116a6c
Author: Ludovic Courtès <ludo@gnu.org>
AuthorDate: Mon Jun 12 12:10:37 2023 +0200

    shepherd: Load config asynchronously and gracefully handle errors.
    
    Fixes <https://issues.guix.gnu.org/63982>.
    Reported by Maxim Cournoyer <maxim.cournoyer@gmail.com>.
    
    * modules/shepherd.scm (exception-with-kind-and-args?)
    (configuration-file-loader): New procedures.
    (run-daemon): Spawn 'configuration-file-loader' in a fiber.
    (main): Remove "catch 'quit" around 'run-daemon' call: this was uncanny
    because the 'primitive-exit' calls meant processes were left behind.
    Additionally, exiting upon config file failures wasn't necessarily
    desirable as it would prevent users from inspecting and reloading an
    updated config file.
    * tests/config-failure.sh: New file.
    * Makefile.am (TESTS): Add it.
    * tests/logging.sh: Wait for 'test-file-logging' to be running.
    * tests/pid-file.sh: Wait for config file evaluation to complete.
    * doc/shepherd.texi (Invoking shepherd): Document it.
    * NEWS: Update.
---
 Makefile.am             |  1 +
 NEWS                    | 16 ++++++++
 doc/shepherd.texi       | 12 ++++++
 modules/shepherd.scm    | 99 +++++++++++++++++++++++++++++++++----------------
 tests/config-failure.sh | 77 ++++++++++++++++++++++++++++++++++++++
 tests/logging.sh        |  4 +-
 tests/pid-file.sh       |  9 ++++-
 7 files changed, 184 insertions(+), 34 deletions(-)

diff --git a/Makefile.am b/Makefile.am
index fc53739..871b78d 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -246,6 +246,7 @@ SUFFIXES = .go
 
 TESTS =                                                \
   tests/basic.sh                               \
+  tests/config-failure.sh                      \
   tests/starting-status.sh                     \
   tests/stopping-status.sh                     \
   tests/startup-failure.sh                     \
diff --git a/NEWS b/NEWS
index c839187..05a3a1a 100644
--- a/NEWS
+++ b/NEWS
@@ -13,6 +13,22 @@ Please send Shepherd bug reports to bug-guix@gnu.org.
 
 * Changes in 0.10.2
 
+** ‘shepherd’ loads configuration file asynchronously
+
+Up to 0.10.1, ‘shepherd’ would load the user-provided configuration file
+synchronously: it would write its PID file and start listening for incoming
+connections only after the configuration file has been loaded.  The
+configuration file is now loaded in the background, letting users interact
+with shepherd (using the ‘herd’ command) early on.
+
+** ‘shepherd’ keeps going upon configuration file errors
+   (<https://issues.guix.gnu.org/63982>)
+
+Up to 0.10.1, ‘shepherd’ would abruptly exit when an error would occur while
+loading the configuration file—service startup failure, uncaught exception,
+etc.  It now reports the error but keeps going, again letting users fix any
+problems dynamically.
+
 ** New #:respawn-limit parameter to ‘service’
 
 The ‘service’ form supports a new #:respawn-limit parameter to specify
diff --git a/doc/shepherd.texi b/doc/shepherd.texi
index dbd0756..75fcb49 100644
--- a/doc/shepherd.texi
+++ b/doc/shepherd.texi
@@ -419,6 +419,18 @@ starting some or all of those services with
 @xref{Service Examples}, for examples of what @var{file} might look
 like.
 
+Note that @var{file} is evaluated @emph{asynchronously}:
+@command{shepherd} may start listening for client connections (with the
+@command{herd} command) before @var{file} has been fully loaded.  Errors
+in @var{file} such as service startup failures or uncaught exceptions do
+@emph{not} cause @command{shepherd} to stop.  Instead the error is
+reported, leaving users the opportunity to inspect service state and,
+for example, to load an updated config file with:
+
+@example
+herd load root @var{file}
+@end example
+
 @item -I
 @itemx --insecure
 @cindex security
diff --git a/modules/shepherd.scm b/modules/shepherd.scm
index 7058e6b..1207964 100644
--- a/modules/shepherd.scm
+++ b/modules/shepherd.scm
@@ -139,6 +139,53 @@ already ~a threads running, disabling 'signalfd' support")
            (fcntl fd F_SETFD (logior FD_CLOEXEC flags)))))
       (loop (+ fd 1)))))
 
+(cond-expand
+ ((not guile-3)
+  (define exception-with-kind-and-args?
+    (const #f)))
+ (else
+  (define exception-with-kind-and-args?
+    (exception-predicate &exception-with-kind-and-args))))
+
+(define (configuration-file-loader file)
+  "Return a thunk that loads @var{file}, the user's configuration file."
+  (define (failure)
+    (report-error
+     (l10n "~s: exception thrown while loading configuration file~%")
+     file)
+    #f)
+
+  (define (handle-key-and-args-exception key args)
+    (match key
+      ('system-error
+       (local-output
+        (l10n "While loading configuration file '~a': ~a")
+        file (strerror (system-error-errno (cons key args)))))
+      (_
+       (local-output
+        (l10n "Uncaught exception while loading configuration file '~a': ~s")
+        file (cons key args)))))
+
+  (lambda ()
+    (guard (c ((action-runtime-error? c)
+               (local-output (l10n "action '~a' on service '~a' failed: ~s")
+                             (action-runtime-error-action c)
+                             (action-runtime-error-service c)
+                             (cons (action-runtime-error-key c)
+                                   (action-runtime-error-arguments c)))
+               (failure))
+              ((exception-with-kind-and-args? c)
+               (handle-key-and-args-exception
+                (exception-kind c) (exception-args c))
+               (failure)))
+      (catch #t
+        (lambda ()
+          (load-in-user-module file)
+          (local-output (l10n "Configuration successfully loaded from '~a'.")
+                        file))
+        (lambda (key . args)                      ;for Guile 2.2
+          (handle-key-and-args-exception key args))))))
+
 (define* (run-daemon #:key (config-file (default-config-file))
                      socket-file pid-file signal-port poll-services?)
   (define signal-handler
@@ -169,15 +216,10 @@ already ~a threads running, disabling 'signalfd' support")
   (spawn-fiber
    (essential-task-thunk 'signal-handler signal-handler))
 
-  ;; This _must_ succeed.  (We could also put the `catch' around
-  ;; `main', but it is often useful to get the backtrace, and
-  ;; `caught-error' does not do this yet.)
-  (catch #t
-    (lambda ()
-      (load-in-user-module (or config-file (default-config-file))))
-    (lambda (key . args)
-      (caught-error key args)
-      (quit 1)))
+  ;; Load CONFIG-FILE in another fiber.  If loading fails, report it but keep
+  ;; going: the user can use 'herd load root' with a new config file if
+  ;; needed.
+  (spawn-fiber (configuration-file-loader config-file))
 
   ;; Ignore SIGPIPE so that we don't die if a client closes the connection
   ;; prematurely.
@@ -399,29 +441,22 @@ fork in the child process."
              (register-services (list root-service))
              (start-service root-service)
 
-             (catch 'quit
-               (lambda ()
-                 (with-process-monitor
-                   ;; Replace the default 'system*' binding with one that
-                   ;; cooperates instead of blocking on 'waitpid'.  Replace
-                   ;; 'primitive-load' (in C as of 3.0.9) with one that does
-                   ;; not introduce a continuation barrier.
-                   (replace-core-bindings!
-                    (system* (lambda command
-                               (spawn-command command)))
-                    (system spawn-shell-command)
-                    (primitive-load primitive-load*))
-
-                   (run-daemon #:socket-file socket-file
-                               #:config-file config-file
-                               #:pid-file pid-file
-                               #:signal-port signal-port
-                               #:poll-services? poll-services?)))
-               (case-lambda
-                 ((key value . _)
-                  (primitive-exit value))
-                 ((key)
-                  (primitive-exit 0))))))
+             (with-process-monitor
+               ;; Replace the default 'system*' binding with one that
+               ;; cooperates instead of blocking on 'waitpid'.  Replace
+               ;; 'primitive-load' (in C as of 3.0.9) with one that does
+               ;; not introduce a continuation barrier.
+               (replace-core-bindings!
+                (system* (lambda command
+                           (spawn-command command)))
+                (system spawn-shell-command)
+                (primitive-load primitive-load*))
+
+               (run-daemon #:socket-file socket-file
+                           #:config-file (or config-file (default-config-file))
+                           #:pid-file pid-file
+                           #:signal-port signal-port
+                           #:poll-services? poll-services?))))
          #:parallelism 1                          ;don't create POSIX threads
          #:hz 0)))))       ;disable preemption, which would require POSIX 
threads
 
diff --git a/tests/config-failure.sh b/tests/config-failure.sh
new file mode 100644
index 0000000..1cea2ad
--- /dev/null
+++ b/tests/config-failure.sh
@@ -0,0 +1,77 @@
+# GNU Shepherd --- Test shepherd behavior when config file errors out.
+# Copyright © 2023 Ludovic Courtès <ludo@gnu.org>
+#
+# This file is part of the GNU Shepherd.
+#
+# The GNU Shepherd is free software; you can redistribute it and/or modify it
+# under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or (at
+# your option) any later version.
+#
+# The GNU Shepherd is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with the GNU Shepherd.  If not, see <http://www.gnu.org/licenses/>.
+
+shepherd --version
+herd --version
+
+socket="t-socket-$$"
+conf="t-conf-$$"
+confdir="t-confdir-$$"
+datadir="t-datadir-$$"
+log="t-log-$$"
+stamp="t-stamp-$$"
+pid="t-pid-$$"
+child_pid="t-child-pid-$$"
+
+herd="herd -s $socket"
+
+trap "cat $log || true; rm -f $socket $conf $stamp $log $child_pid;
+      test -f $pid && kill \`cat $pid\` || true; rm -f $pid" EXIT
+
+cat > "$conf" <<EOF
+(register-services
+ (list (service
+        '(succeeding)
+        #:start (make-forkexec-constructor
+                   '("$SHELL" "-c" "echo \$\$ > $PWD/$child_pid; exec sleep 
300"))
+        #:stop (make-kill-destructor)
+        #:respawn? #f)
+       (service
+        '(failing)
+         #:requirement '(succeeding)
+        #:start (lambda _
+                   (call-with-output-file "$stamp" (const #t))
+                   (error "faileddddd!"))
+        #:stop (const #f)
+        #:respawn? #f)))
+
+(start-service (lookup-service 'failing))
+EOF
+
+rm -f "$pid" "$stamp"
+shepherd -I -s "$socket" -c "$conf" -l "$log" --pid="$pid" &
+
+# The 'succeeding' service should be up and running.
+while ! test -f "$child_pid" ; do sleep 0.3 ; done
+kill -0 "$(cat "$child_pid")"
+
+# Wait till it's ready.
+while ! test -f "$pid" ; do sleep 0.3 ; done
+shepherd_pid="$(cat $pid)"
+
+# Then the 'failing' service should fail.
+while ! test -f "$stamp" ; do sleep 0.3 ; done
+
+# Despite the failure while loading $conf, shepherd must be up and running.
+$herd status failing | grep "stopped"
+$herd status succeeding | grep "running"
+
+$herd stop root
+
+while kill -0 "$shepherd_pid" ; do sleep 0.3 ; done
+if kill -0 "$(cat "$child_pid")"; then false; else true; fi
diff --git a/tests/logging.sh b/tests/logging.sh
index 95384d5..961e5b5 100644
--- a/tests/logging.sh
+++ b/tests/logging.sh
@@ -77,8 +77,10 @@ while ! test -f "$pid" ; do sleep 0.3 ; done
 
 shepherd_pid="`cat $pid`"
 
+while ! test -f "$service_pid" ; do sleep 0.3 ; done
+until $herd status test-file-logging | grep running; do sleep 1; done
+
 cat "$service_log"
-$herd status test-file-logging | grep running
 for message in "STARTING" "STARTED" "café" "latin1 garbage: .* alors"
 do
     grep -E '^2[0-9]{3}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2} 
'"$message" "$service_log"
diff --git a/tests/pid-file.sh b/tests/pid-file.sh
index b2d5bec..55c02fd 100644
--- a/tests/pid-file.sh
+++ b/tests/pid-file.sh
@@ -23,11 +23,12 @@ socket="t-socket-$$"
 conf="t-conf-$$"
 log="t-log-$$"
 pid="t-pid-$$"
+stamp="t-stamp-$$"
 service_pid="t-service-pid-$$"
 
 herd="herd -s $socket"
 
-trap "cat $log || true; rm -f $socket $conf $service_pid $log;
+trap "cat $log || true; rm -f $socket $conf $service_pid $stamp $log;
       test -f $pid && kill \`cat $pid\` || true; rm -f $pid" EXIT
 
 cat > "$conf"<<EOF
@@ -97,6 +98,9 @@ cat > "$conf"<<EOF
 ;; Start it upfront.  This ensures the whole machinery works even
 ;; when called in a non-suspendable context (continuation barrier).
 (start-service (lookup-service 'test-works))
+
+;; Note that the config file has been evaluated.
+(call-with-output-file "$PWD/$stamp" (const #t))
 EOF
 
 rm -f "$pid"
@@ -107,6 +111,9 @@ while ! test -f "$pid" ; do sleep 0.3 ; done
 
 shepherd_pid="`cat $pid`"
 
+# The config file is evaluated asynchronously, so wait until it's been loaded.
+until test -f "$stamp" ; do sleep 0.3 ; done
+
 # This service should already be running.
 $herd status test-works | grep running
 test -f "$service_pid"



reply via email to

[Prev in Thread] Current Thread [Next in Thread]