bug-guix
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#62619: Shepherd desertion upon service canonical name change?


From: Ludovic Courtès
Subject: bug#62619: Shepherd desertion upon service canonical name change?
Date: Thu, 27 Apr 2023 14:55:38 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)

Hi Bruno,

Bruno Victal <mirai@makinata.eu> skribis:

> Upon a guix system reconfigure, if a running service has undergone a 
> canonical name change since the previous generation
> then shutdown or reboot commands fail, with shepherd indicating itself (root 
> service) as stopped.

Oh, fun.

> mirai@guix-nuc ~$ sudo reboot

[…]

> Service xvnc has been stopped.
> Service git-daemon has been stopped.
> Assertion (eq? (canonical-name new) (canonical-name old)) failed.
> assertion-failed()
> Service root has been stopped.

I think this assertion failure is the root cause.  It comes from
‘register-services’ in 0.9.3, which reads:

--8<---------------cut here---------------start------------->8---
(define (register-services . new-services)
  "Add NEW-SERVICES to the list of known services.  If a service has already
been registered, arrange to have it replaced when it is next stopped.  If it
is currently stopped, replace it immediately."
  (define (register-single-service new)
    ;; Sanity-checks first.
    (assert (list-of-symbols? (provided-by new)))
    (assert (list-of-symbols? (required-by new)))
    (assert (boolean? (respawn? new)))

    ;; FIXME: Just because we have a unique canonical name now doesn't mean it
    ;; will remain unique as other services are added. Whenever a service is
    ;; added it should check that it's not conflicting with any already
    ;; registered canonical names.
    (match (lookup-services (canonical-name new))
      (() ;; empty, so we can safely add ourselves
       (for-each (lambda (name)
                   (let ((old (lookup-services name)))
                     (hashq-set! %services name (cons new old))))
                 (provided-by new)))
      ((old . rest) ;; one service registered, it may be an old version of us
       (assert (null? rest))
       (assert (eq? (canonical-name new) (canonical-name old)))
       (if (running? old)
           (slot-set! old 'replacement new)
           (replace-service old new)))))

  (for-each register-single-service new-services))
--8<---------------cut here---------------end--------------->8---

‘register-services’ was called from ‘replace-service’, itself called
from ‘stop’ (right after ‘networking’ had been actually stopped):

--8<---------------cut here---------------start------------->8---
        ;; SERVICE is no longer running.
        (slot-set! service 'running #f)

        ;; Reset the list of respawns.
        (slot-set! service 'last-respawns '())

        ;; Replace the service with its replacement, if it has one
        (let ((replacement (slot-ref service 'replacement)))
          (when replacement
            (replace-service service replacement))) ;<- here
--8<---------------cut here---------------end--------------->8---

The assertion failure was thrown here.  ‘stop’ calls itself
recursively but it doesn’t catch exceptions from recursive calls:

--8<---------------cut here---------------start------------->8---
  (fold-services (lambda (other acc)
                   (if (and (running? other)
                            (required-by? service other))
                       (append (stop other) acc)   ;<- here
                       acc))
                 '())
--8<---------------cut here---------------end--------------->8---

The problem is that ‘root-service’ marks itself as stopped before it has
effectively shut down the services:

--8<---------------cut here---------------start------------->8---
    #:stop (lambda (unused . args)
             (local-output (l10n "Exiting shepherd..."))
             ;; Prevent that we try to stop ourself again.
             (slot-set! root-service 'running #f)
             (shutdown-services)
             (quit))
--8<---------------cut here---------------end--------------->8---

So what happened is that ‘shutdown-services’ threw, the exception wasn’t
caught, and thus it never called ‘quit’.  QED.


The service registry in the soon-to-be-released 0.10.0 no longer has
that assertion failure (it vanished in
08510a2a2aaab388c90dd402bd7506d33014454f).  Instead, it registers a
replacement for the first service found to have one of the names of the
new service.

The problem of the #:stop method of ‘root-service’ still exists: if
‘shutdown-services’ throws, the ‘root’ service won’t terminate and it’ll
remain in ‘stopping’ status.  We could wrap the ‘shutdown-services’ call
in ‘false-if-exception’, but I’d lean towards not doing it: it’s not
supposed to throw, so maybe it’s best not to swallow the exception.

To summarize, I believe the problem is solved in 0.10.

Thanks,
Ludo’.





reply via email to

[Prev in Thread] Current Thread [Next in Thread]