[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[bug#30637] [WIP] shepherd: Poll every 0.5s to find dead forked services
From: |
Ludovic Courtès |
Subject: |
[bug#30637] [WIP] shepherd: Poll every 0.5s to find dead forked services |
Date: |
Sun, 04 Mar 2018 23:11:28 +0100 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/25.3 (gnu/linux) |
Hello,
Carlo Zancanaro <address@hidden> skribis:
> On Sat, Mar 03 2018, Ludovic Courtès wrote:
>> If they’re zombies, that means nobody called waitpid(2). Presumably
>> the
>> polling code would need to do that.
>>
>> So I suppose ‘check-for-dead-services’ should do something like:
>>
>> [ ... ]
>>
>> Does that make sense? Please check waitpid(2) carefully though,
>> because
>> it’s pretty gnarly and I might have forgotten or misinterpreted
>> something here.
>
> Unfortunately we can't do that. We fall back to the polling approach
> to handle the fact that the processes that we care about aren't our
> children (hence we don't get SIGCHLD). The waitpid system call only
> waits for processes which are children of the calling process.
Oh right!
> I looked into the zombie problem a bit more, and I found what the
> problem actually is. In the build environment a guile process is
> running as pid 1 (the *-guile-builder script for that job). This guile
> process never handles SIGCHLD, and never calls wait/waitpid, so any
> orphaned processes become zombies. I tried modifying derivations.scm,
> but it wanted to rebuild a lot of things, so I gave up. I think we
> need to add something like this to the *-guile-builder script:
>
> (sigaction SIGCHLD
> (lambda ()
> (let loop ()
> (match (waitpid WAIT_ANY WNOHANG)
> ((0 . _) #f)
> ((pid . _) (loop))
> (_ #f))))
> SA_NOCLDSTOP)
Good catch. We could add this in gnu-build-system.scm in core-updates,
though it’s no big deal anyway since these are throw-away environments.
Thoughts?
> From e529e4035eec147f448804dd10fdbca13a17f523 Mon Sep 17 00:00:00 2001
> From: Carlo Zancanaro <address@hidden>
> Date: Sun, 4 Mar 2018 07:01:30 +1100
> Subject: [PATCH 1/3] Add prctl syscall wrapper along with with
> PR_SET_CHILD_SUBREAPER.
>
> * configure.ac: Detect and substitute PR_SET_CHILD_SUBREAPER.
> * modules/shepherd/system.scm.in (PR_SET_CHILD_SUBREAPER): Add new variable
> and export it.
> (prctl): Add new procedure and export it.
Applied with a copyright line for you.
> From b43c128d8a175a9a123eb7b1af465fb3747a5393 Mon Sep 17 00:00:00 2001
> From: Carlo Zancanaro <address@hidden>
> Date: Sun, 4 Mar 2018 07:46:13 +1100
> Subject: [PATCH 2/3] Handle forked process SIGCHLD signals
>
> * Makefile.am (TESTS): Add tests/forking-service.sh.
> * modules/shepherd.scm: Set the child subreaper attribute of main shepherd
> process (as long as we're not pid 1).
> * modules/shepherd/service.scm (root-service)[daemonize]: Set the child
> subreaper attribute of newly forked shepherd process.
Applied. However tests/forking-service.sh was missing, so I took it
from the previous version of this patch and added it. I also changed
the “Bashishms” that were in that file, as discussed earlier. Let me
know if anything’s wrong!
> From 3d3c091660bbbd529af0058b0ba9b5ddbfc6b481 Mon Sep 17 00:00:00 2001
> From: Carlo Zancanaro <address@hidden>
> Date: Wed, 21 Feb 2018 22:57:59 +1100
> Subject: [PATCH 3/3] Poll every 0.5s to find dead forked services
>
> * modules/shepherd.scm (open-server-socket): Set socket to be
> non-blocking.
> (main): Use select with a timeout. If prctl failed when shepherd started
> then call check-for-dead-services between connections/timeouts.
> * modules/shepherd/service.scm (fork+exec-command): Install handle-SIGCHLD as
> signal handler.
> (respawn-service): Separate logic for respawning services from handling
> SIGCHLD.
> (handle-SIGCHLD, check-for-dead-services): New exported procedures.
> * tests/basic.sh, tests/status-sexp.sh: Replace constant integers with
> symbols.
> * doc/shepherd.texi (Slots of services): Add note about service running slot
> being a process id.
[...]
> --- a/modules/shepherd.scm
> +++ b/modules/shepherd.scm
> @@ -42,6 +42,8 @@
> (with-fluids ((%default-port-encoding "UTF-8"))
> (let ((sock (socket PF_UNIX SOCK_STREAM 0))
> (address (make-socket-address AF_UNIX file-name)))
> + (fcntl sock F_SETFL (logior O_NONBLOCK
> + (fcntl sock F_GETFL)))
> (bind sock address)
> (listen sock 10)
> sock)))
> @@ -49,14 +51,28 @@
>
> ;; Main program.
> (define (main . args)
> - (initialize-cli)
> + (define poll-services
> + (if (= 1 (getpid))
> + (lambda () #f) ;; If we're pid 1 then we don't need to set
> + ;; PR_SET_CHILD_SUBREAPER
[...]
> + (match (select (list sock) (list) (list) 0.5)
> + (((sock) _ _)
> + (read-from sock))
> + (_
> + #f))
> + (poll-services)
Here everyone ends up paying some overhead (the 0.5 second timeout),
which isn’t great.
How about something like:
(define poll-services
(and (not (= 1 (getpid)))
…))
(match (select (list sock) '() '() (if poll-services 0.5 0))
…)
?
Thanks,
Ludo’.
- [bug#30637] [WIP] shepherd: Poll every 0.5s to find dead forked services, Carlo Zancanaro, 2018/03/01
- [bug#30637] [WIP] shepherd: Poll every 0.5s to find dead forked services, Ludovic Courtès, 2018/03/02
- [bug#30637] [WIP] shepherd: Poll every 0.5s to find dead forked services, Carlo Zancanaro, 2018/03/02
- [bug#30637] [WIP] shepherd: Poll every 0.5s to find dead forked services, Ludovic Courtès, 2018/03/02
- [bug#30637] [WIP] shepherd: Poll every 0.5s to find dead forked services, Carlo Zancanaro, 2018/03/03
- [bug#30637] [WIP] shepherd: Poll every 0.5s to find dead forked services, Ludovic Courtès, 2018/03/03
- [bug#30637] [WIP] shepherd: Poll every 0.5s to find dead forked services, Carlo Zancanaro, 2018/03/03
- [bug#30637] [WIP] shepherd: Poll every 0.5s to find dead forked services,
Ludovic Courtès <=
- [bug#30637] [WIP] shepherd: Poll every 0.5s to find dead forked services, Carlo Zancanaro, 2018/03/04
- [bug#30637] [WIP] shepherd: Poll every 0.5s to find dead forked services, Ludovic Courtès, 2018/03/04
- [bug#30637] [WIP] shepherd: Poll every 0.5s to find dead forked services, Carlo Zancanaro, 2018/03/04
- bug#30637: [WIP] shepherd: Poll every 0.5s to find dead forked services, Ludovic Courtès, 2018/03/05