bug#56674: [Shepherd] Use of ‘waitpid’, ‘system*’, etc. in service code can cause deadlocks
From: Ludovic Courtès
Subject: bug#56674: [Shepherd] Use of ‘waitpid’, ‘system*’, etc. in service code can cause deadlocks
Date: Wed, 20 Jul 2022 23:39:08 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.1 (gnu/linux)
Hi!
We’ve just had a bad experience with the nginx service on berlin, where
‘herd restart nginx’ would cause shepherd to get stuck forever in
‘waitpid’ on the process that was supposed to start nginx.
The details are unclear, but one thing is certain: using ‘waitpid’
(either directly or indirectly with ‘system*’, which is what
‘nginx-service-type’ does) is not great:
1. In the best case, shepherd (as of 0.9.1) is stuck while ‘system*’
is in ‘waitpid’ waiting for child process completion (“stuck” as
in: doesn’t do anything, not even answering ‘herd’ requests or
inetd connections.)
2. I don’t think that can happen with ‘system*’ (because it’s in C),
but generally speaking, there’s a possibility that shepherd’s event
loop will handle child process termination before some other
user-made ‘waitpid’ call does.
Anyway, that’s a bad situation.
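To make the two failure modes concrete, here is a minimal sketch of the underlying POSIX behaviour. It is written in Python rather than Guile and is not Shepherd code; the helper names are made up for illustration, and the first ‘waitpid’ in the second function is only a stand-in for shepherd's event loop reaping the child.

```python
import os
import time


def spawn_sleeper(seconds):
    """Fork a child that sleeps for `seconds`, then exits with status 0."""
    pid = os.fork()
    if pid == 0:
        time.sleep(seconds)
        os._exit(0)
    return pid


def blocking_wait_duration(seconds):
    """Case 1: a blocking waitpid stalls the caller until the child exits.

    Returns how long the caller was unable to do anything else.
    """
    pid = spawn_sleeper(seconds)
    start = time.monotonic()
    os.waitpid(pid, 0)  # nothing else runs in this thread meanwhile
    return time.monotonic() - start


def second_waiter_loses_race():
    """Case 2: once one waiter has reaped the child, a later waitpid on
    the same PID fails with ECHILD (ChildProcessError in Python)."""
    pid = spawn_sleeper(0)
    os.waitpid(pid, 0)  # stand-in for the event loop reaping the child
    try:
        os.waitpid(pid, 0)  # the "user-made" call arrives too late
    except ChildProcessError:
        return True
    return False
```

In a single-threaded event loop, the time returned by the first function is time during which no other request can be served; the second function shows why two independent waiters on the same child cannot both succeed.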
So I can think of several ways to address it:
1. Change the nginx service ‘stop’ method to just
(make-kill-destructor), which should work just as well as invoking
“nginx -s stop”.
2. Have Shepherd provide a replacement for ‘system*’.
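As a rough illustration of both options, here is a sketch using Python's asyncio as a stand-in for shepherd's event loop. It is an analogy only, not a proposal for the actual Shepherd API: ‘system_star’, ‘stop_by_signal’, and ‘demo’ are hypothetical names. It assumes that (make-kill-destructor) stops a service by sending SIGTERM to its process, and that a ‘system*’ replacement would suspend only the caller until the child terminates, leaving the loop free to handle other events.

```python
import asyncio
import signal
import sys


async def system_star(*argv):
    """Option 2 analogue: run argv without a shell and return its exit
    status, suspending only the calling task while the child runs."""
    proc = await asyncio.create_subprocess_exec(*argv)
    return await proc.wait()


async def stop_by_signal(proc):
    """Option 1 analogue: stop a service by signalling its process
    instead of spawning a helper command and waiting on it."""
    proc.terminate()          # sends SIGTERM on POSIX
    return await proc.wait()  # negative value: killed by that signal


async def demo():
    ticks = 0

    async def heartbeat():
        # Stands in for the loop answering other requests.
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.01)

    beat = asyncio.create_task(heartbeat())
    status = await system_star(
        sys.executable, "-c", "import time; time.sleep(0.2)")
    beat.cancel()

    daemon = await asyncio.create_subprocess_exec(
        sys.executable, "-c", "import time; time.sleep(60)")
    stopped = await stop_by_signal(daemon)
    return status, ticks, stopped


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The heartbeat keeps ticking while the child sleeps, which is the property a blocking ‘waitpid’ loses. For nginx specifically, SIGTERM triggers the same fast shutdown as “nginx -s stop”, which is why option 1 should be equivalent.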
Thoughts?
Ludo’.