[bug#30637] [WIP] shepherd: Poll every 0.5s to find dead forked services
From: Ludovic Courtès
Subject: [bug#30637] [WIP] shepherd: Poll every 0.5s to find dead forked services
Date: Fri, 02 Mar 2018 10:44:12 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.3 (gnu/linux)

Hi Carlo,
Carlo Zancanaro <address@hidden> wrote:
> On Wed, Feb 28 2018, Ludovic Courtès wrote:
>>> The problem is that shepherd, when run as a user process, can "lose"
>>> services which fork away. Shepherd can still kill them, but a SIGCHLD
>>> won't be delivered if they die, so shepherd can't restart/disable
>>> them. My prime example is emacs, which I run with --daemon. If I then
>>> kill emacs, shepherd will still think that it is running.
>>
>> There are two issues here, I think.
>>
>> 1. shepherd cannot lose SIGCHLD: if a process dies immediately once
>>    it’s been spawned, as is the case with “emacs --daemon” or any
>>    other daemon-style program, it should receive SIGCHLD and process
>>    it.
>
> Yeah, that's true, but the problem is that shepherd only processes the
> SIGCHLD if there is a service with its `running` slot set to the pid.
Well, it does call ‘waitpid’ every time it gets a SIGCHLD, but it’s true
that it doesn’t do anything beyond that if it doesn’t know what service
a PID corresponds to.
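To make the waitpid point concrete, here is a minimal Python sketch (Linux/POSIX assumed; `spawn_daemon_style` and `try_reap` are hypothetical helpers, not shepherd code) of what a supervisor sees when a service forks away the way `emacs --daemon` does. The supervisor can reap the short-lived intermediate process, but waitpid on the real daemon's PID fails with ECHILD: the daemon is a grandchild, and once orphaned it is reparented to init (or to the nearest subreaper ancestor), never to the supervisor.

```python
import os
import time


def spawn_daemon_style(lifetime=0.3):
    """Fork a child that behaves like 'emacs --daemon': it forks the real
    daemon and exits immediately.  Returns (intermediate_pid, daemon_pid);
    the daemon's PID comes back through a pipe, standing in for a PID file.
    """
    r, w = os.pipe()
    child = os.fork()
    if child == 0:                      # intermediate process
        os.close(r)
        daemon = os.fork()
        if daemon == 0:                 # the long-lived "daemon"
            os.close(w)
            time.sleep(lifetime)
            os._exit(0)
        os.write(w, str(daemon).encode())
        os._exit(0)                     # exits at once: this is the SIGCHLD we get
    os.close(w)
    daemon = int(os.read(r, 32))
    os.close(r)
    return child, daemon


def try_reap(pid):
    """Blocking waitpid(); ECHILD means 'pid' is not a child of ours."""
    try:
        os.waitpid(pid, 0)
        return "reaped"
    except ChildProcessError:           # errno ECHILD
        return "not-our-child"


if __name__ == "__main__":
    intermediate, daemon = spawn_daemon_style()
    print(try_reap(intermediate))       # reaped
    print(try_reap(daemon))             # not-our-child
```

Running it prints `reaped` followed by `not-our-child`: the only exit the supervisor ever hears about is the intermediate one.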
> When emacs forks, the original process may have its SIGCHLD handled,
> but that doesn't affect shepherd's service state (as it shouldn't,
> because it's using #:pid-file to track the forked process).
>
>> 2. shepherd currently can’t do much with real daemons. So what we do
>>    in GuixSD is to either start programs in non-daemon mode, when
>>    that’s an option, or pass #:pid-file to retrieve the forked process
>>    PID. I think you should do one of these as well.
>
> I am doing that. The problem is that when a service dies (crashes,
> quits, etc.) the `respawn?` option cannot be honoured because shepherd
> is not notified that the process has terminated (because it never
> receives a SIGCHLD for the forked pid). My patch polls for the
> processes we expect, to make up for the lack of notification.
I see.
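The polling approach can be sketched like this, assuming a daemon that writes its PID to a file, as #:pid-file expects. This is a hypothetical Python illustration, not the Guile code from the WIP patch: `read_pid_file`, `process_alive` and `poll_until_dead` are made-up names, and the 0.5 s default interval simply mirrors the patch title. Liveness is checked with kill(pid, 0), which performs the permission and existence checks without sending any signal.

```python
import os
import time


def read_pid_file(path):
    """Read the PID a daemon wrote to its PID file (cf. #:pid-file)."""
    with open(path) as f:
        return int(f.read().strip())


def process_alive(pid):
    """kill(pid, 0) sends no signal; it only checks that 'pid' exists."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:          # ESRCH: no such process
        return False
    except PermissionError:             # EPERM: exists, owned by another user
        return True
    return True


def poll_until_dead(pid, interval=0.5, timeout=None):
    """Check 'pid' every 'interval' seconds.

    Returns True once the process is gone, or False if 'timeout' seconds
    elapse first.  A real supervisor would then respawn the service.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while process_alive(pid):
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True
```

Two caveats of polling are worth keeping in mind: kill(pid, 0) also succeeds for a zombie that nobody has reaped yet, and a PID can be reused by an unrelated process between two polls.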
Actually, thinking more about it, we should be using
PR_SET_CHILD_SUBREAPER from prctl(2), which is designed exactly for
that.
So what about this plan:
1. Add FFI bindings in (shepherd system) for prctl(2). We should
arrange for it to throw to 'system-error when the ‘prctl’ symbol is
missing, as is the case on GNU/Hurd.
2. Use prctl/PR_SET_CHILD_SUBREAPER in ‘exec-command’. Here we must
‘catch-system-error’ around that call to cater to GNU/Hurd.
That would address the main issue without having to resort to polling.
Respawning will work only when #:pid-file is used though, but that’s
already an improvement.
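The two-step plan above can be sketched in Python with ctypes, as a rough analogue of a Guile FFI binding rather than the eventual (shepherd system) code. Assumptions: Linux 3.4 or later, and the values 36/37 for PR_SET_CHILD_SUBREAPER/PR_GET_CHILD_SUBREAPER taken from <linux/prctl.h>. When libc has no `prctl` symbol, as on GNU/Hurd, the wrapper raises OSError with ENOSYS, and the hypothetical `become_subreaper` helper swallows that error in the spirit of catch-system-error. Per prctl(2), the subreaper attribute applies to the calling process, so in this sketch the supervising process sets it on itself; a daemon that later double-forks is then reparented to it and can be reaped with waitpid.

```python
import ctypes
import errno
import os
import time

# Values from <linux/prctl.h> (assumed; Linux-specific).
PR_SET_CHILD_SUBREAPER = 36
PR_GET_CHILD_SUBREAPER = 37

_libc = ctypes.CDLL(None, use_errno=True)


def prctl(option, arg2=0, arg3=0, arg4=0, arg5=0):
    """Call prctl(2), raising OSError on failure (like a system-error).

    If libc has no 'prctl' symbol (e.g. GNU/Hurd), raise ENOSYS.
    """
    try:
        func = _libc.prctl
    except AttributeError:
        raise OSError(errno.ENOSYS, "prctl is not available") from None
    # prctl is variadic; the kernel reads the extra arguments as unsigned long.
    func.argtypes = [ctypes.c_int] + [ctypes.c_ulong] * 4
    func.restype = ctypes.c_int
    ret = func(option, arg2, arg3, arg4, arg5)
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ret


def become_subreaper():
    """Mark the calling process as a child subreaper.

    Returns False instead of raising when prctl or the option is
    unsupported, so callers can fall back to other strategies.
    """
    try:
        prctl(PR_SET_CHILD_SUBREAPER, 1)
        return True
    except OSError as e:
        if e.errno in (errno.ENOSYS, errno.EINVAL):
            return False
        raise


def is_subreaper():
    flag = ctypes.c_int(0)
    # arg2 is a pointer to an int that receives the current setting.
    prctl(PR_GET_CHILD_SUBREAPER, ctypes.addressof(flag))
    return flag.value != 0


def spawn_and_reap_daemon(exit_code=7, lifetime=0.2):
    """Double-fork like 'emacs --daemon', then wait for the daemon itself.

    Only works once the caller is a subreaper: the orphaned daemon is
    reparented to us, so waitpid() on its PID no longer fails with ECHILD.
    """
    r, w = os.pipe()
    child = os.fork()
    if child == 0:                      # intermediate process
        os.close(r)
        daemon = os.fork()
        if daemon == 0:                 # the long-lived "daemon"
            os.close(w)
            time.sleep(lifetime)
            os._exit(exit_code)
        os.write(w, str(daemon).encode())
        os._exit(0)                     # intermediate exits at once
    os.close(w)
    daemon = int(os.read(r, 32))
    os.close(r)
    os.waitpid(child, 0)                # reap the intermediate process
    pid, status = os.waitpid(daemon, 0) # the daemon is now our child
    return pid == daemon, os.WEXITSTATUS(status)


if __name__ == "__main__":
    if become_subreaper():
        print(*spawn_and_reap_daemon())
    else:
        print("PR_SET_CHILD_SUBREAPER unsupported; fall back to polling")
```

On a Linux system that supports the option, the demo should print `True 7`: the supervisor receives the daemon's own exit status even though the intermediate process exited long before.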
Thoughts?
Thanks,
Ludo’.