bug-guix

bug#58485: [shepherd] Restarting guix-publish fails


From: Ludovic Courtès
Subject: bug#58485: [shepherd] Restarting guix-publish fails
Date: Thu, 27 Apr 2023 23:23:58 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)

Hi,

Sorry for the late reply.  I’m going through Shepherd bug reports and I
remembered this discussion…

Lars-Dominik Braun <ldb@leibniz-psychology.org> skribis:

>> Can you confirm shepherd (PID 1) is 0.9.3?
> it is:
>
> root         1  0.2  0.2 308148 76816 ?        Sl   Feb07  52:08 
> /gnu/store/kphp5d85rrb3q1rdc2lfqc1mdklwh3qp-guile-3.0.9/bin/guile 
> --no-auto-compile 
> /gnu/store/4nw0zb4swga0cb8i35nvng3rg6z5qm8p-shepherd-0.9.3/bin/shepherd 
> --config /gnu/store/cvrai6z8777jf7860rnvppfznl1lcxi1-shepherd.conf
>
>> ‘sudo herd restart ssh-daemon’ works fine on my laptop FWIW.
> This works fine too. Only unattended-upgrades seems to have this issue :/
>
> The strace looks unsuspicious right now:
>
> ---snip---
> 1     14:12:15.117035 read(21, "(shepherd-command (version 0) (action 
> restart) (service ssh-daemon) (arguments ()) (directory \"/root\"))", 1024) = 
> 103
> 1     14:12:15.117254 close(27)         = 0
> 1     14:12:15.117283 close(30)         = 0
> 1     14:12:15.117416 newfstatat(AT_FDCWD, "/etc/localtime",
> {st_dev=makedev(0x8, 0x2), st_ino=110100491, st_mode=S_IFREG|0444,
> st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8,
> st_size=2298, st_atime=1676898665 /* 2023-02-20T14:11:05.338746772+0100 */,
> st_atime_nsec=338746772, st_mtime=1676898664 /*
> 2023-02-20T14:11:04.874743456+0100 */, st_mtime_nsec=874743456,
> st_ctime=1676898664 /* 2023-02-20T14:11:04.874743456+0100 */,
> st_ctime_nsec=874743456}, 0) = 0
> 1     14:12:15.117475 write(17, "shepherd[1]: Service ssh-daemon has been 
> stopped.\n", 50) = 50
> 1     14:12:15.117524 socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 
> IPPROTO_IP) = 26
> 1     14:12:15.117561 setsockopt(26, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> 1     14:12:15.117598 bind(26, {sa_family=AF_INET, sin_port=htons(2222), 
> sin_addr=inet_addr("0.0.0.0")}, 16) = -1 EADDRINUSE (Address already in use)
> 1     14:12:15.117724 write(21, "(reply (version 0) (result #f) (error (error 
> (version 0) action-exception start ssh-daemon system-error (\"bind\" \"~A\" 
> (\"Address already in use\") (98)))) (messages (\"Service ssh-daemon has been 
> stopped.\")))", 204) = 204
> 1     14:12:15.117754 close(21)         = 0

This suggests ‘bind’ can return EADDRINUSE even though the sockets have
been closed before (presumably file descriptors 27 and 30 above).
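
For illustration only, here is one generic way a port can stay busy after
close(2): a second descriptor for the same listening socket is still open
(for example one inherited by another process).  This is a Python sketch,
not a diagnosis of this report; the loopback address, the ephemeral port
and the helper names ‘make_listener’ and ‘bind_errno’ are assumptions made
for the example.

```python
import errno
import socket


def make_listener(port=0):
    """Create a listening TCP socket with SO_REUSEADDR set, as in the trace."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("127.0.0.1", port))   # port 0: let the kernel pick a free port
    sock.listen(1)
    return sock


def bind_errno(port):
    """Try to bind PORT; return 0 on success, else the errno of the failure."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("127.0.0.1", port))
        return 0
    except OSError as err:
        return err.errno
    finally:
        sock.close()


first = make_listener()
port = first.getsockname()[1]
leaked = first.dup()      # stands in for a copy held elsewhere (assumption)
first.close()             # the original owner closes its descriptor

while_leaked = bind_errno(port)    # the duplicate keeps the socket listening
leaked.close()
after_release = bind_errno(port)   # no descriptor refers to the socket now
```

On Linux, ‘while_leaked’ should be EADDRINUSE even though SO_REUSEADDR is
set, because that option does not permit binding while another socket is
actively listening on the same address.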

Can you confirm nothing else is competing to bind port 2222 on that
machine?

I tried to reproduce it with something as brutal as:

  while sudo herd restart sshd ; do : ; done

… to no avail (I’m on current Shepherd ‘master’ though).

Maybe we should just have shepherd retry upon EADDRINUSE (like nginx
does, as you wrote), though I’d like to understand under what conditions
we can get EADDRINUSE in the first place.
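
A minimal sketch of what such a retry could look like, written in Python
for illustration rather than as Shepherd code; the function name
‘bind_with_retry’, the number of attempts and the delay are all
assumptions:

```python
import errno
import socket
import time


def bind_with_retry(address, attempts=5, delay=0.2):
    """Return a listening TCP socket bound to ADDRESS.

    Retry only when bind fails with EADDRINUSE; any other error, or
    EADDRINUSE on the last attempt, is raised to the caller.
    """
    for attempt in range(1, attempts + 1):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            sock.bind(address)
            sock.listen(16)
            return sock
        except OSError as err:
            sock.close()
            if err.errno != errno.EADDRINUSE or attempt == attempts:
                raise
            time.sleep(delay)   # give the previous holder time to go away
```

A bounded number of attempts keeps a genuine conflict (another program
listening on the port) visible as an error instead of hiding it behind an
endless loop.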

Ludo’.




