bug#34033: Offloading sometimes hangs


From: Ludovic Courtès
Subject: bug#34033: Offloading sometimes hangs
Date: Thu, 10 Jan 2019 17:09:31 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)

Hello,

So there’s another situation where offloading regularly hangs on
berlin.  The ‘guix offload’ process looks like this:

--8<---------------cut here---------------start------------->8---
(gdb) bt
#0  0x00007f1f715686a1 in __GI___poll (fds=0x14e9b30, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007f1f673b94e7 in ssh_poll (timeout=<optimized out>, nfds=<optimized out>, fds=<optimized out>)
    at /tmp/guix-build-libssh-0.7.7.drv-0/libssh-0.7.7-checkout/src/poll.c:98
#2  ssh_poll_ctx_dopoll (address@hidden, address@hidden)
    at /tmp/guix-build-libssh-0.7.7.drv-0/libssh-0.7.7-checkout/src/poll.c:612
#3  0x00007f1f673ba449 in ssh_handle_packets (address@hidden, address@hidden)
    at /tmp/guix-build-libssh-0.7.7.drv-0/libssh-0.7.7-checkout/src/session.c:634
#4  0x00007f1f673ba51d in ssh_handle_packets_termination (address@hidden, timeout=<optimized out>,
    address@hidden, address@hidden <ssh_channel_read_termination>, address@hidden)
    at /tmp/guix-build-libssh-0.7.7.drv-0/libssh-0.7.7-checkout/src/session.c:696
#5  0x00007f1f673a6aaf in ssh_channel_read_timeout (channel=0x224e360, address@hidden,
    address@hidden, is_stderr=<optimized out>, timeout=-3, address@hidden)
    at /tmp/guix-build-libssh-0.7.7.drv-0/libssh-0.7.7-checkout/src/channels.c:2705
#6  0x00007f1f673a6bbb in ssh_channel_read (channel=<optimized out>, address@hidden, address@hidden,
    is_stderr=<optimized out>) at /tmp/guix-build-libssh-0.7.7.drv-0/libssh-0.7.7-checkout/src/channels.c:2621
#7  0x00007f1f67413a23 in read_from_channel_port (
    channel=<error reading variable: ERROR: In procedure gdbscm_memory_port_fill_input: error reading memory>0x22f01a0, dst=<optimized out>, start=0, count=8) at channel-type.c:161
#8  0x00007f1f71b65287 in scm_i_read_bytes (
    address@hidden<error reading variable: ERROR: In procedure gdbscm_memory_port_fill_input: error reading memory>0x22f01a0, address@hidden"#<vu8vector>" = {...}, address@hidden, address@hidden) at ports.c:1559
#9  0x00007f1f71b6996c in scm_c_read_bytes (
    address@hidden<error reading variable: ERROR: In procedure gdbscm_memory_port_fill_input: error reading memory>0x22f01a0, address@hidden"#<vu8vector>" = {...}, address@hidden, address@hidden) at ports.c:1639
#10 0x00007f1f71b6fd80 in scm_get_bytevector_n (
    port=<error reading variable: ERROR: In procedure gdbscm_memory_port_fill_input: error reading memory>0x22f01a0,
    count=<optimized out>) at r6rs-ports.c:421
#11 0x00007f1f71ba4715 in vm_regular_engine (thread=0x14e9b30, vp=0xc31f30, registers=0xffffffff, resume=1901495969)
    at vm-engine.c:786

[...]

(gdb) p *fds
$1 = {fd = 15, events = 1, revents = 0}
(gdb) shell ls -l /proc/12185/fd
total 0
lr-x------ 1 root root 64 Jan 10 16:56 0 -> 'pipe:[76778016]'
l-wx------ 1 root root 64 Jan 10 16:56 1 -> 'pipe:[76778015]'
lr-x------ 1 root root 64 Jan 10 16:56 10 -> 'pipe:[76838317]'
l-wx------ 1 root root 64 Jan 10 16:56 11 -> 'pipe:[76838317]'
lr-x------ 1 root root 64 Jan 10 16:56 12 -> 'pipe:[76851360]'
l-wx------ 1 root root 64 Jan 10 16:56 13 -> 'pipe:[76851360]'
l-wx------ 1 root root 64 Jan 10 16:56 14 -> /var/guix/offload/overdrive1.guixsd.org/1
lrwx------ 1 root root 64 Jan 10 16:56 15 -> 'socket:[76860702]'
lr-x------ 1 root root 64 Jan 10 16:56 16 -> /dev/urandom
l-wx------ 1 root root 64 Jan 10 16:56 2 -> 'pipe:[76778015]'
lr-x------ 1 root root 64 Jan 10 16:56 3 -> 'pipe:[76838313]'
l-wx------ 1 root root 64 Jan 10 16:56 4 -> 'pipe:[76778017]'
l-wx------ 1 root root 64 Jan 10 16:56 5 -> 'pipe:[76838313]'
lr-x------ 1 root root 64 Jan 10 16:56 6 -> 'pipe:[76838316]'
l-wx------ 1 root root 64 Jan 10 16:56 7 -> 'pipe:[76838316]'
lr-x------ 1 root root 64 Jan 10 16:56 8 -> 'pipe:[76841414]'
l-wx------ 1 root root 64 Jan 10 16:56 9 -> 'pipe:[76841414]'
--8<---------------cut here---------------end--------------->8---

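The process is blocked in a ‘get-bytevector-n’ call for 8 bytes (count=8
in frame #7), so it looks like it is reading the daemon protocol.  It is
polling file descriptor 15, the socket, for POLLIN only (events = 1) with
no timeout (timeout=-1).  At that point the socket is actually dead: when
I log in to the remote machine (overdrive1.guixsd.org), I can see that it
has no other open SSH sessions.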

A simple thing would be to somehow get libssh to pass POLLIN | POLLRDHUP
instead of just POLLIN.
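
To illustrate what that flag reports, here is a minimal sketch in Python
rather than in libssh itself.  It assumes Linux (where
‘select.POLLRDHUP’ is available), and a local socket pair stands in for
the SSH connection; the helper name ‘wait_readable_or_hangup’ is
hypothetical.  Note that POLLRDHUP only signals that the peer closed or
shut down its end, so a peer that vanishes without closing would still
need a timeout.

```python
import select
import socket


def wait_readable_or_hangup(sock, timeout_ms):
    """Poll `sock` for input or peer hang-up.

    Returns the revents bit mask, or 0 if the timeout expired.
    """
    poller = select.poll()
    # Ask for both readable data and "peer closed its end" notifications.
    poller.register(sock.fileno(), select.POLLIN | select.POLLRDHUP)
    events = poller.poll(timeout_ms)
    return events[0][1] if events else 0


# A local socket pair stands in for the SSH connection (assumption).
local, remote = socket.socketpair()

# Peer is open but silent: the poll simply times out.
idle = wait_readable_or_hangup(local, 100)

# Peer closes its end: the poll returns with POLLRDHUP set.
remote.close()
closed = wait_readable_or_hangup(local, 100)
hung_up = bool(closed & select.POLLRDHUP)

local.close()
```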

Additionally, we could change Guile-SSH so that we can specify a timeout
when reading from a channel.
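
As a rough sketch of what a read with a timeout could look like, the
Python below reads an exact number of bytes against a deadline.  It is
not Guile-SSH code: ‘read_exactly’ is a hypothetical helper, the socket
pair stands in for an SSH channel, and the 8-byte read merely mirrors the
read seen in the backtrace.

```python
import select
import socket
import time


def read_exactly(sock, count, timeout_s):
    """Read exactly `count` bytes from `sock`.

    Raises TimeoutError if the deadline passes first, and EOFError if the
    peer closes the connection before `count` bytes arrive.
    """
    deadline = time.monotonic() + timeout_s
    chunks = []
    remaining = count
    while remaining > 0:
        left = deadline - time.monotonic()
        if left <= 0:
            raise TimeoutError("read timed out")
        # Wait for data, but never longer than the time left.
        readable, _, _ = select.select([sock], [], [], left)
        if not readable:
            raise TimeoutError("read timed out")
        chunk = sock.recv(remaining)
        if not chunk:
            raise EOFError("peer closed the connection")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


# A local socket pair stands in for the SSH channel (assumption).
a, b = socket.socketpair()

# Data is available: the 8-byte read completes.
b.sendall(bytes(8))
got = read_exactly(a, 8, 0.5)

# Nothing more is sent: the read gives up instead of hanging forever.
try:
    read_exactly(a, 8, 0.1)
    timed_out = False
except TimeoutError:
    timed_out = True

a.close()
b.close()
```

With something along these lines the caller gets an error it can handle,
for instance by retrying the offload on another machine, rather than a
process blocked in poll indefinitely.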

Ludo’.




