From: Rick Masters
Subject: GNU Parallel Bug Reports Signal SIGCHLD received, but no signal handler set
Date: Tue, 11 Oct 2016 19:41:45 +0000

I believe I have isolated an issue reported here recently and elsewhere, and I would like to suggest a solution.

    http://lists.gnu.org/archive/html/parallel/2016-07/msg00011.html
    http://stackoverflow.com/questions/39754323/how-to-avoid-sigchld-error-in-bash-script-that-uses-gnu-parallel

The problem does indeed appear to be a result of the line:

    delete $SIG{CHLD};

The line above appears to be functionally equivalent [1] to:

    $SIG{CHLD}="DEFAULT";

But the first line has a problem while the second does not.

--
[1] I ran them under strace and the sigaction calls are the same. Also, parallel appears to behave the same way with either line.

In contrast, this does not work for me:

    $SIG{CHLD}="IGNORE";

When I try that, parallel seems to stall. Also, the strace sigaction calls look different from those of the other two variations. Also, this discussion seems to indicate that IGNORE is an alternative to doing waitpid manually (and parallel does use waitpid):

    http://www.perlmonks.org/?node_id=1047688
--

The cause of the problem, which may only affect older versions of perl, appears to be that the delete command is not resilient to receiving multiple signals in a short period of time. Somehow perl can receive a signal at a time when it does not know what to do with it because the pointer it keeps internally to process the signal is null, so it aborts. This is from the source code of perl:

    if (!PL_psig_ptr[sig]) {
        PerlIO_printf(Perl_error_log,
                      "Signal SIG%s received, but no signal handler set.\n",
                      PL_sig_name[sig]);
        exit(sig);
    }

This code was probably added to defend against unsolved signal handling bugs that would otherwise crash perl, like this report from long ago:

    http://markmail.org/thread/da7bde4lmcmh2h3b

Recent versions of perl do not seem to be vulnerable to the problem. I can reproduce this with perl 5.10.1 on CentOS 6. I cannot reproduce this with perl 5.16.3 on CentOS 7. Unfortunately, I'm stuck supporting the old version for years to come.

It would be greatly appreciated if you could change the line in question for your next version. I think changing the code to assign the handler to a specific documented value not only fixes the problem, but improves clarity and appears to be better supported by perl.

Here are the details for how this can be reproduced.

With parallel, I use the following test command, which is designed to end precisely when the time advances to the next second.
This allows multiple instances of this script to end at roughly the same time:

    ~bash$ parallel --version
    GNU parallel 20160922
    <snip>
    ~bash$ cat sleepshort
    #!/bin/bash
    sleep 1
    foo="$(date)"
    while [ "$(date)" = "$foo" ]; do
        printf ""
    done

Then, many of these are launched at the same time:

    ~bash$ parallel --jobs $((20 + $RANDOM % 50)) -D run -v ./sleepshort ::: {1..200}

The problem may not happen on the first attempt, so this can be used:

    # while parallel --jobs $((20 + $RANDOM % 50)) -D run -v ./sleepshort ::: {1..200}; do true; done

For me, this takes only several attempts (within a few minutes) to reproduce the problem.

Also, there is a more direct way of reproducing the underlying perl issue. Run this perl script on one terminal to handle signals and run the second command on another terminal to send signals to the first. The second script depends on the name of this script, so call this "sigtest.pl":

    #!/usr/bin/perl
    while (1) {
        print "adding handler\n";
        $SIG{CHLD} = sub { print "gotchild\n"; };
        print "deleting handler\n";
        delete $SIG{CHLD};
    }

Run it:

    ~bash$ ./sigtest.pl

Then, run this bash script on another terminal:

    #!/bin/bash
    pid=$(ps x | grep sigtest |grep -v grep |awk '{print $1}')
    while kill -SIGCHLD $pid; do
        true
    done

This instantly reproduces the problem for me. The first command exits with:

    Signal SIGCHLD received, but no signal handler set

But if I replace the delete as suggested, the problem does not occur.

Please let me know if you need any more information and thank you for the work you do on parallel.

Regards,
Rick Masters
F5 Networks
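
As a footnote to the reproduction steps above: the signal-sending loop could also be written as a bounded burst that takes the target PID as an argument rather than scraping it from ps output. This is only a minimal sketch; the function name sigchld_burst and the default count of 1000 are hypothetical and not part of the original report.

```shell
# Hypothetical helper, not part of the original report.
# Sends SIGCHLD to PID up to COUNT times (default 1000).
# Returns 0 if the target still exists afterwards, 1 if it went away.
sigchld_burst() {
    burst_pid=$1
    burst_left=${2:-1000}
    while [ "$burst_left" -gt 0 ]; do
        # kill fails once the target no longer exists, ending the burst early.
        kill -s CHLD "$burst_pid" 2>/dev/null || return 1
        burst_left=$((burst_left - 1))
    done
    # Signal 0 delivers nothing; it only checks that the process exists.
    kill -0 "$burst_pid" 2>/dev/null
}
```

With sigtest.pl running in another terminal, calling sigchld_burst with its PID should return non-zero on an affected perl once the process aborts, and zero when the script uses the "DEFAULT" assignment instead of delete.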