help-cfengine

SIGPIPE problem


From: Yaroslav Halchenko
Subject: SIGPIPE problem
Date: Tue, 20 Sep 2005 17:32:07 -0400
User-agent: mutt-ng devel-20050619 (Debian)

Dear Cfengineers,

This problem seems to have started when we extended the cluster to 25
nodes:
/var/lib/cfengine2/bin/cfagent -Dfrom_cfexecd
hangs, and I guess that is what causes the following message to be sent via email:

cfengine:node5: Received signal 13 (SIGPIPE) while doing 
[lock.cfagent_conf.node5.files._var_spool_torque_spool_3777_4000__1_1094]
cfengine:node5: Logical start time Tue Sep 20 16:42:23 2005
cfengine:node5: This sub-task started really at Tue Sep 20 16:42:23 2005
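Signal 13 is SIGPIPE on Linux. The kernel sends it to a process that writes to a pipe (or socket) whose reading end has already been closed. The report does not say which pipe or socket cfagent was writing to; the C sketch below (POSIX only, function names are illustrative) just reproduces the mechanism. With the default disposition the writer is killed; with SIGPIPE ignored, write() fails with EPIPE instead. A process that installs its own handler, as cfagent evidently does since it reports the signal, has that handler called instead.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that writes into a pipe whose read end is already closed.
 * With the default SIGPIPE disposition the kernel kills the child.
 * Returns the signal that terminated the child, or -1. */
int signal_from_write_to_closed_pipe(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        int fds[2];
        signal(SIGPIPE, SIG_DFL);            /* ensure the default action applies */
        if (pipe(fds) != 0)
            _exit(1);
        close(fds[0]);                       /* no reader is left */
        ssize_t n = write(fds[1], "x", 1);   /* triggers SIGPIPE */
        (void)n;
        _exit(0);                            /* only reached if SIGPIPE did not kill us */
    }
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}

/* Same write, but with SIGPIPE ignored: write() fails with EPIPE instead.
 * Returns the errno observed, or 0 if the write unexpectedly succeeded. */
int errno_from_write_with_sigpipe_ignored(void)
{
    int fds[2], err = 0;
    void (*old)(int) = signal(SIGPIPE, SIG_IGN);
    if (pipe(fds) != 0) {
        err = errno;
    } else {
        close(fds[0]);                       /* no reader is left */
        if (write(fds[1], "x", 1) < 0)
            err = errno;
        close(fds[1]);
    }
    if (old != SIG_ERR)
        signal(SIGPIPE, old);                /* restore the previous disposition */
    return err;
}
```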

The hanging process has PID 29349; here are some diagnostics:

Attaching gdb to the process gives a backtrace of 758 function calls, which
look like this:

#0  0xb7d0d2cb in nanosleep () from /lib/tls/libc.so.6
#1  0xb7d0d110 in sleep () from /lib/tls/libc.so.6
#2  0x0804d163 in ?? ()
#3  0x000009d1 in ?? ()
#4  0x00000000 in ?? ()
#5  0x0000000a in ?? ()
#6  0x00000000 in ?? ()
#7  0x08130cc0 in optarg ()
#8  0x00000000 in ?? ()
.....
#747 0xb7ca6e55 in getenv () from /lib/tls/libc.so.6
#748 0x0804b2d4 in ?? ()
#749 0x08120860 in optarg ()
#750 0x080a85f8 in _IO_stdin_used ()
#751 0x00000000 in ?? ()
#752 0xb7dadff4 in ?? () from /lib/tls/libc.so.6
#753 0x00000000 in ?? ()
#754 0xb7dadff4 in ?? () from /lib/tls/libc.so.6
#755 0xbffffe28 in ?? ()
#756 0xb7c8fec0 in __libc_start_main () from /lib/tls/libc.so.6
#757 0xb7c8fec0 in __libc_start_main () from /lib/tls/libc.so.6
#758 0x0804b0a1 in ?? ()

In the logs on the node I see:

cfengine.node5.runlog:Tue Sep 20 16:30:25 2005:Lock removed normally 
:pid=29349:lock.cfagent_conf.node5.copy._etc_cfengine_inputs___var_lib_cfengine2_inputs_ravana_3343:
cfengine.node5.runlog:Tue Sep 20 16:30:26 2005:Lock removed normally 
:pid=29349:lock.cfagent_conf.node5.tidy._var_lib_cfengine2_outputs_3023:
cfengine.node5.runlog:Tue Sep 20 17:16:47 2005:Lock removed normally 
:pid=29349:lock.cfagent_conf.node5.disks.__3249:
cfengine.node5.runlog:Tue Sep 20 17:16:48 2005:Lock removed normally 
:pid=29349:lock.cfagent_conf.node5.disks._usr_2286:
cfengine.node5.runlog:Tue Sep 20 17:16:48 2005:Lock removed normally 
:pid=29349:lock.cfagent_conf.node5.disks._var_4909:
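Between the tidy lock at 16:30:26 and the disks lock at 17:16:47, pid 29349 logged nothing for 2781 seconds (about 46 minutes). For longer runlogs, a small C sketch like the following could flag such gaps. It assumes each entry has been rejoined onto a single line in the runlog format shown above and that all entries fall on the same day; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Parse the HH:MM:SS field of a runlog line such as
 * "cfengine.node5.runlog:Tue Sep 20 16:30:25 2005:Lock removed normally".
 * Returns seconds since midnight, or -1 if the line does not match. */
long runlog_seconds_of_day(const char *line)
{
    int h, m, s;
    /* skip file name, ':', weekday, month, day; then read the time */
    if (sscanf(line, "%*[^:]:%*s %*s %*d %d:%d:%d", &h, &m, &s) != 3)
        return -1;
    return h * 3600L + m * 60L + s;
}

/* Largest gap, in seconds, between consecutive parseable lines
 * (same day assumed). Returns -1 if fewer than two lines parse. */
long runlog_largest_gap(const char *const lines[], size_t n)
{
    long prev = -1, best = -1;
    for (size_t i = 0; i < n; ++i) {
        long t = runlog_seconds_of_day(lines[i]);
        if (t < 0)
            continue;                        /* ignore lines that do not match */
        if (prev >= 0 && t - prev > best)
            best = t - prev;
        prev = t;
    }
    return best;
}
```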

I stayed attached with gdb from approximately 17:00, and the process
"completed" as soon as I detached.

Where should I look to find the source of the SIGPIPE message?
I'm running Debian unstable with cfengine 2.1.15-1.

I have splaytime set to 50 and maxconnections set to 40.
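My understanding is that splaytime makes each host wait a fixed, host-specific delay of up to that many minutes, so that the 25 nodes do not all contact the server at the same moment. The C sketch below only illustrates that idea: the FNV-1a hash and the function name are assumptions for illustration, not cfengine's own code.

```c
#include <stdint.h>

/* Hypothetical per-host splay: FNV-1a hash of the host name, reduced to a
 * delay in [0, splay_minutes * 60) seconds. Not cfengine's actual hash. */
unsigned long splay_delay_seconds(const char *hostname, unsigned splay_minutes)
{
    uint32_t h = 2166136261u;                 /* FNV-1a 32-bit offset basis */
    for (const unsigned char *p = (const unsigned char *)hostname; *p; ++p) {
        h ^= *p;
        h *= 16777619u;                       /* FNV-1a 32-bit prime */
    }
    unsigned long window = (unsigned long)splay_minutes * 60ul;
    return window ? h % window : 0;           /* same host -> same delay */
}
```

Because the delay depends only on the host name, each node keeps the same slot from run to run, while different nodes spread across the 50-minute window.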

Thank you in advance for any hints.
-- 
Yaroslav Halchenko
Research Assistant, Psychology Department, Rutgers-Newark
Office: (973) 353-5440x263 | FWD: 82823 | Fax: (973) 353-1171
        101 Warren Str, Smith Hall, Rm 4-105, Newark NJ 07105
Student  Ph.D. @ CS Dept. NJIT
