help-cfengine

Re: Cfengine daemons keep dying!!!


From: Adam M. Dunn
Subject: Re: Cfengine daemons keep dying!!!
Date: Tue, 23 Nov 2004 16:03:37 -0600 (CST)


Guys, I've found a solution to my problem.  Apparently others are having
similar issues.


My Problem:  cfengine executables (in my case cfexecd) suddenly seg
faulting (dying) for no apparent reason.  Core files do not help, and
crash points seem to be random.
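
For anyone who wants to pull the same kind of backtrace that is requested
further down this thread (gdb with the daemon binary and the core, then
"back"), here is a rough non-interactive sketch.  The function name
core_backtrace is made up for illustration, both paths are placeholders, and
it assumes a gdb recent enough to accept -batch and -ex.

```shell
#!/bin/sh
# Hypothetical helper: print a backtrace from a core file without opening
# an interactive gdb session. Both arguments are placeholders supplied by
# the caller (e.g. the crashing daemon binary and the core it dumped).
core_backtrace() {
    binary=$1
    core=$2
    if [ ! -f "$binary" ] || [ ! -f "$core" ]; then
        echo "usage: core_backtrace /path/to/daemon /path/to/core" >&2
        return 2
    fi
    # -batch makes gdb exit once the commands have run; -ex bt runs the
    # backtrace command, i.e. what typing "back" does interactively.
    gdb -batch -ex bt "$binary" "$core"
}
```

If the frames come back as "?? ()", as they did in my case, the output will
not say much more than the interactive session did.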

Solution:  Try recompiling with a different GCC version if you can.  The
build that was failing for me (GCC 3.3.1 on Solaris 8) is listed below.
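
A minimal sketch of how such a rebuild might be set up, assuming the usual
autoconf convention of passing the compiler to configure through CC.  The
helper name pick_cc and the compiler names in the usage comment are
hypothetical; substitute whatever compilers are actually installed on your
machine.

```shell
#!/bin/sh
# Hypothetical helper: print the first candidate compiler found on PATH,
# or return non-zero if none of the candidates is available.
pick_cc() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Example use from a freshly unpacked source tree (not run here; the
# compiler names are placeholders):
#   CC=$(pick_cc gcc-2.95 gcc cc) ./configure && make
```

Starting from a clean source tree matters here, since object files left over
from the failing compiler would otherwise be linked into the new binaries.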



~Adam


---------- Forwarded message ----------
Date: Tue, 23 Nov 2004 22:08:08 +0100 (MET)
From: Mark.Burgess@iu.hio.no
To: adunn@hgsc.bcm.tmc.edu
Subject: Re: Cfengine daemons keep dying!!!


Thank you -- would you post this to the help list? Several people
seem to be having a problem...

M

On 23 Nov, Adam M. Dunn wrote:
> 
> Hi Mark,
> 
>   I still didn't find the root cause of the problem, but I got it to work
> nonetheless and thought I would fill you in on how.  It turned out to be
> related to the compiler I was using:
> 
> 
> (/) neptunium# uname -a
> SunOS neptunium 5.8 Generic_108528-22 sun4u sparc SUNW,Ultra-80
> 
> (/) neptunium# gcc -v
> Reading specs from /hgsc/gnu/bin/../lib/gcc-lib/sparc-sun-solaris2.8/3.3.1/specs
> Configured with: ../gcc-3.3.1/configure --host=sparc-sun-solaris2.8
> --prefix=/home/share/gnu --exec-prefix=/home/gsc/gnu
> --with-as=/usr/ccs/bin/as --with-ld=/usr/ccs/bin/ld --enable-threads=posix
> --enable-languages=c,c++,f77,java
> Thread model: posix
> gcc version 3.3.1
> 
> 
> I don't know if it was the compiler itself, or compiler config options.  I
> tried recompiling with an old version of 2.95 we had still installed and
> cfexecd stopped seg faulting.  I'll probably get the newest GCC installed
> here and recompile with that later.
> 
> Thanks for your help though.
> 
> 
> 
> ~Adam
> 
> 
> 
> On Mon, 22 Nov 2004 Mark.Burgess@iu.hio.no wrote:
> 
>> 
>> Very odd. It doesn't make much sense. Try commenting out the sleep,
>> or putting some printfs into the code to see exactly where it fails
>> 
>> M
>> 
>> On 22 Nov, Adam M. Dunn wrote:
>> > 
>> > I'm starting to think `cfexecd' is dying while it sleeps, which would
>> > explain why everything else works except when running as a daemon.
>> > 
>> > # cfexecd -d2
>> > ...
>> > Sleeping...
>> > Segmentation Fault (core dumped)
>> > 
>> > 
>> > 
>> > ~adam
>> > 
>> > 
>> > 
>> > On Mon, 22 Nov 2004 Mark.Burgess@iu.hio.no wrote:
>> > 
>> >> 
>> >> Can you reproduce the problem in this mode? I cannot see where the
>> >> crash is occurring
>> >> 
>> >> M
>> >> 
>> >> On 22 Nov, Adam M. Dunn wrote:
>> >> > 
>> >> > No problem:
>> >> > 
>> >> > # cfexecd -d2
>> >> > ...
>> >> > ...
>> >> > ...
>> >> > GNU autoconf class from compile time: compiled_on_solaris2_8
>> >> >  
>> >> > Address given by nameserver: 128.249.42.234
>> >> > Adding alias neptunium..
>> >> > AddClassToHeap(neptunium)
>> >> > Adding alias neptunium.bcm.tmc.edu..
>> >> > AddClassToHeap(neptunium_bcm_tmc_edu)
>> >> > Appending [neptunium_bcm_tmc_edu]
>> >> > ---------------------------------------------------------------------
>> >> > Starting server
>> >> > ---------------------------------------------------------------------
>> >> >  
>> >> > GetLock(cfexecd,execd,time=1101147535), ExpireAfter=0, IfElapsed=0
>> >> > GetLastLock()
>> >> > CheckOldLock(lock..neptunium.execd.execd_1243)
>> >> > Lock lock..neptunium.execd.execd_1243 last ran at Mon Nov 22 12:03:50 2004
>> >> >
>> >> > cfexecd: Lock lock..neptunium.execd.execd_1243 expired...(after 15/0 minutes)
>> >> > Trying to kill expired process, pid 11396
>> >> > LockLog(Lock expired, process killed)
>> >> > SetLock(lock..neptunium.execd.execd_1243)
>> >> > PutLock(lock..neptunium.execd.execd_1243)
>> >> > cfpopen(/var/cfengine/bin/cfagent -Q smtpserver,sysadm,fqhost,ipaddress,EmailMaxLines,EmailFrom,EmailTo -D from_cfexecd)
>> >> > ReleaseCurrentLock(lock..neptunium.execd.execd_1243)
>> >> > PutLock(last..neptunium.execd.execd_1243)
>> >> > LockLog(Lock removed normally )
>> >> > 
>> >> > 
>> >> > 
>> >> > ~Adam
>> >> > 
>> >> > 
>> >> > On Mon, 22 Nov 2004 Mark.Burgess@iu.hio.no wrote:
>> >> > 
>> >> >> 
>> >> >> Hmmm - can you try running 
>> >> >> 
>> >> >> cfexecd -d2 for me?
>> >> >> 
>> >> >> M
>> >> >> 
>> >> >> On 22 Nov, Adam M. Dunn wrote:
>> >> >> > 
>> >> >> > This is all I see in the gdb 'back':
>> >> >> > 
>> >> >> > (gdb) back
>> >> >> > #0  0xff359768 in ?? ()
>> >> >> > #1  0xff357e18 in ?? ()
>> >> >> > #2  0xff3696cc in ?? ()
>> >> >> > #3  0x000299a0 in ScheduleRun () at cfexecd.c:538
>> >> >> > #4  0x000290dc in StartServer (argc=1078272, argv=0x107400) at cfexecd.c:324
>> >> >> > #5  0x00028a60 in main (argc=1, argv=0xffbefa44) at cfexecd.c:124
>> >> >> > 
>> >> >> > 
>> >> >> > 
>> >> >> > ~Adam
>> >> >> > 
>> >> >> > 
>> >> >> > 
>> >> >> > On Mon, 22 Nov 2004 Mark.Burgess@iu.hio.no wrote:
>> >> >> > 
>> >> >> >> 
>> >> >> >> Please do the following for whichever daemon is crashing:
>> >> >> >> (e.g. try strings on the core first)
>> >> >> >> 
>> >> >> >> gdb /path/to/daemon /path/core
>> >> >> >> 
>> >> >> >> Then inside gdb type "back" for a backtrace and send the result
>> >> >> >> 
>> >> >> >> Mark
>> >> >> >> On 22 Nov, Adam M. Dunn wrote:
>> >> >> >> > 
>> >> >> >> > AHHH, Mark, sorry.  They ARE dumping core files.  I was looking in the
>> >> >> >> > wrong place.  I was checking my current working directory, but it seems
>> >> >> >> > the core is dumping to the .../inputs directory and I wasn't looking
>> >> >> >> > there.
>> >> >> >> > 
>> >> >> >> > If I send this to you will this help any?
>> >> >> >> > 
>> >> >> >> > 
>> >> >> >> > ~adam
>> >> >> >> > 
>> >> >> >> > 
>> >> >> >> > -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>> >> >> >> > Adam Dunn
>> >> >> >> > Systems Administrator II
>> >> >> >> > Human Genome Sequencing Center
>> >> >> >> > Baylor College of Medicine
>> >> >> >> > N1419 One Baylor Plaza
>> >> >> >> > Houston, TX 77030
>> >> >> >> > 
>> >> >> >> > Voice: 713.798.3124
>> >> >> >> > Fax  : 713.798.6977
>> >> >> >> > -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>> >> >> >> > 
>> >> >> >> > 
>> >> >> >> > On Mon, 22 Nov 2004, Adam M. Dunn wrote:
>> >> >> >> > 
>> >> >> >> >> 
>> >> >> >> >> Nope, that's the problem.  I have not been able to find a reason.  It's
>> >> >> >> >> as if they're exiting cleanly.  I'm not finding any core files or anything
>> >> >> >> >> out of the ordinary.  The only thing I've seen is as soon as the daemons
>> >> >> >> >> fork off that cfagent process the daemons die about 5 seconds later.  The
>> >> >> >> >> runlogs show:
>> >> >> >> >> 
>> >> >> >> >> Lock expired, process killed:pid=205:cfenvd:daemon
>> >> >> >> >> 
>> >> >> >> >> Lock expired, process killed:pid=253:cfexecd:execd
>> >> >> >> >> 
>> >> >> >> >> Those log entries correspond to the kill times; however, those PIDs
>> >> >> >> >> weren't the ones the daemons were running as, which confused me a bit.
>> >> >> >> >> And I basically just start these up by running:
>> >> >> >> >> 
>> >> >> >> >> /var/cfengine/bin/cfexecd
>> >> >> >> >> /var/cfengine/bin/cfenvd -H
>> >> >> >> >> /var/cfengine/bin/cfservd
>> >> >> >> >> 
>> >> >> >> >> 
>> >> >> >> >> My first thought was maybe I have a policy that's bad and killing them,
>> >> >> >> >> so I also tried running them with no policies in place, with the same
>> >> >> >> >> results.  If there's any other information you'd like me to check I'll
>> >> >> >> >> post that.
>> >> >> >> >> 
>> >> >> >> >> 
>> >> >> >> >> thanks,
>> >> >> >> >> Adam
>> >> >> >> >> 
>> >> >> >> >> 
>> >> >> >> >> On Mon, 22 Nov 2004 Mark.Burgess@iu.hio.no wrote:
>> >> >> >> >> 
>> >> >> >> >> > 
>> >> >> >> >> > 
>> >> >> >> >> > Do they dump core? Can you give us more info about the reason?
>> >> >> >> >> > 
>> >> >> >> >> > M
>> >> >> >> >> > 
>> >> >> >> >> > On 22 Nov, Adam M. Dunn wrote:
>> >> >> >> >> > > 
>> >> >> >> >> > > I'm having a troubling problem with cfengine under Solaris 8/9.  The
>> >> >> >> >> > > cfexecd and cfenvd daemons keep dying soon after starting (cfservd has
>> >> >> >> >> > > no problem).  I'm running the latest version, and also tried the
>> >> >> >> >> > > previous.  Shortly before they die I've noticed the following process
>> >> >> >> >> > > fire off, I presume by cfexecd:
>> >> >> >> >> > > cfagent -Q smtpserver,sysadm,fqhost,ipaddress,EmailMaxLines,E...
>> >> >> >> >> > > 
>> >> >> >> >> > > Also, sometimes cfenvd doesn't die at the same time, but eventually they
>> >> >> >> >> > > both die.
>> >> >> >> >> > > 
>> >> >> >> >> > > This is a big problem for my deployment since I want to run cfexecd in
>> >> >> >> >> > > daemon mode.  Everything runs fine under Linux even with the same or no
>> >> >> >> >> > > policies.  I also tried using a policy that does a restart of the
>> >> >> >> >> > > daemons as described in the cfengine manuals, but it doesn't help.  Can
>> >> >> >> >> > > anyone help!!!
>> >> >> >> >> > > 
>> >> >> >> >> > > 
>> >> >> >> >> > > ~adam
>> >> >> >> >> > > 
>> >> >> >> >> > > 
>> >> >> >> >> > > 
>> >> >> >> >> > > 
>> >> >> >> >> > > _______________________________________________
>> >> >> >> >> > > Help-cfengine mailing list
>> >> >> >> >> > > Help-cfengine@gnu.org
>> >> >> >> >> > > http://lists.gnu.org/mailman/listinfo/help-cfengine
>> >> >> >> >> > 
>> >> >> >> >> > 
>> >> >> >> >> > 
>> >> >> >> >> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> >> >> >> >> > Work: +47 22453272            Email:  Mark.Burgess@iu.hio.no
>> >> >> >> >> > Fax : +47 22453205            WWW  :  http://www.iu.hio.no/~mark
>> >> >> >> >> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> >> >> >> >> > 
>> >> >> >> >> > 
>> >> >> >> >> > 
>> >> >> >> >> 
>> >> >> >> >> 
>> >> >> >> >> 
>> >> >> >> >> 
>> >> >> >> >> 
>> >> >> >> 
>> >> >> >> 
>> >> >> >> 
>> >> >> >> 
>> >> >> >> 
>> >> >> 
>> >> >> 
>> >> >> 
>> >> >> 
>> >> >> 
>> >> 
>> >> 
>> >> 
>> >> 
>> >> 
>> 
>> 
>> 
>> 
>> 



~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Work: +47 22453272            Email:  Mark.Burgess@iu.hio.no
Fax : +47 22453205            WWW  :  http://www.iu.hio.no/~mark
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~





