Re: Cfengine daemons keep dying!!!
From: Adam M. Dunn
Subject: Re: Cfengine daemons keep dying!!!
Date: Tue, 23 Nov 2004 16:03:37 -0600 (CST)
Guys, I've found a solution to my problem. Apparently others are having
similar issues.
My Problem: cfengine executables (in my case cfexecd) suddenly seg
fault (die) for no apparent reason. Core files do not help, and
crash points seem to be random.
Solution: Try recompiling with a different GCC version if you can. The
build that was failing used GCC 3.3.1 (details are listed below);
recompiling with an old GCC 2.95 stopped the seg faults.
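
For anyone wanting to try the same workaround, here is a minimal shell
sketch of rebuilding with a different compiler. It assumes the cfengine
source tree has a standard autoconf-style configure script that honours
the CC variable. The function name rebuild_with_cc and both paths are
hypothetical placeholders, not anything taken from the cfengine
documentation.

```shell
# Hypothetical helper: rebuild a source tree with a chosen C compiler.
# Assumes an autoconf-style ./configure that respects CC (an assumption).
rebuild_with_cc() {
    cc_path=$1      # path to the alternative gcc binary (placeholder)
    src_dir=$2      # unpacked cfengine source directory (placeholder)

    if [ ! -x "$cc_path" ]; then
        echo "compiler not found or not executable: $cc_path" >&2
        return 1
    fi
    if [ ! -x "$src_dir/configure" ]; then
        echo "no configure script in: $src_dir" >&2
        return 1
    fi

    (
        cd "$src_dir" || exit 1
        # Start from a clean tree so no objects built by the old
        # compiler are reused.
        if [ -f Makefile ]; then make distclean || exit 1; fi
        CC="$cc_path" ./configure && make
    )
}
```

Example call (both paths are placeholders):

    rebuild_with_cc /path/to/other/gcc /path/to/cfengine-source

Running "gcc -v" with the chosen compiler beforehand confirms which
version will actually be used.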
~Adam
---------- Forwarded message ----------
Date: Tue, 23 Nov 2004 22:08:08 +0100 (MET)
From: Mark.Burgess@iu.hio.no
To: adunn@hgsc.bcm.tmc.edu
Subject: Re: Cfengine daemons keep dying!!!
Thank you -- would you post this to the help list? Several people
seem to be having a problem...
M
On 23 Nov, Adam M. Dunn wrote:
>
> Hi Mark,
>
> I still didn't find the root cause of the problem, but I got it to work
> nonetheless and thought I would fill you in on how. It turned out to be
> related to the compiler I was using:
>
>
> (/) neptunium# uname -a
> SunOS neptunium 5.8 Generic_108528-22 sun4u sparc SUNW,Ultra-80
>
> (/) neptunium# gcc -v
> Reading specs from
> /hgsc/gnu/bin/../lib/gcc-lib/sparc-sun-solaris2.8/3.3.1/specs
> Configured with: ../gcc-3.3.1/configure --host=sparc-sun-solaris2.8
> --prefix=/home/share/gnu --exec-prefix=/home/gsc/gnu
> --with-as=/usr/ccs/bin/as --with-ld=/usr/ccs/bin/ld --enable-threads=posix
> --enable-languages=c,c++,f77,java
> Thread model: posix
> gcc version 3.3.1
>
>
> I don't know if it was the compiler itself, or the compiler config options. I
> tried recompiling with an old GCC 2.95 that we still had installed, and
> cfexecd stopped seg faulting. I'll probably get the newest GCC installed
> here and recompile with that later.
>
> Thanks for your help though.
>
>
>
> ~Adam
>
>
>
> On Mon, 22 Nov 2004 Mark.Burgess@iu.hio.no wrote:
>
>>
>> Very odd. It doesn't make much sense. Try commenting out the sleep,
>> or putting some printfs into the code to see exactly where it fails
>>
>> M
>>
>> On 22 Nov, Adam M. Dunn wrote:
>> >
>> > I'm starting to think `cfexecd' is dying while it sleeps; that would
>> > explain why everything else works except when running as a daemon.
>> >
>> > # cfexecd -d2
>> > ...
>> > Sleeping...
>> > Segmentation Fault (core dumped)
>> >
>> >
>> >
>> > ~adam
>> >
>> >
>> >
>> > On Mon, 22 Nov 2004 Mark.Burgess@iu.hio.no wrote:
>> >
>> >>
>> >> Can you reproduce the problem in this mode? I cannot see where the
>> >> crash is occurring
>> >>
>> >> M
>> >>
>> >> On 22 Nov, Adam M. Dunn wrote:
>> >> >
>> >> > No problem:
>> >> >
>> >> > # cfexecd -d2
>> >> > ...
>> >> > ...
>> >> > ...
>> >> > GNU autoconf class from compile time: compiled_on_solaris2_8
>> >> >
>> >> > Address given by nameserver: 128.249.42.234
>> >> > Adding alias neptunium..
>> >> > AddClassToHeap(neptunium)
>> >> > Adding alias neptunium.bcm.tmc.edu..
>> >> > AddClassToHeap(neptunium_bcm_tmc_edu)
>> >> > Appending [neptunium_bcm_tmc_edu]
>> >> > ---------------------------------------------------------------------
>> >> > Starting server
>> >> > ---------------------------------------------------------------------
>> >> >
>> >> > GetLock(cfexecd,execd,time=1101147535), ExpireAfter=0, IfElapsed=0
>> >> > GetLastLock()
>> >> > CheckOldLock(lock..neptunium.execd.execd_1243)
>> >> > Lock lock..neptunium.execd.execd_1243 last ran at Mon Nov 22 12:03:50 2004
>> >> >
>> >> > cfexecd: Lock lock..neptunium.execd.execd_1243 expired...(after 15/0 minutes)
>> >> > Trying to kill expired process, pid 11396
>> >> > LockLog(Lock expired, process killed)
>> >> > SetLock(lock..neptunium.execd.execd_1243)
>> >> > PutLock(lock..neptunium.execd.execd_1243)
>> >> > cfpopen(/var/cfengine/bin/cfagent -Q smtpserver,sysadm,fqhost,ipaddress,EmailMaxLines,EmailFrom,EmailTo -D from_cfexecd)
>> >> > ReleaseCurrentLock(lock..neptunium.execd.execd_1243)
>> >> > PutLock(last..neptunium.execd.execd_1243)
>> >> > LockLog(Lock removed normally )
>> >> >
>> >> >
>> >> >
>> >> > ~Adam
>> >> >
>> >> >
>> >> > On Mon, 22 Nov 2004 Mark.Burgess@iu.hio.no wrote:
>> >> >
>> >> >>
>> >> >> Hmmm - can you try running
>> >> >>
>> >> >> cfexecd -d2 for me?
>> >> >>
>> >> >> M
>> >> >>
>> >> >> On 22 Nov, Adam M. Dunn wrote:
>> >> >> >
>> >> >> > This is all I see in the gdb 'back':
>> >> >> >
>> >> >> > (gdb) back
>> >> >> > #0 0xff359768 in ?? ()
>> >> >> > #1 0xff357e18 in ?? ()
>> >> >> > #2 0xff3696cc in ?? ()
>> >> >> > #3 0x000299a0 in ScheduleRun () at cfexecd.c:538
>> >> >> > #4 0x000290dc in StartServer (argc=1078272, argv=0x107400) at cfexecd.c:324
>> >> >> > #5 0x00028a60 in main (argc=1, argv=0xffbefa44) at cfexecd.c:124
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > ~Adam
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > On Mon, 22 Nov 2004 Mark.Burgess@iu.hio.no wrote:
>> >> >> >
>> >> >> >>
>> >> >> >> Please do the following for whichever daemon is crashing
>> >> >> >> (e.g. try strings on the core first):
>> >> >> >>
>> >> >> >> gdb /path/to/daemon /path/core
>> >> >> >>
>> >> >> >> Then inside gdb type "back" for a backtrace and send the result
>> >> >> >>
>> >> >> >> Mark
>> >> >> >> On 22 Nov, Adam M. Dunn wrote:
>> >> >> >> >
>> >> >> >> > AHHH, Mark, sorry. They ARE dumping core files. I was looking in the
>> >> >> >> > wrong place. I was checking my current working directory, but it seems
>> >> >> >> > the core is dumping to the .../inputs directory and I wasn't looking
>> >> >> >> > there.
>> >> >> >> >
>> >> >> >> > If I send this to you will this help any?
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > ~adam
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>> >> >> >> > Adam Dunn
>> >> >> >> > Systems Administrator II
>> >> >> >> > Human Genome Sequencing Center
>> >> >> >> > Baylor College of Medicine
>> >> >> >> > N1419 One Baylor Plaza
>> >> >> >> > Houston, TX 77030
>> >> >> >> >
>> >> >> >> > Voice: 713.798.3124
>> >> >> >> > Fax : 713.798.6977
>> >> >> >> > -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > On Mon, 22 Nov 2004, Adam M. Dunn wrote:
>> >> >> >> >
>> >> >> >> >>
>> >> >> >> >> Nope, that's the problem. I have not been able to find a reason. It's
>> >> >> >> >> as if they're exiting cleanly. I'm not finding any core files or anything
>> >> >> >> >> out of the ordinary. The only thing I've seen is as soon as the daemons
>> >> >> >> >> fork off that cfagent process the daemons die about 5 seconds later. The
>> >> >> >> >> runlogs show:
>> >> >> >> >>
>> >> >> >> >> Lock expired, process killed:pid=205:cfenvd:daemon
>> >> >> >> >>
>> >> >> >> >> Lock expired, process killed:pid=253:cfexecd:execd
>> >> >> >> >>
>> >> >> >> >> Those log entries correspond to the kill times; however, those PIDs
>> >> >> >> >> weren't the ones the daemons were running as, which confused me a bit.
>> >> >> >> >> And I basically just start these up by running:
>> >> >> >> >>
>> >> >> >> >> /var/cfengine/bin/cfexecd
>> >> >> >> >> /var/cfengine/bin/cfenvd -H
>> >> >> >> >> /var/cfengine/bin/cfservd
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> My first thought was that maybe I have a bad policy that is killing them,
>> >> >> >> >> so I also tried running them with no policies in place, with the same
>> >> >> >> >> results. If there's any other information you'd like me to check, I'll
>> >> >> >> >> post that.
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> thanks,
>> >> >> >> >> Adam
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> On Mon, 22 Nov 2004 Mark.Burgess@iu.hio.no wrote:
>> >> >> >> >>
>> >> >> >> >> >
>> >> >> >> >> >
>> >> >> >> >> > Do they dump core? Can you give us more info about the reason?
>> >> >> >> >> >
>> >> >> >> >> > M
>> >> >> >> >> >
>> >> >> >> >> > On 22 Nov, Adam M. Dunn wrote:
>> >> >> >> >> > >
>> >> >> >> >> > > I'm having a troubling problem with cfengine under Solaris 8/9. The
>> >> >> >> >> > > cfexecd and cfenvd daemons keep dying soon after starting (cfservd has
>> >> >> >> >> > > no problem). I'm running the latest version, and also tried the
>> >> >> >> >> > > previous one. Shortly before they die I've noticed the following
>> >> >> >> >> > > process fire off, I presume started by cfexecd:
>> >> >> >> >> > > cfagent -Q smtpserver,sysadm,fqhost,ipaddress,EmailMaxLines,E...
>> >> >> >> >> > >
>> >> >> >> >> > > Also, sometimes cfenvd doesn't die at the same time, but eventually they
>> >> >> >> >> > > both die.
>> >> >> >> >> > >
>> >> >> >> >> > > This is a big problem for my deployment since I want to run cfexecd in
>> >> >> >> >> > > daemon mode. Everything runs fine under Linux, even with the same or no
>> >> >> >> >> > > policies. I also tried using a policy that restarts the daemons as
>> >> >> >> >> > > described in the cfengine manuals, but it doesn't help. Can
>> >> >> >> >> > > anyone help!!!
>> >> >> >> >> > >
>> >> >> >> >> > >
>> >> >> >> >> > > ~adam
>> >> >> >> >> > >
>> >> >> >> >> > >
>> >> >> >> >> > >
>> >> >> >> >> > >
>> >> >> >> >> > > _______________________________________________
>> >> >> >> >> > > Help-cfengine mailing list
>> >> >> >> >> > > Help-cfengine@gnu.org
>> >> >> >> >> > > http://lists.gnu.org/mailman/listinfo/help-cfengine
>> >> >> >> >> >
>> >> >> >> >> >
>> >> >> >> >> >
>> >> >> >> >> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> >> >> >> >> > Work: +47 22453272 Email: Mark.Burgess@iu.hio.no
>> >> >> >> >> > Fax : +47 22453205 WWW : http://www.iu.hio.no/~mark
>> >> >> >> >> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> >> >> >> >> >
>> >> >> >> >> >
>> >> >> >> >> >
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>>
>>
>>
>>
>>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Work: +47 22453272 Email: Mark.Burgess@iu.hio.no
Fax : +47 22453205 WWW : http://www.iu.hio.no/~mark
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~