help-cfengine

RE: Bogus file content on copies


From: Ferguson, Steve
Subject: RE: Bogus file content on copies
Date: Thu, 17 Jul 2003 12:11:23 -0400

I'm seeing these log entries from cfservd on my server machine, which seem
to correspond with the file copying problems.

Jul 17 12:05:19 bigbox cfservd[4069]: [ID 702911 daemon.notice] Host authorization/authentication failed or access denied
Jul 17 12:05:19 bigbox cfservd[4069]: [ID 702911 daemon.notice] From (host=client.my.domain.com,user=root,ip=::ffff:xx.yy.zz.48)

Is there any reason a host would be authorized one minute, then rejected in
the next minute?  DNS lookups seem to be consistent.  We don't have any
round-robin hosts in the batch and I have yet to see a lookup fail.  nscd is
running and caching most lookups anyway.
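
One way to test the DNS assumption directly is a forward-confirmed reverse
lookup on the address exactly as cfservd logged it.  The sketch below is
Python, not anything shipped with cfengine; normalize_ip and
forward_confirmed are made-up helper names.  It also unwraps the IPv4-mapped
IPv6 form (::ffff:a.b.c.d) seen in the log, on the assumption that the mapped
form could matter for host matching.  That is a guess, not a confirmed cause.

```python
import ipaddress
import socket


def normalize_ip(addr):
    """Return addr as a plain string, unwrapping ::ffff:a.b.c.d to a.b.c.d."""
    ip = ipaddress.ip_address(addr)
    # Only IPv6 address objects carry ipv4_mapped; fall back to None otherwise.
    mapped = getattr(ip, "ipv4_mapped", None)
    return str(mapped) if mapped is not None else str(ip)


def forward_confirmed(addr, reverse=socket.gethostbyaddr,
                      forward=socket.getaddrinfo):
    """Reverse-resolve addr, then check the name resolves back to it.

    Returns (hostname_or_None, matched).  The resolver functions are
    parameters so the check can also be exercised without a live DNS.
    """
    ip = normalize_ip(addr)
    name = None
    try:
        name = reverse(ip)[0]
        infos = forward(name, None)
        # Each getaddrinfo entry ends with a sockaddr whose first item is the IP.
        addrs = {normalize_ip(info[4][0]) for info in infos}
    except (OSError, ValueError):
        # Lookup failure or an unparsable address counts as "not confirmed".
        return (name, False)
    return (name, ip in addrs)
```

Calling forward_confirmed("::ffff:192.0.2.48") in a loop on the server for a
few minutes (192.0.2.48 is a documentation address standing in for the masked
one) would show whether the name or the match ever changes between calls.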

Steve

> -----Original Message-----
> From: Ferguson, Steve 
> Sent: Thursday, July 17, 2003 10:34 AM
> To: 'help-cfengine@gnu.org'
> Subject: RE: Bogus file content on copies
> 
> 
> Another related concern: I read that the default Timeout for 
> network connections is 10 seconds.  Yet, as I see this 
> problem occurring, it gradually eats up all the child 
> processes I have configured in cfrun.hosts and ends up 
> blocking the entire cfrun from completing.  Some of the 
> cfrun-triggered cfagent processes on the clients have stayed 
> around for up to 10 minutes before I've killed them by hand.
> 
> Either nothing is occurring to terminate the network 
> connection, the cfagent network connection is dying and 
> cfagent itself is hanging internally, or something is 
> blocking so that the cfagent can't die (though a standard 
> SIGTERM works by hand).
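
For comparison with the blocked recv() in the truss output quoted further
down, here is a minimal sketch of a receive that gives up after a deadline
instead of sleeping indefinitely.  It is Python for brevity and is not
cfagent's code; recv_with_deadline is a made-up name, and the 10-second
default only mirrors the documented Timeout mentioned above.  In C the same
effect is usually had with the SO_RCVTIMEO socket option or a select() call
before recv().

```python
import socket


def recv_with_deadline(sock, nbytes, timeout_s=10.0):
    """Return up to nbytes from sock, or None if nothing arrives in time.

    An empty bytes object still means the peer closed the connection.
    """
    sock.settimeout(timeout_s)  # applies to later blocking calls on sock
    try:
        return sock.recv(nbytes)
    except socket.timeout:
        # No data within timeout_s: report it rather than hang.
        return None
```

A caller that gets None back can close the socket and exit, which is the
behaviour the hung cfagent processes did not show.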
> 
> Steve
> 
> > -----Original Message-----
> > From: Ferguson, Steve 
> > Sent: Thursday, July 17, 2003 9:29 AM
> > To: 'Mark.Burgess@iu.hio.no'; circlist@wwoc.org
> > Cc: Ferguson, Steve; help-cfengine@gnu.org
> > Subject: RE: Bogus file content on copies
> > 
> > 
> > This morning I came in and ran cfrun with no arguments, to 
> > hit all of the servers (over 130).  I had 45 hangs before I 
> > gave up and interrupted cfrun, all with this same file left 
> > in /var/cfengine/inputs/cfagent.conf.cfnew.  As a first step, 
> > I ran truss on the hung cfagent process on several of the 
> > boxes.  They were all hung here:
> > 
> > 13945:  recv(5, 0x00108528, 1397, 0)    (sleeping...)
> > 13945:  signotifywait()                 (sleeping...)
> > 13945:  door_return(0x00000000, 0, 0x00000000, 0) (sleeping...)
> > 13945:  lwp_cond_wait(0xFEED5548, 0xFEED5558, 0xFEECEDB0) (sleeping...)
> > 
> > I don't know if that's of any help.
> > 
> > After running cfrun a second time, it went through every 
> > machine cleanly.  I've been able to do a clean cfrun several 
> > times since then this morning.  I'm going to leave it alone 
> > for an hour and try again, to see if there's some sort of 
> > "first time in" condition that's causing a problem.  I'm 
> > starting to suspect an issue with the central server rather 
> > than any of the individual clients.
> > 
> > Steve
> > 
> > > -----Original Message-----
> > > From: Mark.Burgess@iu.hio.no [mailto:Mark.Burgess@iu.hio.no]
> > > Sent: Thursday, July 17, 2003 3:59 AM
> > > To: circlist@wwoc.org
> > > Cc: Mark.Burgess@iu.hio.no; Steve.Ferguson@gedas.com;
> > > help-cfengine@gnu.org
> > > Subject: Re: Bogus file content on copies
> > > 
> > > 
> > > 
> > > I don't even understand how this *could happen*, so any
> > > details you can find out would be useful,
> > > 
> > > thanks
> > > M
> > > 
> > > On 16 Jul, Jeremy 'Circ' Charles wrote:
> > > > On Wed, 2003-07-16 at 12:13, Mark.Burgess@iu.hio.no wrote:
> > > >> I have not seen this for many years: Might be something to do with
> > > >> threading libraries. Please try to reproduce this running it in
> > > >> a debugger. Only a stack error could cause something like this.
> > > > 
> > > > I'd be curious to know what platform Steve encountered this problem on.
> > > > 
> > > > I have cfagent doing a TON of work as part of my RedHat 9 installation
> > > > procedure and in maintaining the machines thereafter.  Yesterday I had
> > > > one cfagent run hang after dropping content just like what Steve pointed
> > > > out in a file:
> > > > 
> > > >> root@myhost:inputs# more cfservd.conf
> > > >> t 2048BAD: Host authentication failed. Did you forget the domain name?
> > > > 
> > > > In my case, it was a different file, but my recollection of the bogus
> > > > content is just like the above.
> > > > 
> > > > It has only happened once that I'm aware of.  I wrote it off as a fluke,
> > > > blew away the goobered file on the target machine and started over.  All
> > > > was well after that.  :-)
> > > > 
> > > 
> > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > Work: +47 22453272            Email:  Mark.Burgess@iu.hio.no
> > > Fax : +47 22453205            WWW  :  http://www.iu.hio.no/~mark
> > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~



