help-cfengine

RE: Too many cfagents running. Was: Load problem with cfservd


From: Baker, Darryl
Subject: RE: Too many cfagents running. Was: Load problem with cfservd
Date: Wed, 16 Mar 2005 09:14:28 -0500

 
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Follow-up:
        What I found is that cfexecd spawns a cfagent every 5 minutes
throughout each scheduled quarter hour. So in Q1 it spawns one at 0, 5,
and 10, and in Q3 it spawns one at 30, 35, and 40. As a result I get a
load increase by a factor of 3 on the server rather than the reduction
I was trying to achieve.
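
The arithmetic behind that factor of 3 can be sketched in a few lines of Python. This is only an illustration of the counts reported above, not cfexecd's actual scheduling logic. The names QUARTERS, OBSERVED_STARTS and runs_per_quarter are made up for the example, and "one run per scheduled quarter" is an assumption about what the schedule was meant to achieve.

```python
# Minutes of the hour covered by each quarter-hour class (Q1..Q4).
QUARTERS = {
    "Q1": range(0, 15),
    "Q2": range(15, 30),
    "Q3": range(30, 45),
    "Q4": range(45, 60),
}

# cfagent start minutes as reported in this message.
OBSERVED_STARTS = [0, 5, 10, 30, 35, 40]


def runs_per_quarter(starts, schedule):
    """Count how many observed starts fall inside each scheduled quarter."""
    counts = {q: 0 for q in schedule}
    for minute in starts:
        for q in schedule:
            if minute in QUARTERS[q]:
                counts[q] += 1
    return counts


schedule = ["Q1", "Q3"]
counts = runs_per_quarter(OBSERVED_STARTS, schedule)

# Assumption: the intent was a single run per scheduled quarter.
intended = len(schedule)
actual = sum(counts.values())

print(counts)             # {'Q1': 3, 'Q3': 3}
print(actual / intended)  # 3.0
```

Six runs per hour against an intended two matches the threefold increase described above.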

_____________________________________________________________________
Darryl Baker
gedas USA, Inc.
Operational Services Business Unit
3800 Hamlin Road
Auburn Hills, MI 48326
US
phone   +1-248-754-5341
fax     +1-248-754-6399
Darryl.Baker@gedas.com
http://www.gedasusa.com
_____________________________________________________________________

> -----Original Message-----
> From: help-cfengine-bounces+darryl.baker=gedas.com@gnu.org
> [mailto:help-cfengine-bounces+darryl.baker=gedas.com@gnu.org]On
> Behalf Of Baker, Darryl
> Sent: Tuesday, March 15, 2005 12:31 PM
> To: help-cfengine@gnu.org
> Subject: Too many cfagents running. Was: Load problem with cfservd
> 
> 
>  
> 
> *** PGP Signature Status: good
> *** Signer: Darryl Philip Baker <darryl.baker@gedas.com>
> *** Signed: 3/15/2005 12:31:14 PM
> *** Verified: 3/16/2005 9:10:01 AM
> *** BEGIN PGP VERIFIED MESSAGE ***
> 
>  
> *** PGP Signature Status: good
> *** Signer: Darryl Philip Baker <darryl.baker@gedas.com>
> *** Signed: 3/15/2005 12:28:32 PM
> *** Verified: 3/15/2005 12:30:08 PM
> *** BEGIN PGP VERIFIED MESSAGE ***
> 
> Installing the latest snapshot has reduced the problem with system
> loading on the master. 
> 
> Now I'm finding that cfexecd is starting one cfagent every 5
> minutes even though I have the schedule set to run only in Q1 and
> Q3: "schedule = ( Q1 Q3 )". Why?
> 
> 
> 
> 
> > -----Original Message-----
> > From: help-cfengine-bounces+darryl.baker=gedas.com@gnu.org
> > [mailto:help-cfengine-bounces+darryl.baker=gedas.com@gnu.org]On
> > Behalf Of Baker, Darryl
> > Sent: Monday, March 14, 2005 4:08 PM
> > To: help-cfengine@gnu.org
> > Subject: Load problem with cfservd
> > 
> > 
> > 
> > *** PGP Signature Status: good
> > *** Signer: Darryl Philip Baker <darryl.baker@gedas.com>
> > *** Signed: 3/14/2005 4:08:02 PM
> > *** Verified: 3/15/2005 10:54:48 AM
> > *** BEGIN PGP VERIFIED MESSAGE ***
> > 
> > My master machine is Solaris 9 and all systems are running
> > Solaris 8 or 9 and cfengine 2.1.13.
> > 
> > The problem we have with cfservd manifests itself as a periodic
> > clog that takes about a minute to resolve. This period is
> > characterized by the following symptoms:
> > 
> > 1. Load average spike from ~3 (on a 4-processor system) to the
> > 6-8 range. Occasionally the spike breaks into double digits. 
> > 2. Increase in concurrent port 5308 (cfengine) sessions from a
> > base level of 0-4 to peaks in the 12-30 range, with the number of
> > LWPs in the cfservd processes tracking the number of connections
> > linearly. (Client systems are set to connect twice an hour with a
> > 25-minute splay time.)
> > 3. Running lockstat shows severe contention for a single adaptive
> > mutex:
> > 
> > root@sysadm05:proc# lockstat sleep 5
> > 
> > Adaptive mutex spin: 157416 events in 5.040 seconds (31233 events/sec)
> > 
> > Count  indv cuml rcnt     spin Lock                   Caller
> > -----------------------------------------------------------------------
> > 136805  87%  87% 1.00       75 0x152ec90              sfmmu_mlist_enter+0x84
> > [...]
> > 
> > Adaptive mutex block: 648 events in 5.040 seconds (129 events/sec)
> > 
> > Count  indv cuml rcnt     nsec Lock                   Caller
> > -----------------------------------------------------------------------
> >    547  84%  84% 1.00   391652 0x152ec90              sfmmu_mlist_enter+0x84
> > 
> > Both of those lock types run about 2 orders of magnitude lower
> > in total, and this specific lock as much as 3 orders of
> > magnitude lower (i.e. ~100 spins and no blocks), when the system
> > is in its 'calm' state.
> > 
> > 4. The cfservd process becomes by far the top cpu user, eating
> > 10-25% of total cpu on a 4-processor system. 
> > 5. The system retains some idle time (5-30%) but the time used by
> > the kernel jumps to the 40-70% range. 
> > 
> > The history of troubleshooting this leads me to believe that the
> > heavy ssh usage on this host is a significant compounding factor,
> > i.e. that we are hitting some common bottleneck when we have
> > cfservd accepting connections and are spawning batches of 30-100
> > outbound ssh connections at once. Reducing the herds of outbound
> > ssh's has reduced the frequency and severity of these clog
> > periods, but every time we change much of anything on the system,
> > we end up getting back to a state where these clogs become
> > common. 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > *** END PGP VERIFIED MESSAGE ***
> > 
> > 
> > 
> 
> 
> *** END PGP VERIFIED MESSAGE ***
>  
> 
> 
> *** END PGP VERIFIED MESSAGE ***
>  
> 
> 

-----BEGIN PGP SIGNATURE-----
Version: PGP Personal Security 7.0.3

iQA/AwUBQjg/SFe1Bhkj9lZeEQLLDwCfQoESiAjjH1RvS/SwZjGX98sRrXYAn1pQ
6FsWz4K8yitp5+l/Pi8JuPl1
=QylC
-----END PGP SIGNATURE-----
 


