help-cfengine

Re: Load problem with cfservd


From: Mark Burgess
Subject: Re: Load problem with cfservd
Date: Mon, 14 Mar 2005 22:29:34 +0100
User-agent: Internet Messaging Program (IMP) 3.2.2

Can you try the latest snapshot?

M

Quoting "Baker, Darryl" <Darryl.Baker@gedas.com>:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> My master machine runs Solaris 9; all systems run Solaris 8 or 9
> and cfengine 2.1.13.
> 
> The problem we have with cfservd manifests itself as a periodic clog
> that takes about a minute to resolve. This period is characterized by
> the following symptoms:
> 
> 1. Load average spike from ~3 (on a 4-processor system) to the 6-8
> range. Occasionally the spike breaks into double digits. 
> 2. Increase in concurrent port 5308 (cfengine) sessions from a base
> level of 0-4 to peaks in the 12-30 range, with the number of LWPs in
> the cfservd processes rising linearly with the number of connections.
> (Client systems are set to connect twice an hour with a 25-minute
> splay time.)
> 3. Running lockstat shows severe contention for a single adaptive
> mutex:
> 
> root@sysadm05:proc# lockstat sleep 5
> 
> Adaptive mutex spin: 157416 events in 5.040 seconds (31233 events/sec)
> Count indv cuml rcnt     spin Lock                   Caller
> -------------------------------------------------------------------------------
> 136805  87%  87% 1.00       75 0x152ec90              sfmmu_mlist_enter+0x84
> [...]
> Adaptive mutex block: 648 events in 5.040 seconds (129 events/sec)
> Count indv cuml rcnt     nsec Lock                   Caller
> -------------------------------------------------------------------------------
>   547  84%  84% 1.00   391652 0x152ec90              sfmmu_mlist_enter+0x84
> 
> When the system is in its 'calm' state, both of those lock event
> types run about two orders of magnitude lower in total, and the
> specific lock runs as much as three orders of magnitude lower
> (i.e. ~100 spins and no blocks).
> 
> 4. The cfservd process becomes by far the top CPU user, consuming
> 10-25% of total CPU on a 4-processor system.
> 5. The system retains some idle time (5-30%), but kernel (system)
> time jumps to the 40-70% range.
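The lockstat figures in item 3 can be sanity-checked with a short script. This is a hypothetical helper, not part of cfengine or Solaris: it assumes the summary layout shown above (a "<kind>: N events in S seconds" header followed by "Count indv cuml rcnt ... Lock Caller" rows) and recomputes each section's event rate and the share of events charged to the first-listed lock.

```python
import re
from dataclasses import dataclass, field

# Section header, e.g. "Adaptive mutex spin: 157416 events in 5.040 seconds (...)"
HEADER_RE = re.compile(
    r"^(?P<kind>[^:]+): (?P<events>\d+) events in (?P<secs>[\d.]+) seconds"
)
# Data row: Count indv% cuml% rcnt spin-or-nsec Lock Caller
ROW_RE = re.compile(
    r"^\s*(?P<count>\d+)\s+(?P<indv>\d+)%\s+(?P<cuml>\d+)%\s+[\d.]+\s+"
    r"\d+\s+(?P<lock>\S+)\s+(?P<caller>\S+)\s*$"
)


@dataclass
class Section:
    kind: str                                  # e.g. "Adaptive mutex spin"
    events: int                                # total events in the sample
    seconds: float                             # sampling interval
    rows: list = field(default_factory=list)   # (count, lock, caller) tuples

    @property
    def rate(self) -> float:
        """Events per second, the figure lockstat prints in parentheses."""
        return self.events / self.seconds


def parse_lockstat(text: str) -> list:
    """Split lockstat summary output into sections with their data rows."""
    sections = []
    for line in text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            sections.append(Section(header["kind"], int(header["events"]),
                                    float(header["secs"])))
            continue
        row = ROW_RE.match(line)
        if row and sections:
            sections[-1].rows.append((int(row["count"]), row["lock"], row["caller"]))
    return sections


SAMPLE = """\
Adaptive mutex spin: 157416 events in 5.040 seconds (31233 events/sec)
Count indv cuml rcnt     spin Lock                   Caller
136805  87%  87% 1.00       75 0x152ec90              sfmmu_mlist_enter+0x84
Adaptive mutex block: 648 events in 5.040 seconds (129 events/sec)
Count indv cuml rcnt     nsec Lock                   Caller
  547  84%  84% 1.00   391652 0x152ec90              sfmmu_mlist_enter+0x84
"""

for sec in parse_lockstat(SAMPLE):
    count, lock, caller = sec.rows[0]  # assumes the busiest lock is listed first
    print(f"{sec.kind}: {sec.rate:.0f} events/sec, "
          f"top lock {lock} ({caller}) = {count / sec.events:.0%}")
```

On this sample both sections point at the same lock, 0x152ec90 via sfmmu_mlist_enter+0x84, at 87% of spins and 84% of blocks. Set against the ~100 spins quoted for the calm state, 136805 spins is a ratio of roughly 1,400, a little over three orders of magnitude.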
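Item 2's observation that cfservd LWPs track connections suggests one way to read the session spikes. By Little's law, the mean number of sessions in progress equals the arrival rate times the mean session length, so with clients spread over a fixed 25-minute splay window, anything that stretches each session (such as kernel lock contention) raises concurrency, and the LWP count with it, in proportion. This sketch is illustrative only: the 300-client fleet, the session lengths, and the hash-of-hostname offset are assumptions standing in for cfengine's actual splay calculation.

```python
import hashlib


def splay_offset(hostname: str, splay_s: int) -> int:
    """Hypothetical stand-in for a splay delay: a stable per-host offset
    in [0, splay_s) derived from a hash of the host name."""
    digest = hashlib.sha256(hostname.encode()).digest()
    return int.from_bytes(digest[:4], "big") % splay_s


def littles_law(clients: int, window_s: float, session_s: float) -> float:
    """Mean concurrent sessions L = lambda * W, with lambda = clients / window."""
    return clients / window_s * session_s


def peak_concurrency(hosts, splay_s: int, session_s: int) -> int:
    """Largest number of overlapping [start, start + session_s) intervals."""
    events = []
    for host in hosts:
        start = splay_offset(host, splay_s)
        events.append((start, 1))               # session opens
        events.append((start + session_s, -1))  # session closes
    # At equal times -1 sorts before +1, so back-to-back sessions don't overlap.
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


hosts = [f"client{i:03d}" for i in range(300)]  # hypothetical fleet size
SPLAY_S = 25 * 60                               # 25-minute splay window
for session_s in (5, 60):  # hypothetical: a quick session vs. a stalled one
    mean = littles_law(len(hosts), SPLAY_S, session_s)
    peak = peak_concurrency(hosts, SPLAY_S, session_s)
    print(f"{session_s:>3}s sessions: mean {mean:.1f}, peak {peak}")
```

With these made-up numbers, stretching sessions from 5 to 60 seconds moves the mean from 1 to 12 concurrent sessions; the simulated peak will typically sit above the mean because hashed offsets cluster by chance.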
> 
> Our troubleshooting so far leads me to believe that the heavy ssh
> usage on this host is a significant compounding factor, i.e. that we
> hit some common bottleneck when cfservd is accepting connections
> while we are spawning batches of 30-100 outbound ssh connections at
> once. Reducing these herds of outbound ssh sessions has reduced the
> frequency and severity of the clog periods, but every time we change
> much of anything on the system, we drift back to a state where these
> clogs become common.
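Since smaller herds of outbound ssh connections made the clogs rarer, one option is to cap how many run at once rather than launching all 30-100 together. This is a hypothetical wrapper, not something cfengine provides: a fixed-size thread pool keeps at most `max_parallel` ssh sessions open at a time. The limit of 10, the `uptime` command, and the host names are placeholder assumptions.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_ssh(host: str, command: str, timeout: float = 60.0) -> int:
    """Run one remote command; BatchMode makes ssh fail instead of prompting.
    Raises subprocess.TimeoutExpired if the session outlives `timeout`."""
    result = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", host, command],
        capture_output=True,
        timeout=timeout,
    )
    return result.returncode


def fan_out(hosts, task, max_parallel: int = 10) -> dict:
    """Apply task(host) to every host with at most max_parallel running at once.
    Remaining hosts wait in the pool's queue instead of all connecting together."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {host: pool.submit(task, host) for host in hosts}
        return {host: future.result() for host, future in futures.items()}


# Example (placeholder host names):
# codes = fan_out(["web01", "web02"], lambda h: run_ssh(h, "uptime"), max_parallel=10)
```

A fixed pool smooths the outbound connection rate; a tighter cap trades a longer total run for smaller bursts competing with cfservd.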
> 
> 
> 
> _____________________________________________________________________
> Darryl Baker
> gedas USA, Inc.
> Operational Services Business Unit
> 3800 Hamlin Road
> Auburn Hills, MI 48326
> US
> phone +1-248-754-5341
> fax   +1-248-754-6399
> Darryl.Baker@gedas.com
> http://www.gedasusa.com
> _____________________________________________________________________
> 
> 
> 
> -----BEGIN PGP SIGNATURE-----
> Version: PGP Personal Security 7.0.3
> 
> iQA/AwUBQjX9Mle1Bhkj9lZeEQLTgQCeNHbP4+Zf+P2luqNx/QRNpLeOYF8AnRvL
> BXCjcj0Rs4JDtgcQzjKv016V
> =IHlF
> -----END PGP SIGNATURE-----
> 
> 
> 


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Work: +47 22453272            Email:  Mark.Burgess@iu.hio.no
Fax : +47 22453205            WWW  :  http://www.iu.hio.no/~mark
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

