Re: cfservd under load - db crash, MaxConnections

From: Mark . Burgess
Subject: Re: cfservd under load - db crash, MaxConnections
Date: Wed, 8 Dec 2004 08:55:25 +0100 (MET)

Just a note -- I have seen and heard of people experiencing crashes
due to Berkeley DB libraries that are incorrectly compiled. There are
so many versions now that it is easy to mix up header files and
libraries, leading to core dumps.

I hope to be able to look into some recent bug reports in the next
few days. The end of semester (exams) is rather demanding at the


On  7 Dec, Eric Sorenson wrote:
> As I was working on the SplayTime curiosity described in my last post,
> I was also investigating a couple of things on the server side.  The
> first was derived from a couple of coredumps I got that both looked like 
> this, with different 'mipaddr' values:
> (gdb) bt
> #0  0x40057822 in ?? ()
> #1  0x4004c9c2 in ?? ()
> #2  0x4006f819 in ?? ()
> #3  0x40069659 in ?? ()
> #4  0x08050877 in IsWildKnownHost (oldkey=0x81f21b8, newkey=0x8344d50, mipaddr=0x83423b8 "", username=0x8341fb4 "root") at cfservd.c:3154
> #5  0x08050452 in CheckStoreKey (conn=0x8341b98, key=0x8344d50) at cfservd.c:3038
> #6  0x0804eb5a in AuthenticationDialogue (conn=0x8341b98, recvbuffer=0x4cffdb7c "", recvlen=280) at cfservd.c:2315
> #7  0x0804d021 in BusyWithConnection (conn=0x8341b98) at cfservd.c:1252
> #8  0x0804c775 in HandleConnection (conn=0x8341b98) at cfservd.c:1133
> #9  0x401d32b6 in ?? ()
> I suspected that the innermost frames, for which there is no symbol data,
> were calls out to the Berkeley DB libraries. Judging from the code,
> we were manipulating the /var/cfengine/ppkeys/dynamic key
> database. I suspect that one of my earlier crashes corrupted an entry
> in it, so I just rm'ed it and let it be re-created, and I haven't seen
> any more of these problems -- so this might help others who are seeing cfservd
> crashes and have DynamicAddresses turned on for some hosts.
> Another pathology I saw was 'too many open files' errors from cfservd. At
> the time our MaxConnections setting in cfservd.conf was 1000, which is
> the maximum allowable value (cfservd.c:392), and clearly we were hitting some
> wall below that.  So I set it to 100 on a lark and saw this:
> Dec  6 17:25:01 sinistar cfservd[5518]:  Too many threads (>=100) -- increase MaxConnections?
> Dec  6 17:25:02 sinistar last message repeated 64 times
> Dec  6 17:25:02 sinistar cfservd[5518]:  Server seems to be paralyzed. DOS attack? Committing apoptosis...
> Dec  6 17:25:02 sinistar cfservd[5518]:  Received signal 0 (NOSIG) while doing [cfservd]
> Dec  6 17:25:02 sinistar cfservd[5518]:  Logical start time Mon Dec  6 17:16:01 2004
> Dec  6 17:25:02 sinistar cfservd[5518]:  This sub-task started really at Mon Dec  6 17:16:01 2004
> 'Apoptosis' is apparently an oncological term meaning 'scheduled cell death'.
> With MaxConnections at 100, cfservd survived long enough to push out
> updated configurations with the increased SplayTime to some clients, easing
> the load off itself. But still I'm seeing an onslaught right at one second
> past the opening of the execution window, and I can't tell if, during the
> time these messages are happening, forward progress is being made by the
> 100 running threads below the apoptosis threshold, or whether there is a
> big set of clients which will not be able to receive their updated configs
> except by package update, because the server will always be slammed when
> they try to connect. I'll try upping the value to 500 to see what happens,
> but I'm wondering if there's a more scientific (or at least, less naive)
> way to tune MaxConnections so that it fits inside OS limits but will handle
> lots of client connections.
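
On the tuning question, one rough sanity check is to compare MaxConnections
against the per-process open-file limit of the account that runs cfservd. The
sketch below is only an illustration and is not something cfservd does itself.
It assumes each connection costs about two descriptors (the socket plus one
file being served) and that some descriptors are held back for logs and
databases. Both numbers, and the function name connections_that_fit, are
guesses made for the example, not values taken from cfservd.c.

```python
import resource


def connections_that_fit(soft_limit, fds_per_connection=2, reserved_fds=64):
    """Estimate how many connections fit under an open-file limit.

    fds_per_connection and reserved_fds are illustrative assumptions,
    not measured values for cfservd.
    """
    usable = soft_limit - reserved_fds
    if usable <= 0:
        return 0
    return usable // fds_per_connection


if __name__ == "__main__":
    # Soft and hard limits on open files for the current process.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        print("no soft limit on open files")
    else:
        print("soft limit:", soft, "hard limit:", hard)
        print("rough ceiling for MaxConnections:", connections_that_fit(soft))
```

If the soft limit turns out to be 1024, a common default, then a
MaxConnections of 1000 leaves almost no headroom under these assumptions,
which would be consistent with the 'too many open files' errors. Raising the
limit (for example with ulimit -n in the script that starts cfservd) or
lowering MaxConnections to the estimate are the two obvious options.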

Work: +47 22453272            Email:  address@hidden
Fax : +47 22453205            WWW  :
