Re: cfservd thrushes, nodes fail to get anything

From: Dustin Sorge
Subject: Re: cfservd thrushes, nodes fail to get anything
Date: Wed, 25 May 2005 10:31:12 -0400
Luke Youngblood wrote:

It's impossible to predict what will happen when applications run out of
memory and swap because /tmp has filled up.  This is what most people would
call a case of "operator error".  It might be useful to have a cfagent
--cleanup option that would clean up any temp databases that might be out
there, but I don't think software can realistically be expected to do much of
anything once memory and swap have been exhausted.



I cannot see how this can occur. Any ideas from anyone else? There are
no loops. It is possible that this is internal to db.


On Sat, 2005-05-07 at 18:41 -0400, Yaroslav Halchenko wrote:
I've found the reason, and it would probably be beneficial to adjust
cfservd so that it doesn't get into such a situation again:

I had a leftover file /tmp/__db.testDATABASEcache

so strace revealed an infinite loop of

Perhaps it wouldn't be a trivial thing to have a simple perl script that
monitors how full /tmp is and then calls tmpwatch if it exceeds a certain
threshold. This of course is assuming that cfengine doesn't have a better
way to deal with it.

"/usr/bin/perl" define=filling_up

"/usr/sbin/tmpwatch 2 /tmp" # delete files in /tmp >= 2 hours old ....
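
To make the monitoring idea concrete, here is a rough sketch of the check,
written in Python rather than perl. The 90% threshold and the function names
are assumptions made up for illustration; only the tmpwatch command line is
taken from the fragment above. It is not something cfengine provides.

```python
import shutil
import subprocess


def usage_fraction(path="/tmp"):
    """Return the used fraction (0.0 to 1.0) of the filesystem holding path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo filesystems report a total size of zero.
        return 0.0
    return usage.used / usage.total


def clean_if_full(path="/tmp", threshold=0.90, max_age_hours=2,
                  run=subprocess.run):
    """Run tmpwatch on path when usage reaches the (assumed) threshold.

    Returns True if tmpwatch was invoked, False otherwise. The `run`
    parameter exists only so the command can be stubbed out when testing.
    """
    if usage_fraction(path) < threshold:
        return False
    # Same command as the fragment above: tmpwatch <hours> <directory>.
    run(["/usr/sbin/tmpwatch", str(max_age_hours), path], check=True)
    return True


if __name__ == "__main__":
    clean_if_full()
```

Run from cron or from a cfengine shell command, this only touches /tmp once
the filesystem is nearly full, instead of expiring files on every pass.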

   -- Dustin

Dustin Sorge
HPC System Administrator
Pittsburgh Supercomputing Center
Carnegie Mellon University
4400 Fifth Avenue
Pittsburgh, PA 15213

