Re: Bootstrapping

From: John Sechrest
Subject: Re: Bootstrapping
Date: Wed, 18 Feb 2004 15:33:50 -0800

"Luke A. Kanies" <address@hidden> writes:

 % On Wed, 18 Feb 2004, Eric Sorenson wrote:

 % > Cfengine has the same problem, except when the host key changes
 % > you have to track down why this one machine can't get updates and
 % > the users are complaining.

 % This is another problem that I consider unsolved.  How do you know all of
 % your hosts are correctly updating themselves?  How do you even define
 % 'correctly'?

 The only thing that I have come up with is to have a token that
 gets stuffed into a file, or into LDAP, which marks the last successful
 run of cfengine. Since things might not get done, it is not clear
 to me that this does anything but tell you that the run is not erroring
 out before the end.
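
 A minimal sketch of the file-based token, assuming the last step of a
 run can call a small script. The function names and the token path are
 hypothetical, not part of cfengine. As noted above, this only shows that
 the run reached its end, not that every action succeeded.

```python
import os
import tempfile
import time


def write_run_token(path, now=None):
    """Atomically record the time of the last completed run in `path`."""
    stamp = int(time.time() if now is None else now)
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temp file in the same directory, then rename it into
    # place, so a reader never sees a half-written token.
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        handle.write(f"{stamp}\n")
    # mkstemp creates the file as 0600; open it up so a monitoring
    # user can read it (an assumption about the site's setup).
    os.chmod(tmp, 0o644)
    os.replace(tmp, path)
    return stamp


def read_run_token(path):
    """Return the recorded timestamp, or None if missing or unreadable."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None
```

 The token would be written only as the very last action of the run, so
 a run that dies partway leaves the old timestamp in place.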

 % At my previous client I was reading all syslog messages from a pipe
 % written to by syslog-ng, and then storing those logs in a database.  I
 % tacked a small filter on that reader and had it start storing last-seen
 % records in LDAP for every host (with some throttling so I didn't spam the
 % LDAP server).  Then I defined 'recent' for my various services (cfengine
 % and ISconf, in that case), and had a script which could easily check
 % whether all of my hosts were 'recent'.  I never went so far as to connect
 % it to a tool like Nagios, but I would have liked to.

 Yes, this is a good direction to look. 
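
 The 'recent' check Luke describes might look roughly like this. It is a
 sketch only: the last-seen records are assumed to be already fetched
 into a plain dict (from LDAP or anywhere else), and both function names
 are made up for illustration.

```python
def should_update(previous, now, min_interval):
    """Throttle: write a new last-seen record only if the stored one
    is missing or at least min_interval seconds old."""
    return previous is None or now - previous >= min_interval


def stale_hosts(master_hosts, last_seen, max_age, now):
    """Return hosts from the master list that are not 'recent'.

    master_hosts: iterable of host names (the authoritative list)
    last_seen:    dict mapping host name -> epoch seconds of last report
    max_age:      seconds within which a host still counts as 'recent'
    """
    stale = []
    for host in master_hosts:
        seen = last_seen.get(host)
        # A host with no record at all is treated the same as a stale one.
        if seen is None or now - seen > max_age:
            stale.append(host)
    return sorted(stale)
```

 Driving the loop from the master host list, rather than from whatever
 records happen to exist, is what catches hosts that never reported.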

 % This was a pretty good method in that it used my master host list to tell
 % me the status of every host in the list.  However, it had a serious
 % failing:  It didn't have a good definition of correct.  Of course, it was
 % also subject to failures of the syslog system (syslog-ng dies, the reading
 % script dies, etc.), but that was solvable through other methods.

 % So, as to 'correctly updating':  If a client can successfully copy
 % _anything_ is it working?  What about if it's just running cfagent at all?
 % What if it has some errors, such as being incapable of starting a process?

 % I don't believe it's possible to have cfagent collect the number of
 % errors, or to classify a portion of an update as 'critical' or 'optional',
 % but that would certainly be useful.  If I could collect that information
 % and then use it to have the client update my LDAP repository as the last
 % stage in any run, then I would believe I had a good definition of a
 % functional system.  Just a simple (No errors/Some Minor Errors/Some
 % Critical Errors/Total nonfunction/No Data) stat of some kind would be very
 % useful.

 Yes, I agree this would be valuable. I would like to have Nagios
 or Big Brother alert when my cfengine activities are not working
 as I expect.

 And I would like to derive these from my main configuration.


        1) Here is what I want for my site: (my mln config file goes here)
        2) which then generates the site
        3) which then generates the cfengine and Nagios configs
        4) which then starts cfengine running
        5) which changes a token
        6) which Nagios watches
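
 Steps 5 and 6 could be sketched as a small Nagios-style check that maps
 the age of the token to a status. The exit codes (0 OK, 1 WARNING,
 2 CRITICAL, 3 UNKNOWN) follow the usual Nagios plugin convention; the
 function names, the token format (epoch seconds in a file), and the
 thresholds are assumptions for illustration.

```python
import time

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_token_age(token_time, now, warn_after, crit_after):
    """Map the age of the last-run token to a (status, message) pair."""
    if token_time is None:
        return UNKNOWN, "UNKNOWN - no run token found"
    age = int(now - token_time)
    if age >= crit_after:
        return CRITICAL, f"CRITICAL - last run {age}s ago"
    if age >= warn_after:
        return WARNING, f"WARNING - last run {age}s ago"
    return OK, f"OK - last run {age}s ago"


def main(path, warn_after=3600, crit_after=7200):
    """Read the token file at `path`, print one status line, return the code."""
    try:
        with open(path) as handle:
            token_time = int(handle.read().strip())
    except (OSError, ValueError):
        token_time = None
    status, message = check_token_age(
        token_time, time.time(), warn_after, crit_after
    )
    print(message)
    return status
```

 A wrapper script would call sys.exit(main("/path/to/token")) so that
 Nagios sees the status as the process exit code.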

 % I'm working on it....

   Very nice. 

 % Luke
 % -- 
 % Health is merely the slowest possible rate at which one can die.

John Sechrest          .         Helping people use
                        .           computers and the Internet
                          .            more effectively
                                 .       Internet: address@hidden
