Solution: Re: monitoring cfagent(s)
From: Dmitry Sazonov
Subject: Solution: Re: monitoring cfagent(s)
Date: Mon, 30 Jan 2006 12:24:45 -0500
Thanks go to Matthew for the idea.
My implementation is included below.
It monitors whether cfengine runs periodically on each host that is
monitored by Nagios and has a cfengine service defined for it.
If cfagent needs to take any action (that is, there is output from
cfexecd), Nagios turns the cfengine service light yellow (WARNING).
If cfagent has not run within the freshness check interval (1 hr in the
example), or the email is not arriving, the Nagios freshness check turns
the cfengine light orange (UNKNOWN).
Please feel free to comment/improve!
cfengine:
1. Add one line to the cfengine configuration that executes on every
run, like:
shellcommands:
any:: "/bin/echo host=$(host) date=$(date)"
2. Make sure Inform is ON.
3. Configure the email address that receives all the email from cfengine.
Example: cfengine@nagioshost.domain
Nagios host:
1. Create a local account, cfengine. That account needs to be in the
group that owns the Nagios cmd file (in my case nagioscmd).
2. (Linux) Make sure the local sendmail accepts SMTP from remote hosts
(send a test email).
3. Create ~cfengine/.forward containing:
\cfengine,"|filter.pl"
4. Put filter.pl in /etc/smrsh/ (Linux).
5. filter.pl can be something trivial like this:
#!/usr/bin/perl
use Getopt::Std;
getopts('d');
$submit='/usr/local/groundwork/nagios/eventhandlers/submit_check_results';
$/="\n\n";
$_=<>;
($host) = /^From: \w+\@(\w+)/m;
$/="\n";
my $nomatch=0;
while (<>) {
next if /^$/; # skip blank lines
next if /^cfengine:\w+: (Executing|Finished) script \/bin\/echo/;
next if /^cfengine:\w+:\/bin\/echo host=:/;
$nomatch++;
print "no match: ",$_ if $opt_d;
}
if($nomatch){
print "WARNING: $host\n" if $opt_d;
system "$submit $host cfengine 1 \"unexpected output\"";
}else{
print "OK: $host\n" if $opt_d;
system "$submit $host cfengine 0 \"expected output\"";
}
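
To try out the matching rules without touching Nagios, the classification
step of filter.pl can be approximated in plain shell. This is only a rough
sketch: count_unexpected is a hypothetical helper name, and the sample lines
are made up to fit the patterns in filter.pl, they are not captured cfexecd
output.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original setup.
# Reads mail body lines on stdin and prints how many of them are neither
# blank nor produced by the heartbeat /bin/echo command. This mirrors the
# three "next if" rules in filter.pl ([[:alnum:]_] stands in for Perl's \w).
count_unexpected() {
    # grep -v -c counts the lines that match none of the patterns.
    # grep exits 1 when that count is 0, so mask the status with "|| true".
    grep -v -E -c \
        -e '^$' \
        -e '^cfengine:[[:alnum:]_]+: (Executing|Finished) script /bin/echo' \
        -e '^cfengine:[[:alnum:]_]+:/bin/echo host=:' || true
}

# Made-up sample body containing only the heartbeat lines.
quiet=$(printf '%s\n' \
    'cfengine:web01: Executing script /bin/echo host=web01' \
    '' \
    'cfengine:web01:/bin/echo host=: host=web01' \
    'cfengine:web01: Finished script /bin/echo host=web01' | count_unexpected)

# Same body plus one made-up line that none of the patterns cover.
noisy=$(printf '%s\n' \
    'cfengine:web01: Executing script /bin/echo host=web01' \
    'cfengine:web01: some other message' | count_unexpected)

echo "quiet=$quiet noisy=$noisy"
```

A count of 0 corresponds to the OK branch of filter.pl, anything larger to
the WARNING branch.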
6. Write submit_check_results (see the Nagios docs on passive checks):
#!/bin/sh
# usage: submit_check_results <host> <service> <return_code> <output>
echocmd="/bin/echo"
CommandFile="/usr/local/groundwork/nagios/var/spool/nagios.cmd"
# get the current date/time in seconds since UNIX epoch
datetime=`date +%s`
# create the command line to add to the command file
cmdline="[$datetime] PROCESS_SERVICE_CHECK_RESULT;$1;$2;$3;$4"
# append the command to the end of the command file
$echocmd "$cmdline" >> "$CommandFile"
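
For reference, the line that ends up in the command file has a fixed shape.
The sketch below only formats such a line and prints it, so it can be run
anywhere; format_passive_result is a hypothetical name, and the timestamp
and host name web01 are arbitrary example values.

```shell
#!/bin/sh
# Hypothetical helper: build one PROCESS_SERVICE_CHECK_RESULT line.
# Arguments: timestamp, host, service, return code, plugin output.
format_passive_result() {
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
        "$1" "$2" "$3" "$4" "$5"
}

# Return code 1 is WARNING, as used by filter.pl for unexpected output.
format_passive_result 1138641885 web01 cfengine 1 "unexpected output"
# prints: [1138641885] PROCESS_SERVICE_CHECK_RESULT;web01;cfengine;1;unexpected output
```

The host and service fields have to match the host_name and the service
defined in Nagios, otherwise the result is not attached to any service.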
7. Enable passive checks in the main Nagios configuration file:
accept_passive_service_checks=1
8. Create a matching service, cfengine.
(Only the important lines are left in the example:)
define service {
name cfengine
active_checks_enabled 0
passive_checks_enabled 1
check_freshness 1
freshness_threshold 3600
check_command fake_check!CRITICAL:!freshness check!2!
}
9. The check command needs to exist (see the Nagios docs about freshness
checks):
#!/bin/sh
echo $1 $2
exit $3
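
To see what the freshness check reports, the script above can be exercised
by hand. The sketch below wraps the same two lines in a shell function so it
runs standalone. It assumes the Nagios command definition (not shown in this
mail) passes the three "!"-separated arguments of the service definition as
$1, $2 and $3.

```shell
#!/bin/sh
# Same logic as the fake_check script, as a function for local testing.
fake_check() {
    echo "$1" "$2"
    return "$3"
}

# Arguments as they appear in the service definition above (an assumption
# about how the command definition forwards them).
if output=$(fake_check "CRITICAL:" "freshness check" 2); then
    status=0
else
    status=$?
fi

echo "output=$output status=$status"
# prints: output=CRITICAL: freshness check status=2
```

Nagios takes the service state from the exit code (0 OK, 1 WARNING,
2 CRITICAL, 3 UNKNOWN), so the last argument decides which state a stale
service is put into.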
--
Dmitry Sazonov
UNIX sysadmin, AAMC
Office: 202-862-6168
>>> Matthew Palmer <mpalmer@hezmatt.org> 1/5/2006 5:24:28 PM >>>
On Thu, Jan 05, 2006 at 04:58:16PM -0500, Dmitry Sazonov wrote:
> Is there any cfengine class that will tell me that all rules were
> processed, but no action were necessary - meaning that the host state is
> as desired (convergent).
> If "corrective" actions were required - I'd like to know that too, as
> if the actions are required on every run - there must be something
> wrong.
>
> Based on this class(?) I could fire a syslog message (submit a passive
> check) to Nagios.
Create a passive check that fires from an e-mail received, and times out if
no relevant e-mail is received for a period of time, so the check can go
critical if no e-mail is received (to say "cfagent doesn't appear to be
running") or it can also go critical (or warning) when an e-mail is received
that contains evidence of work being done (I tend to use cfagent -I, so I
would probably check to see if any output was printed in the e-mail).
No cfengine hacking required!
- Matt
--
"For once, Microsoft wasn't exaggerating when they named it the 'Jet Engine'
-- your data's the seagull."
-- Chris Adams