help-cfengine

RE: cfengine failover on copy does not seem to work as documented


From: Chip Seraphine
Subject: RE: cfengine failover on copy does not seem to work as documented
Date: Mon, 7 Nov 2005 14:26:10 -0000 (GMT)
User-agent: WebMail/1.4.1

> Failover, in my world, is generally used to talk about network services
> and that is the meaning it carries in cfengine today.

Mine too. :)  In that sense, I am not using the mechanism for what it was
intended for.

> I can agree that
> there is a general need for some kind of exception handling for other
> cases too. Then what you are really asking for is another kind of class
>
> filenotfound=myclass
>
> to complement failover=

Something like that, yes.

We probably don't want to get into a situation where we have numerous
error-handling warts growing on commands, such as "filenotfound=foo" and
"unreadable=bar".  If we are going to revisit this, I'd consider allowing
the user to define an error "base class" (essentially a label for that
particular copy operation) to which a specific error token could be
appended.  (So the base class would indicate a failure, and an extended
class would give the specific error from a defined list.)  Something like
this:


# Overly simplified, unrealistic code ensues!
copy:

 any::
  /foo/bar  dest=/bar/foo server=$(policyhost) label=barcopy

 barcopy_serverdown::
  /alt/cfenginepath/foo/bar  dest=/bar/foo server=$(backuphost) label=barcopy2


shellcommands:

 barcopy_filenotfound::
   "/bin/echo Check $(policyhost) to make sure /foo/bar is present | mail admingroup"

 barcopy_filenotreadable::
   "/bin/echo Permissions bad at /foo/bar on $(policyhost) | mail cfenginemaintainer"

 # barcopy_failed is defined on ANY failure of barcopy, in addition to the
 # specific classes
 barcopy_failed.barcopy2_failed::
   "/bin/echo Warning, /bar/foo is out of date on $(host) | /usr/local/sbin/nagioswrapper"
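
To make the naming scheme concrete, here is a minimal sketch in Python (not cfengine) of how a label and an error token might combine into classes. Everything in it is an assumption for illustration: the function names, the `_failed` suffix for the base class, and the token list, which is only the two tokens named above and not a definitive set.

```python
# Hypothetical error tokens, taken from the example above; a real list
# would be longer and defined by cfengine itself.
ERROR_TOKENS = ("serverdown", "filenotfound")


def classes_for_failure(label, token):
    """Return the classes a failed copy labelled `label` would define."""
    if token not in ERROR_TOKENS:
        raise ValueError(f"unknown error token: {token}")
    # The base class marks any failure; the extended class names the cause.
    return {f"{label}_failed", f"{label}_{token}"}


def classes_for_run(results):
    """Collect classes for a run; `results` maps label -> token or None."""
    defined = set()
    for label, token in results.items():
        if token is not None:  # None means the copy succeeded
            defined |= classes_for_failure(label, token)
    return defined


def all_failed(defined, *labels):
    """Mimic a compound class such as barcopy_failed.barcopy2_failed."""
    return all(f"{label}_failed" in defined for label in labels)
```

With this sketch, a run in which `barcopy` hits a downed server and `barcopy2` finds no file defines both base classes, so the compound "everything failed" rule would fire.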


> This can be patched into the current cfengine without any real
> difficulty. And it should be better designed in cfengine 3.

If we have labels, we can also set dynamic variables (${barcopy_errmsg})....

> Is it not true that, if the file is unreadable, you get an error message
> anyway?

Of course.  And that is fine for me, the cfengine maintainer, since the
cfexecd output goes into my inbox.  But given a few hundred machines, that
is a *lot* of noise (I have cfservd and cfexecd crashes-and-restarts every
few minutes, each of which generates mail), and nobody but the cfengine guy
wants to see that.

So, I have a 'complaint' tool that allows me to shoot messages to the NMS,
to specific admins or developers who "own" certain resources, etc. 
(Example:  The presence of a sendmail corefile in /var/core makes me want
to shoot an email to the mail guy, not to the whole sysadmin team.)
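
The routing idea can be sketched roughly as follows. This is not the actual 'complaint' tool; the patterns, recipient names, and function name are all made up for illustration, and the only detail taken from the example above is that a sendmail corefile in /var/core should go to the mail owner instead of the whole team.

```python
import fnmatch

# Hypothetical routing table: first matching pattern wins.
ROUTES = [
    ("/var/core/sendmail*", "mail-admin"),
    ("/var/core/*", "sysadmin-team"),
]
DEFAULT_RECIPIENT = "cfengine-maintainer"


def recipient_for(path):
    """Pick who should hear about a problem with the given resource path."""
    for pattern, owner in ROUTES:
        # fnmatchcase does case-sensitive shell-style matching.
        if fnmatch.fnmatchcase(path, pattern):
            return owner
    return DEFAULT_RECIPIENT
```

Ordering the table from most to least specific keeps the owner-specific rule ahead of the catch-all one.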

Also, I often simply do not have time to read all the normal output; it
is very cluttered with 'routine' messages (SIGPIPEs from trying to copy
from downed machines, stuck cfagents being killed, cfexecd or cfservd
segfaulting), and it is useful to have an external mechanism tell me
when-and-where something really needs my attention, so that I know when to
dive into those outputs.

> Mark

~Chip



