RE: cfengine failover on copy does not seem to work as documented
From: Mark Burgess
Subject: RE: cfengine failover on copy does not seem to work as documented
Date: Mon, 07 Nov 2005 16:52:06 +0100

You should think more about how this could be optimized.
What I envisage in cfengine 3 is that you will be able to attach
a kind of "filter"-like stanza to a command to provide exception
handling. We can talk about this at LISA. So the question is what
kinds of things should be in those stanzas - and you have already given
some ideas.
M
On Mon, 2005-11-07 at 14:26 +0000, Chip Seraphine wrote:
> > Failover, in my world, is generally used to talk about network services
> > and that is the meaning it carries in cfengine today.
>
> Mine too, :) In that sense I am not using the mechanism for what it was
> intended for.
>
> > I can agree that
> > there is a general need for some kind of exception handling for other
> > cases too. Then what you are really asking for is another kind of class
> >
> > filenotfound=myclass
> >
> > to complement failover=
>
> Something like that, yes.
>
> We probably don't want to get into a situation where we have numerous
> error-handling warts growing on commands, such as "filenotfound=foo" and
> "unreadable=bar". If we are going to revisit this, I'd consider allowing
> the user to define an error "base class" (which is essentially a label for
> that particular copy operation) and a specific error token could be
> appended. (So the base class would indicate a failure, and an extended
> class would give specific errors from a defined list.) Something like
> this:
>
>
> # Overly simplified unrealistic code ensues!
> copy:
>
>   any::
>     /foo/bar dest=/bar/foo server=$(policyhost) label=barcopy
>
>   barcopy_serverdown::
>     /alt/cfenginepath/foo/bar dest=/bar/foo server=$(backuphost) label=barcopy2
>
>
> shellcommands:
>
>   barcopy_filenotfound::
>     "/bin/echo Check $(policyhost) to make sure /foo/bar is present | mail admingroup"
>
>   barcopy_filenotreadable::
>     "/bin/echo Permissions bad at /foo/bar on $(policyhost) | mail cfenginemaintainer"
>
>   # barcopy_failed is defined on ANY failure of barcopy, in addition to specific classes
>   barcopy_failed.barcopy2_failed::
>     "/bin/echo Warning, /bar/foo is out of date on $(host) | /usr/local/sbin/nagioswrapper"
>
>
> > This can be patched into the current cfengine without any real
> > difficulty. And it should be better designed in cfengine 3.
>
> If we have labels, we can also set dynamic variables (${barcopy_errmsg})....
>
> > Is it not true that, if the file is unreadable, you get an error message
> > anyway?
>
> Of course. And that is fine for me, the cfengine maintainer, since the
> cfexecd output goes into my inbox. But given a few hundred machines, that
> is a *lot* of noise (I have cfservd and cfexecd crashes-and-restarts every
> few minutes, each of which generates mail) and nobody but the cfengine guy
> wants to see that.
>
> So, I have a 'complaint' tool that allows me to shoot messages to the NMS,
> to specific admins or developers who "own" certain resources, etc.
> (Example: The presence of a sendmail corefile in /var/core makes me want
> to shoot an email to the mail guy, not to the whole sysadmin team.)
>
> Also, I often simply do not have time to read all the normal output-- it
> is very cluttered with 'routine' messages (SIGPIPEs from trying to copy
> from downed machines, stuck cfagents being killed, cfexecd or cfservd
> segfaulting), and it is useful to have an external mechanism tell me
> when-and-where something really needs my attention, so that I know when to
> dive into those outputs.
>
> > Mark
>
> ~Chip
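
The label-plus-error-token naming discussed in the quoted proposal can be modelled outside cfengine. What follows is only a rough Python sketch of that idea, not cfengine syntax or behaviour. The function name `classes_for` and the fixed token list are hypothetical. The tokens are taken from the class names used in the example, and the `_failed` suffix follows the `barcopy_failed` class shown there.

```python
# Hypothetical set of error tokens, taken from the class names in the example.
ERROR_TOKENS = {"serverdown", "filenotfound", "filenotreadable"}


def classes_for(label, errors):
    """Return the class names a labelled copy would define for the given errors."""
    errors = set(errors)
    unknown = errors - ERROR_TOKENS
    if unknown:
        raise ValueError(f"unknown error tokens: {sorted(unknown)}")
    if not errors:
        # A successful copy defines no failure classes at all.
        return set()
    # The base class signals any failure; extended classes carry the specifics.
    return {f"{label}_failed"} | {f"{label}_{token}" for token in errors}


if __name__ == "__main__":
    for name in sorted(classes_for("barcopy", ["filenotfound"])):
        print(name)
```

With a scheme like this, one `label=` per copy would yield both a catch-all class and the specific ones, rather than a separate `filenotfound=`, `unreadable=`, and so on for each command.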