Re: convergence and undoing changes
Tue, 22 Nov 2005 07:19:55 -0500
This is an excellent practical description of the problem. I've been
using cfengine for a few months now and am completely convinced that
convergence is the way system administration should be done. It just
'feels' right :). I'm fascinated by the whole computer immunology
concept: systems with enough intelligence to fix themselves rather
than waking me at 5am. I really think this is the way of
I had decided to commit to cfengine for a network I was setting up, to
see how things worked out. Normally I would describe my procedure for
configuring the systems (e.g. on a wiki or some documentation site) so
that I could set up new systems the same way and new sysadmins could
understand it. One of the cool side effects of using cfengine was that
my configuration became self-documenting: anyone who wanted to
understand how a system was set up only needed to look at the
cfengine config. Not a replacement for documentation, but still very
Also, when I had to set up a new system with a similar configuration, all
I had to do was install cfengine and let the system converge. It's really
great to see this actually happen and work properly, in practice rather
than just in theory. I was worried it wouldn't work because of my
inexperience with cfengine, but it worked surprisingly well.
So from this discussion, the best method seems to me to be designing
undos so that they work on every system, whether or not it ever received
the change. For example, using copy: instead of editfiles: for
something like the sshd_config below, or editfiles: with zapping the
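An undo designed this way converges every host to the same file no matter where it started. Below is a minimal Python sketch of the "edit with zapping" idea applied to sshd_config text, assuming the file is handled as an in-memory string; the function name converge_permit_root_login is hypothetical and is not part of cfengine. It deletes every PermitRootLogin line, active or commented out, and appends the desired one.

```python
import re

# Hypothetical helper, not a cfengine API. Matches any PermitRootLogin line,
# whether active or commented out, with optional leading whitespace.
_PERMIT_RE = re.compile(r"^\s*#?\s*PermitRootLogin\b")


def converge_permit_root_login(config_text: str, value: str) -> str:
    """Return config_text with exactly one active 'PermitRootLogin <value>' line.

    Every existing PermitRootLogin line is dropped ("zapped") and the desired
    line is appended, so the result is the same whether the host had F, F',
    a local edit, or no setting at all. Assumes the file has no trailing
    Match block: sshd would scope an appended line to that block.
    """
    kept = [line for line in config_text.splitlines()
            if not _PERMIT_RE.match(line)]
    kept.append(f"PermitRootLogin {value}")
    return "\n".join(kept) + "\n"
```

Because the output does not depend on the starting state, running it a second time changes nothing. A copy: of a master sshd_config has the same property at the whole-file level.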
On 11/21/05, Moore, Joe <address@hidden> wrote:
> Alva Couch wrote:
> > My experience is that users are all too cavalier about the way they
> > modify cfagent.conf. I think a specific discipline -- unknown to many
> > users -- is the key. We can either document that discipline or
> > encapsulate it in some kind of transaction engine. I propose
> > to do both.
> IMO, more documentation (preferably in the cfengine space) about what
> convergence is (and isn't) and how to think convergently is needed.
> > My examples using editfiles are a matter of public record. But the
> > problem can even happen when one utilizes purely convergent actions.
> > Here's a "typical" example of user thinking.
> > - user asserts contents of a file F. Say it is a service startup
> > in /etc/xinetd.d and the intent is to customize some service.
> So at this point, all systems converge to F -- for example
> /etc/ssh/sshd_config: PermitRootLogin without-password
> > - then, some time after F is stable, the user changes the assertion
> > to revert F to its original state.
> At this point, the user changes the convergence goal to F' (which may be
> identical to F.orig, or it might have some other set of properties).
> /etc/ssh/sshd_config: #PermitRootLogin no
> > - unbeknownst to the user, some different set of stations are down
> > while F is reverting to the original state.
> Some systems do not immediately converge. The network is in some
> indeterminate mixture of unmanaged, F, and F'.
> > - then, satisfied that the file is reverted, the user takes the
> > reversion assertion out of the script, considering work to be done.
> At this point, the user mistakenly decides that "unmanaged" is the
> correct goal state (rather than F or F').
> > - time passes and the unreverted machines come back up. There is
> > no reversion to affect them. So they stay with the new version.
> After time passes without management, configuration drift occurs.
> > - At this point, there are two classes of machines: those with
> > the original version of F and those with the new version. If the
> > new version has a security hole, congratulations, you didn't manage
> > to plug it.
> Configuration drift of unmanaged states results in indeterminate mixtures
> of configuration state.
> Some systems will allow root logins (without-password) and some won't.
> If a local administrator edits sshd_config to set "PermitRootLogin yes",
> that's a third class of machine.
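The sequence above can be replayed in a toy model. This is a minimal Python sketch, assuming each host's sshd_config is reduced to a single label and that an agent run simply sets every reachable host to the current goal; run_agents, botched_reversion and the host names h0 to h5 are hypothetical illustrations, not cfengine behavior or API.

```python
from typing import Dict, Optional, Set


def run_agents(states: Dict[str, str], goal: Optional[str], down: Set[str]) -> None:
    """One pass of a hypothetical convergent agent over the network.

    Hosts that are up are moved to `goal`. Hosts that are down are skipped,
    and when goal is None (the assertion was removed) nothing is touched.
    """
    if goal is None:
        return
    for host in states:
        if host not in down:
            states[host] = goal


def botched_reversion() -> Dict[str, str]:
    states = {f"h{i}": "orig" for i in range(6)}
    run_agents(states, "F", down=set())          # assert F on every host
    run_agents(states, "F'", down={"h3", "h4"})  # revert while h3, h4 are down
    run_agents(states, None, down=set())         # reversion removed; h3, h4 return
    states["h5"] = "local edit"                  # unmanaged drift by a local admin
    return states
```

The returned states hold the three classes of machine described above: h0 to h2 reverted to F', h3 and h4 still on F, and h5 carrying a local edit.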
> > The key here is that for reversions to be effective, they must stay
> > in the configuration until it is absolutely sure that all stations
> > have applied them. In a very large network, one is likely never
> > sure, so one can *never* remove the reversions from the config file.
> The key here is that if you have a goal state, you must define that
> > This is the principle of observability:
> > Once one manages a thing, one must continue to manage that
> > thing in perpetuity.
> > In my experience this kind of "reversion botch" is very common.
> Actually, it's "Once one realizes that a thing needs management, one
> must continue to manage that thing until configuration drift is
> This kind of "reversion botch" is based on the mistaken assumption that
> the original configuration was magically convergent.
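The same kind of toy model shows the difference between taking the reversion out of the config and continuing to manage the file. This is a sketch under stated assumptions (hypothetical names, one label per host, agents that set every reachable host to the current goal), not a claim about how cfagent schedules its runs.

```python
from typing import Dict, Optional, Set


def run_agents(states: Dict[str, str], goal: Optional[str], down: Set[str]) -> None:
    """Toy agent pass: reachable hosts are set to `goal`; None means unmanaged."""
    if goal is None:
        return
    for host in states:
        if host not in down:
            states[host] = goal


def final_states(keep_reversion: bool) -> Dict[str, str]:
    states = {f"h{i}": "orig" for i in range(6)}
    run_agents(states, "F", down=set())          # assert F on every host
    run_agents(states, "F'", down={"h3", "h4"})  # revert while h3, h4 are down
    states["h5"] = "local edit"                  # drift introduced by a local admin
    # Later, every host is back up. Either the reversion is still asserted
    # (the file stays managed) or it has been taken out of the config.
    run_agents(states, "F'" if keep_reversion else None, down=set())
    return states
```

With the reversion removed, final_states(False) still contains F, F' and the local edit; with it kept, final_states(True) leaves every host at F', including the ones that were down and the one that drifted.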