Re: Large install and roll-out

From: John Valdes
Subject: Re: Large install and roll-out
Date: Mon, 11 Mar 2002 20:17:17 -0600
User-agent: Mutt/1.2.5i

On Mon, Mar 11, 2002 at 09:06:37AM +0100, Rune Mossige wrote:
> In about 4-6 weeks, we will install a 128-node dual-CPU Linux cluster,
> running RedHat Linux 7.2, and I'd like to install cfengine on all those
> hosts at the same time....

As the other replies suggest, there's more than one way to do this... :)

We do software installation on all our hosts (be they standalone or
members of a cluster) via cfengine.  All the software to be installed
is packaged into a common format for all our OSes (we're currently
using RPM for the package format) and placed on an internal anonymous
ftp server.  All the packages are then tracked in a (SQL) database
which lists what software (pkg name & version) should be installed on
what hosts.  To do the actual installation, we then use a custom
cfengine module which, when run via cfengine, will query the database to
get a list of all the software that should be installed, compare it to
what is currently installed, and then download and install or update
(or remove) any packages as needed.  The module will then define
classes for any software which is installed, and these in turn can
trigger actions (eg, for editfiles:, processes:, shellcommands:, etc)
in the cfengine script to finalize or otherwise configure the software
installation.  For a little more detailed description of the process,
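
As a rough illustration only (this is not the actual module), the
compare step described above could be sketched in Python as follows.
The function names, the "installed_" class prefix, and the rule that
anything absent from the database list gets removed are all
assumptions made for the example; the "+classname" output lines
follow the usual cfengine module convention for defining classes.

```python
import re


def plan_package_actions(desired, installed):
    """Compare two {package name: version} mappings.

    desired   -- what the database says should be on this host
    installed -- what is currently on the host
    """
    install = sorted(n for n in desired if n not in installed)
    update = sorted(
        n for n in desired if n in installed and installed[n] != desired[n]
    )
    # Assumption: packages missing from the desired list are removed.
    remove = sorted(n for n in installed if n not in desired)
    return {"install": install, "update": update, "remove": remove}


def class_lines(plan):
    """Return "+class" lines for every package installed or updated."""
    names = plan["install"] + plan["update"]
    # cfengine class names are limited to letters, digits and underscores.
    return ["+installed_" + re.sub(r"\W", "_", n) for n in names]
```

A module built along these lines would print the class lines on
stdout so that later actions in the cfengine script can key off them.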

Given that background, then, I'll say that we install cfengine on all
our hosts using cfengine.  Of course, that's a bit of a chicken-or-egg
problem, as one needs to have a copy of cfengine installed in order to
install cfengine... To get the process going, we install the initial
copy (rpm) of cfengine via a postinstall script to the RedHat
installation process.  We install all our RedHat systems via network
kickstart, so in the %post section of the kickstart profile, we create
a boot time script which will be run on the initial boot after the
kickstart completes.  This boot script then simply uses "rpm -i" to
install the cfengine rpm from our anonymous ftp server, using an ftp
URL as the name of the cfengine rpm to be installed (eg, "rpm -i
ftp://server/cfengine.rpm").  It then runs the freshly installed
cfengine to install the rest of the software and configure the system
as instructed by the actions in the cfengine script(s).  Finally the
boot script deletes itself and reboots the system (cfengine also
installs patches (ie, updated RedHat rpms in the case of RedHat), so
given that a large number of these get installed at OS install time,
we do a final reboot for good measure).
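
To make the sequence of steps concrete, here is a small Python sketch
that renders the text of such a one-shot boot script.  It is an
illustration under assumptions, not the script described above: the
function name, the script path passed in, running "cfengine" with no
arguments, and the use of "/sbin/shutdown -r now" for the final reboot
are all placeholders.

```python
import shlex


def render_first_boot_script(rpm_url, script_path):
    """Return the text of a one-shot first-boot script.

    rpm_url     -- ftp URL of the cfengine rpm
    script_path -- where the script itself lives, so it can delete itself
    """
    lines = [
        "#!/bin/sh",
        "# Install the initial cfengine package straight from the ftp server.",
        "rpm -i " + shlex.quote(rpm_url),
        "# Let cfengine install the remaining software and configure the host.",
        "cfengine",
        "# Remove this one-shot script, then reboot.",
        "rm -f " + shlex.quote(script_path),
        "/sbin/shutdown -r now",
    ]
    return "\n".join(lines) + "\n"
```

The %post section of a kickstart profile would write this text to a
location that is run at boot; the exact location depends on the init
setup and is left open here.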

> One think that I noticed, is that the cfengine install docs advises to
> first install cfenvd, and let it run for a week, before cfkey is run to
> generate the keys....this is not going to work for us, as we need to
> start using cfengine asap.

As others noted, Linux has a /dev/random, so this is enough for cfkey
to get the randomness it needs.  We haven't updated our cfengine rpm
for 2.0.0 yet (we currently still run 1.6.3 in production), but when
we do, we'll probably have the rpm run cfkey on initial install in
order to generate the keypair for the client.  We would then either
run cfagent with "trustkey=true" to grab the server's public key, or
else simply distribute the server's public key in our cfengine rpm.

> Also, after this initial rollout, we will need to update the kernels on
> all hosts very early after the initial bootstrap. How do others update a
> large number of RedHat 7.2 machines with a new kernel?  Is it OK to
> compile on one host, and then just tar up the /lib/modules and /boot
> directories, and re-run lilo on each host after untarring?

In theory, yes, that should be fine, esp. since all your nodes will be
identical (I assume).  You could distribute the new kernel via
cfengine's copy: action, and then have that define a class which
triggers an editfiles: action to update /etc/lilo.conf (or
/boot/grub/grub.conf if you're using grub), which in turn defines
another class to trigger a shellcommands: which reruns lilo, which
finally, if desired, defines yet another class to trigger another
shellcommands: to schedule a reboot (eg, via "shutdown -r" with some
time specified or maybe using "reboot" with the "at" command).
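
The chain of classes can be modelled with a few lines of Python.  This
is only a simulation of the idea, not cfengine syntax; the class names
(new_kernel, lilo_conf_edited, lilo_rerun) and the helper run_chain
are made up for the example.

```python
# Each step: (action, class that must be defined, class it defines when it acts)
KERNEL_CHAIN = [
    ("copy: new kernel", "any", "new_kernel"),
    ("editfiles: /etc/lilo.conf", "new_kernel", "lilo_conf_edited"),
    ("shellcommands: lilo", "lilo_conf_edited", "lilo_rerun"),
    ("shellcommands: shutdown -r", "lilo_rerun", None),
]


def run_chain(chain, first_step_changed):
    """Return the actions that run, in order.

    first_step_changed -- True if the copy step actually replaced a file;
    only then does it define its class and trigger the rest of the chain.
    """
    classes = {"any"}
    ran = []
    for index, (action, requires, defines) in enumerate(chain):
        if requires not in classes:
            continue  # trigger class not defined, so the action is skipped
        ran.append(action)
        changed = first_step_changed if index == 0 else True
        if changed and defines is not None:
            classes.add(defines)
    return ran
```

When the kernel on a host is already current, the copy changes
nothing, no class is defined, and none of the later steps fire, which
is what makes the chain safe to run repeatedly.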

In our case, we use a scheme like the software one described above for
installing a new kernel as a "patch".  That is, we have a cfengine
patch module which takes care of installing updated RedHat rpms on our
RedHat hosts.  We would simply have the patch module install either
one of RedHat's kernel update rpms or one of our own custom kernel
rpms.  Through a chain of class definitions, we can then have cfengine
take care of updating lilo.conf or grub.conf (note that with grub in
RH 7.2 at least, RedHat's kernel rpms automatically update grub.conf
as well as regenerate an "initrd" if needed), rerun lilo & schedule a
reboot, if desired.  At least in theory; we're not doing this in
production yet as we need to update our module to take into account
the different kernel "architectures" we're running (eg, some of our
systems are running RedHat's "i686" kernel, others the "athlon"
kernel, yet others the "smp" variety, etc).  We do do kernel patching
and occasional reboots under Solaris however w/o any problems.  I
don't anticipate any with Linux.
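
For illustration, picking the right kernel rpm per host might look
something like the Python sketch below.  It is a guess at the shape of
the problem rather than the patch module's logic: the function name,
the restriction to the two architectures named above, and the rule
"more than one CPU means the smp package" are assumptions, and the
version string is just a placeholder argument.

```python
KNOWN_ARCHES = ("i686", "athlon")


def kernel_package(version, cpu_arch, cpu_count):
    """Return the rpm file name of the kernel variant for one host."""
    if cpu_arch not in KNOWN_ARCHES:
        raise ValueError("unsupported kernel architecture: %s" % cpu_arch)
    # Assumption: multi-CPU hosts get the smp kernel package.
    name = "kernel-smp" if cpu_count > 1 else "kernel"
    return "%s-%s.%s.rpm" % (name, version, cpu_arch)
```

For example, kernel_package("2.4.9-31", "i686", 2) yields
"kernel-smp-2.4.9-31.i686.rpm".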

> Any pointers to additional info, or example cfengine config files and
> helper scripts would be appreciated.

The paper referred to above has a URL with a copy of our cfengine
modules.  Unfortunately, the information and examples there are a bit
out of date and incomplete; we should hopefully be posting more
up-to-date info and examples RSN.  In the meantime, hopefully the
above will give you some ideas on one possible way to manage your
cluster with cfengine.


John Valdes
Department of Astronomy & Astrophysics
University of Chicago
