[monit 3.2 release plan]
From: Jan-Henrik Haukeland
Subject: [monit 3.2 release plan]
Date: 10 Feb 2003 18:11:00 +0100
User-agent: Gnus/5.0808 (Gnus v5.8.8) XEmacs/21.4 (Civil Service)
To summarize the latest discussion, here is the list of remaining
tasks before we can make a monit 3.2 release.
1. Reload monit on SIGHUP only, as recently discussed on this list
Responsible: Jan-Henrik
2. Fix race conditions issues with monit restart
Responsible: Martin
3. Solve the SSLv2/SSLv3/TLS detection problem as reported by Mark F.
Responsible: Christian
4. In a "restart" situation the resource data "ProcInfo_T" is not
correctly reinitialized (zeroed), e.g. it reports -6324% CPU load
Responsible: Christian
5. Add Oliver Jehle's documentation on how to use monit with Heartbeat.
Responsible: Jan-Henrik
I have enclosed Oliver's doc. below if you want to check it out
first. (It's a patch for monit.pod -- we read patch files like others
read newspapers, do we not?)
--- monit.pod Sat Jan 11 03:14:01 2003
+++ /home/oj/monit.pod Tue Jan 28 08:09:07 2003
@@ -285,21 +285,143 @@ clusters. For instance, using the I<hear
(http://linux-ha.org/) to watch the health of nodes and in the
case of one machine failure start services on a secondary node.
-Appropriate scripts that can call monit to start/stop specific
-services are needed on both nodes - typical usage:
-
- FILE DESCRIPTION
- -----------------------------------
- /etc/inittab starts monit
- /etc/rcS.d/S41heartbeat execute "monit start heartbeat"
- /etc/init.d/monit-node1 execute "monit -g node1 start"
- /etc/init.d/monit-node2 execute "monit -g node2 start"
-
-This way hearbeat can easily control the cluster state and if one
-node fails, hearbeat will start monit-xxxxx on the running node
-and monit is instructed to start the services of the failing node
-and monitor them...
+=head2 Monit with heartbeat
+The first thing you have to do is install and configure
+I<heartbeat> (http://www.linux-ha.org/downloads).
+The Getting Started Guide is very useful for this task
+(http://www.linux-ha.org/download/GettingStarted.html).
+
+B<Starting up a Node>
+
+This is the normal start sequence for a cluster node.
+With this sequence, there should be no error case that is not
+handled by either heartbeat or monit. For example, if monit
+dies, initd restarts it. If heartbeat dies, monit restarts it. If
+the node dies, heartbeat on the other node detects it and restarts
+the services there.
+
+ 1) initd starts monit with group local
+ 2) monit starts heartbeat in local group
+ 3) heartbeat requests monit to start the node group
+ 4) monit starts the node group
+
+B<Monit F</etc/monitrc>>
+
+This sample describes a cluster with two nodes.
+Services running on Node 1 are in group I<node1>, Node 2 services
+are in group I<node2>.
+
+The local group entries use mode I<active>; the node group
+entries use mode I<manual> and are controlled by heartbeat.
+
+ #
+ # local services on every host
+ #
+ #
+ check heartbeat with pidfile /var/run/heartbeat.pid
+ start program = "/etc/init.d/heartbeat start"
+ stop program = "/etc/init.d/heartbeat stop"
+ mode active
+ alert address@hidden
+ group local
+ #
+ #
+ check postfix with pidfile /var/spool/postfix/pid/master.pid
+ start program = "/etc/init.d/postfix start"
+ stop program = "/etc/init.d/postfix stop"
+ mode active
+ alert address@hidden
+ group local
+ #
+ # node1 services
+ #
+ check apache with pidfile /var/apache/logs/httpd.pid
+ start program = "/etc/init.d/apache start"
+ stop program = "/etc/init.d/apache stop"
+ depends named
+ alert address@hidden
+ mode manual
+ group node1
+ #
+ #
+ check named with pidfile /var/tmp/named.pid
+ start program = "/etc/init.d/named start"
+ stop program = "/etc/init.d/named stop"
+ alert address@hidden
+ mode manual
+ group node1
+ #
+ # node2 services
+ #
+ check named-slave with pidfile /var/tmp/named-slave.pid
+ start program = "/etc/init.d/named-slave start"
+ stop program = "/etc/init.d/named-slave stop"
+ mode manual
+ alert address@hidden
+ group node2
+ #
+ #
+ check squid with pidfile /var/squid/logs/squid.pid
+ start program = "/etc/init.d/squid start"
+ stop program = "/etc/init.d/squid stop"
+ depends named-slave
+ alert address@hidden
+ mode manual
+ group node2
+
+B<initd F</etc/inittab>>
+
+Monit is started on both nodes by initd. You have to add an
+entry in F</etc/inittab> that starts monit with the local group,
+of which heartbeat is a member.
+
+ #/etc/inittab
+ mo:2345:respawn:/usr/local/bin/monit -i -d 10 -c /etc/monitrc -g local
+
+B<heartbeat F</etc/ha.d/haresources>>
+
+When heartbeat starts, it looks up the node entry and
+starts the script F</etc/init.d/monit-node1> or
+F</etc/init.d/monit-node2>. The script calls monit
+to start the node-specific group.
+
+ # /etc/ha.d/haresources
+ node1 IPaddr::172.16.100.1 monit-node1
+ node2 IPaddr::172.16.100.2 monit-node2
+
+
+B<F</etc/init.d/monit-node1>>
+
+ #!/bin/bash
+ #
+ # sample script for starting/stopping all services for node1
+ #
+ prog="/usr/local/bin/monit -g node1"
+ start()
+ {
+ echo -n $"Starting $prog:"
+ $prog start
+ echo
+ }
+
+ stop()
+ {
+ echo -n $"Stopping $prog:"
+ $prog stop
+ echo
+ }
+
+ case "$1" in
+ start)
+ start;;
+ stop)
+ stop;;
+ *)
+ echo $"Usage: $0 {start|stop}"
+ RETVAL=1
+ esac
+ exit $RETVAL
=head1 ALERT MESSAGES
--
Jan-Henrik Haukeland