From: EzCom Keith
Subject: Re: Cannot get Monit to run more than 60 seconds
Date: Thu, 16 Dec 2010 09:44:19 -0800
Hi,

I had a similar problem (maybe not exactly the same) on an Atmel AT91SAM9260-EK with a 32-bit ARM running embedded Linux. In my case, a floating point exception was killing monit silently. As far as I could tell, my monit ran for about a minute, until the first CPU statistic calculation (process/sysdep_LINUX.c).

After several attempts I got it running properly. I just changed every unsigned long long type in this file to unsigned long, and every '%llu' to '%lu' (the latter to make gcc happy).

Take a look at your syslog files. In my case the entry looked like this (/var/log/messages):

Dec 16 12:47:49 192 user.debug kernel: NWFPE: monit[943] takes exception 00000001 at c080b8d0 from 401eb3d8

I first fixed (hacked) this about 2 years ago. Since then I have upgraded to the latest monit version several times, repeating the fix each time. Currently my board is equipped with:

monit -V
This is Monit version 5.2.2
Copyright (C) 2000-2010 by Tildeslash Ltd. All Rights Reserved.

Everything works perfectly (many thanks to the monit team).

On Thu, Dec 16, 2010 at 9:57 AM, EzCom Keith <address@hidden> wrote:

Hi everyone..
I've about reached the end of my road here trying to get Monit to run, and at this point
I'm simply crying 'uncle' and posting for help. I have Googled, I have read documentation,
I have studied examples, all to no avail so far. The app runs for the 60-second 'wait'
period specified in my monitrc, then goes away. No matter what I've tried, the result is
exactly the same.

Let me begin by saying I followed this guide:
http://www.howtoforge.com/server-monitoring-with-munin-and-monit-on-centos-5.2-p2

I went through the setup for a 64-bit box with CentOS 5 Final. Every step matched what was
documented, to a 'T'. After I created the SSL certs, the guide said "finally, we can start
Monit: /etc/init.d/monit start", which I did. It complained that my mysqld wasn't in the
right path, and neither was my postfix. I commented those entries out, to come back to them
later, and restarted the daemon. It seemed to take: ps aux | grep monit showed it running,
and /etc/init.d/monit status confirmed it. I opened a browser and pointed it at my box on
the proper port, but got nothing. I went back to the running processes and found Monit dead.

Going through monit.log, I saw an id error, because the folder expected to hold the id file
didn't exist. I created it and re-ran the daemon. This time it reported that it had written
a unique id file to the directory I created, and it was once again running. 60 seconds
later, it was dead again. monit.log revealed nothing out of the ordinary; here is what one
start -> dead cycle looks like in the log:

[EST Dec 15 14:11:26] info : monit: generated unique Monit id 99655fc9cc168e531b8d9734cab746b9 and stored to '/var/monit/id'
[EST Dec 15 14:11:26] info : Starting monit daemon with http interface at [*:2812]
[EST Dec 15 14:11:26] info : Monit start delay set -- pause for 60s
[EST Dec 15 14:12:26] info : Starting monit HTTP server at [*:2812]

I then ran the daemon in the foreground with verbose output, and frankly, if the problem
is revealed in there, I don't see it. Here it is:

$ /usr/bin/monit -d 10 -c /etc/monit.d/monitrc -v -l /var/log/monit.log
monit: Debug: Adding net allow '{my_home_ip_here}'.
monit: Debug: Adding credentials for user 'admin'.
Runtime constants:
Control file = /etc/monit.d/monitrc
Log file = /var/log/monit.log
Pid file = /var/run/monit.pid
Debug = True
Log = True
Use syslog = False
Is Daemon = True
Use process engine = True
Poll time = 10 seconds with start delay 0 seconds
Expect buffer = 256 bytes
Mail from = (not defined)
Mail subject = (not defined)
Mail message = (not defined)
Start monit httpd = True
httpd bind address = Any/All
httpd portnumber = 2812
httpd signature = True
Use ssl encryption = True
PEM key/cert file = /var/certs/monit.pem
Client cert file = None
Allow self certs = False
httpd auth. style = Basic Authentication and Host/Net allow list

The service list contains the following entries:

Process Name = proftpd
Pid file = /var/run/proftpd.pid
Monitoring mode = active
Start program = '/etc/init.d/proftpd start' timeout 30 second(s)
Stop program = '/etc/init.d/proftpd stop' timeout 30 second(s)
Existence = if does not exist 1 times within 1 cycle(s) then restart else if succeeded 1 times within 1 cycle(s) then alert
Pid = if changed 1 times within 1 cycle(s) then alert
Ppid = if changed 1 times within 1 cycle(s) then alert
Port = if failed localhost:21 [FTP via TCP] with timeout 5 seconds 1 times within 1 cycle(s) then restart else if succeeded 1 times within 1 cycle(s) then alert
Timeout = If restarted 5 times within 5 cycle(s) then unmonitor

Process Name = sshd
Pid file = /var/run/sshd.pid
Monitoring mode = active
Start program = '/etc/init.d/sshd start' timeout 30 second(s)
Stop program = '/etc/init.d/sshd stop' timeout 30 second(s)
Existence = if does not exist 1 times within 1 cycle(s) then restart else if succeeded 1 times within 1 cycle(s) then alert
Pid = if changed 1 times within 1 cycle(s) then alert
Ppid = if changed 1 times within 1 cycle(s) then alert
Port = if failed localhost:22 [SSH via TCP] with timeout 5 seconds 1 times within 1 cycle(s) then restart else if succeeded 1 times within 1 cycle(s) then alert
Timeout = If restarted 5 times within 5 cycle(s) then unmonitor

Process Name = apache
Group = www
Pid file = /var/run/httpd.pid
Monitoring mode = active
Start program = '/etc/init.d/httpd start' timeout 30 second(s)
Stop program = '/etc/init.d/httpd stop' timeout 30 second(s)
Existence = if does not exist 1 times within 1 cycle(s) then restart else if succeeded 1 times within 1 cycle(s) then alert
Pid = if changed 1 times within 1 cycle(s) then alert
Ppid = if changed 1 times within 1 cycle(s) then alert
Port = if failed www.ezcommunities.com:80/monit/token [HTTP via TCP] with timeout 5 seconds 1 times within 1 cycle(s) then restart else if succeeded 1 times within 1 cycle(s) then alert
Load avg. (5min) = if greater than 10.0 8 times within 8 cycle(s) then stop else if succeeded 1 times within 1 cycle(s) then alert
Children = if greater than 250 1 times within 1 cycle(s) then restart else if succeeded 1 times within 1 cycle(s) then alert
CPU usage limit = if greater than 80.0% 5 times within 5 cycle(s) then restart else if succeeded 1 times within 1 cycle(s) then alert
CPU usage limit = if greater than 60.0% 2 times within 2 cycle(s) then alert else if succeeded 1 times within 1 cycle(s) then alert
Timeout = If restarted 3 times within 5 cycle(s) then unmonitor

System Name = system_{myexample.site.com}
Monitoring mode = active
-------------------------------------------------------------------------------
Starting monit daemon with http interface at [*:2812]

monit.log says:
[EST Dec 16 02:27:04] info : Starting monit daemon with http interface at [*:2812]
[EST Dec 16 02:27:04] info : Starting monit HTTP server at [*:2812]
[EST Dec 16 02:27:04] info : monit HTTP server started
[EST Dec 16 02:27:04] info : 'system_{myexample.site.com}' Monit started

/etc/init.d/monit status says:
monit dead but pid file exists

For completeness, here is monitrc:
set daemon 60 with start delay 60
set logfile /var/log/monit.log
# set mailserver localhost
# set mail-format { from: address@hidden} }
# set alert address@hidden
set httpd port 2812 and
SSL ENABLE
PEMFILE /var/certs/monit.pem
allow {my_home_ip_here}
allow admin:test

check process proftpd with pidfile /var/run/proftpd.pid
start program = "/etc/init.d/proftpd start"
stop program = "/etc/init.d/proftpd stop"
if failed port 21 protocol ftp then restart
if 5 restarts within 5 cycles then timeout

check process sshd with pidfile /var/run/sshd.pid
start program "/etc/init.d/sshd start"
stop program "/etc/init.d/sshd stop"
if failed port 22 protocol ssh then restart
if 5 restarts within 5 cycles then timeout

# check process mysql with pidfile /var/run/mysqld/mysqld.pid
# group database
# start program = "/usr/sbin/mysqld start"
# stop program = "/usr/sbin/mysqld stop"
# if failed host 127.0.0.1 port 3306 then restart
# if 5 restarts within 5 cycles then timeout

check process apache with pidfile /var/run/httpd.pid
group www
start program = "/etc/init.d/httpd start"
stop program = "/etc/init.d/httpd stop"
if failed host {myexample.site.com} port 80 protocol http
and request "/monit/token" then restart
if cpu is greater than 60% for 2 cycles then alert
if cpu > 80% for 5 cycles then restart
# if totalmem > 500 MB for 5 cycles then restart
if children > 250 then restart
if loadavg(5min) greater than 10 for 8 cycles then stop
if 3 restarts within 5 cycles then timeout

# check process postfix with pidfile /var/spool/postfix/pid/master.pid
# group mail
# start program = "/etc/init.d/postfix start"
# stop program = "/etc/init.d/postfix stop"
# if failed port 25 protocol smtp then restart
# if 5 restarts within 5 cycles then timeout

As stated, I'm at a dead end. I have no idea what to try next, as I've tried everything
I could find in a variety of other troubleshooting posts, but I always end up with a dead
service after 60 seconds.

Help appreciated. = )
- Keith
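The reply at the top of this thread traces the crash to monit's first CPU statistic calculation. As a rough illustration of what such a calculation involves, here is a minimal C sketch, assuming Linux /proc/stat jiffy counters. It is not monit's sysdep_LINUX.c code, and sum_cpu_line() and cpu_usage_percent() are hypothetical names. The counters are kept as unsigned long long, which printf needs as '%llu'; changing the type to unsigned long means changing the format to '%lu', which is why the reply changed both together. The division is guarded so that a zero interval returns -1.0 instead of dividing by zero.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: sum every jiffy counter on a "cpu" or "cpuN" line
 * of /proc/stat, e.g. "cpu  4705 356 584 3699 23 23 0 0".
 * Returns 0 for any line that does not start with "cpu". */
unsigned long long sum_cpu_line(const char *line)
{
    unsigned long long total = 0;
    const char *p;
    char *end;

    if (strncmp(line, "cpu", 3) != 0)
        return 0;
    p = line + 3;
    while (*p >= '0' && *p <= '9')   /* skip the CPU index in "cpuN" */
        p++;
    for (;;) {
        unsigned long long v = strtoull(p, &end, 10);
        if (end == p)                /* no more numbers on the line */
            break;
        total += v;
        p = end;
    }
    return total;
}

/* Hypothetical helper: percentage of total CPU time used by one process
 * between two samples. proc_* are the process's utime+stime jiffies
 * (assumed to come from /proc/<pid>/stat); total_* are sum_cpu_line()
 * results for the aggregate "cpu" line. Returns -1.0 when the total did
 * not advance or a counter went backwards, instead of dividing by zero. */
double cpu_usage_percent(unsigned long long proc_prev, unsigned long long proc_now,
                         unsigned long long total_prev, unsigned long long total_now)
{
    unsigned long long dproc, dtotal;

    if (total_now <= total_prev || proc_now < proc_prev)
        return -1.0;
    dproc = proc_now - proc_prev;
    dtotal = total_now - total_prev;
    /* The unsigned long long -> double conversions and the division are
     * floating-point work; on an ARM board without an FPU they may be
     * handled by emulation (the reply's log names the kernel's NWFPE). */
    return 100.0 * (double)dproc / (double)dtotal;
}
```

Both counters would be sampled once per poll cycle, so a percentage can only be computed from the second sample onward; in the reply's case, that is roughly when monit died.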
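The init script's verdict, "monit dead but pid file exists", means that the PID recorded in monit's pid file (/var/run/monit.pid in the debug output) no longer belongs to a running process. Below is a minimal C sketch of that kind of check, assuming the pid file holds a single decimal PID. pidfile_status() and its return convention are hypothetical and are not taken from the CentOS init scripts or from monit. The sketch relies on kill() with signal 0, which performs the existence and permission checks without delivering any signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical return convention for this sketch:
 *    1  the pid file names a running process
 *    0  the pid file exists but no such process ("dead but pid file exists")
 *   -1  the pid file is missing, unreadable, or holds no valid PID */
int pidfile_status(const char *path)
{
    FILE *f = fopen(path, "r");
    long pid;
    int ok;

    if (f == NULL)
        return -1;
    ok = fscanf(f, "%ld", &pid) == 1 && pid > 0;
    fclose(f);
    if (!ok)
        return -1;

    /* Signal 0: error checking only, nothing is delivered to the process. */
    if (kill((pid_t)pid, 0) == 0)
        return 1;
    /* EPERM means the process exists but we are not allowed to signal it;
     * any other failure (normally ESRCH) means no such process. */
    return errno == EPERM ? 1 : 0;
}
```

A check like this only confirms that the process is gone. It says nothing about why it exited, which is what the syslog inspection suggested in the reply is for.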
--
To unsubscribe:
http://lists.nongnu.org/mailman/listinfo/monit-general