[monit-dev] "Execution failed"
From: Brian Candler
Subject: [monit-dev] "Execution failed"
Date: Tue, 27 May 2008 09:28:07 +0100
User-agent: Mutt/1.5.11
I'm running monit 5.0_beta1, and just thought I'd report an anomaly I've now
seen several times.
Sometimes, when a process starts, its status is reported as "Execution failed"
(in red in the web interface) and stays that way. However, the process is
running just fine and its pid file is there. Its monitoring status is also
"monitored".
Here's an example from one system right now:
# monit status
...
Process 'radsub_subscriber'
status Execution failed
monitoring status monitored
pid 3608
parent pid 1
uptime 1d 12h 48m
childrens 0
memory kilobytes 18376
memory kilobytes total 18376
memory percent 1.7%
memory percent total 1.7%
cpu percent 2.6%
cpu percent total 2.6%
data collected Tue May 27 09:17:36 2008
...
# ps auxwww | grep radsub | grep -v grep
root 3608 2.7 1.7 22508 18376 ? S May25 59:43 /usr/bin/ruby
bin/radsub.rb /u/apps/radsub/shared/log/radacct
# cat /etc/monit.d/radsub.monitrc
check process radsub_subscriber
with pidfile /u/apps/radsub/shared/pids/subscriber.pid
start program = "/bin/sh -c 'echo $$ >
/u/apps/radsub/shared/pids/subscriber.pid;
cd /u/apps/radsub/current;
exec /usr/bin/ruby bin/radsub.rb
/u/apps/radsub/shared/log/radacct 2>>/u/apps/radsub/shared/log/radsub.log'"
stop program = "/bin/sh -c 'kill `cat
/u/apps/radsub/shared/pids/subscriber.pid`'"
if totalmem is greater than 30.0 MB for 4 cycles then restart
if totalcpu is greater than 30% for 4 cycles then restart
if 10 restarts within 10 cycles then timeout
group radsub
# cat /u/apps/radsub/shared/pids/subscriber.pid
3608
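The manual check above (read the pid file, confirm that pid is alive) can be scripted for repeated use. This is only a minimal sketch, assuming a Unix-like system; the helper names `pid_from_file` and `is_alive` are made up for illustration and are not part of monit.

```python
import os

def pid_from_file(path):
    """Return the integer PID stored in a pidfile, or None if it is
    missing, unreadable, or does not contain a number."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None

def is_alive(pid):
    """Report whether a process with this PID exists.

    Signal 0 delivers nothing; the kernel only performs the existence
    and permission checks (Unix-only behaviour)."""
    if pid is None or pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True

if __name__ == "__main__":
    # Path taken from the radsub monitrc shown above.
    path = "/u/apps/radsub/shared/pids/subscriber.pid"
    pid = pid_from_file(path)
    print(path, pid, "alive" if is_alive(pid) else "not running")
```

If this reports the process as alive while monit still shows "Execution failed", the pid file and the process agree with each other and the stale state is on monit's side.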
On a different system, it's apache which is in this state:
# monit status
...
Process 'apache'
status Execution failed
monitoring status monitored
pid 2784
parent pid 1
uptime 2d 5h 19m
childrens 8
memory kilobytes 4184
memory kilobytes total 35252
memory percent 0.4%
memory percent total 3.4%
cpu percent 0.0%
cpu percent total 0.0%
port response time 0.052s to localhost:443 [HTTP via TCPSSL]
port response time 0.002s to localhost:80 [HTTP via TCP]
data collected Tue May 27 09:22:09 2008
File 'httpd.conf'
status accessible
monitoring status monitored
permission 644
uid 0
gid 0
timestamp Mon Apr 28 14:53:22 2008
size 34742 B
checksum 71ef1c79f56dfcf96a02497b7bc3590c(MD5)
data collected Tue May 27 09:22:09 2008
Directory 'httpd.conf.d'
status accessible
monitoring status monitored
permission 755
uid 0
gid 0
timestamp Fri May 9 15:14:04 2008
data collected Tue May 27 09:22:09 2008
...
# ps auxwww | grep 2784 | grep -v grep
root 2784 0.0 0.4 9452 4184 ? Ss May14 0:11 /usr/sbin/httpd
# cat /etc/monit.d/apache.monitrc
check process apache
with pidfile "/var/run/httpd.pid"
start program = "/etc/init.d/httpd start"
stop program = "/etc/init.d/httpd stop"
if 2 restarts within 3 cycles then timeout
if totalmem > 150 Mb then alert
if children > 255 for 5 cycles then stop
if totalcpu usage > 95% for 3 cycles then restart
if failed port 80 protocol http then restart
if failed port 443 type TCPSSL proto http then restart
group server
depends on httpd.conf, httpd.conf.d
check file httpd.conf
with path /etc/httpd/conf/httpd.conf
# Reload apache if the httpd.conf file was changed
if changed checksum
then exec "/etc/init.d/httpd graceful"
check directory httpd.conf.d
with path /etc/httpd/conf.d
if changed timestamp
then exec "/etc/init.d/httpd graceful"
# cat /var/run/httpd.pid
2784
However, the first system is also running apache with an identical monit
configuration, and there apache's status is "running", as I'd expect.
So the problem appears to be intermittent: a service only occasionally gets
stuck in this state.
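Because the state shows up only occasionally, it may help to scan `monit status` output automatically and log when it happens. The following is a rough sketch that assumes the output layout shown above (a `Process 'name'` header followed by `status` and `pid` lines); the function name `stuck_processes` is hypothetical, and other monit versions may format the output differently.

```python
import re

# Matches entry headers such as: Process 'apache' or File 'httpd.conf'
HEADER = re.compile(r"^(\w+) '(.+)'$")

def stuck_processes(status_text):
    """Return (name, pid) for every Process entry that reports
    status 'Execution failed' while also listing a pid."""
    entries = []
    current = None
    for raw in status_text.splitlines():
        line = raw.strip()
        m = HEADER.match(line)
        if m:
            # Only Process entries are tracked; File, Directory, etc. are skipped.
            current = {"name": m.group(2)} if m.group(1) == "Process" else None
            if current is not None:
                entries.append(current)
            continue
        if current is None:
            continue
        if line.startswith("status "):
            current["status"] = line[len("status "):].strip()
        elif line.startswith("pid "):
            fields = line.split()
            if len(fields) == 2 and fields[1].isdigit():
                current["pid"] = int(fields[1])
    return [(e["name"], e["pid"]) for e in entries
            if e.get("status") == "Execution failed" and "pid" in e]
```

Fed the first listing above, this would return the `radsub_subscriber` entry with pid 3608. Run from cron against the output of `monit status`, it could record the time at which a service first enters the stuck state.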
Has this issue been observed before? If not, is there anything I can do to
help track it down?
Thanks,
Brian.