[monit-dev] Monit incorrect restart behaviour
From: Valentin Avram
Subject: [monit-dev] Monit incorrect restart behaviour
Date: Tue, 28 Dec 2010 14:59:54 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.13) Gecko/20101213 Lightning/1.0b3pre Thunderbird/3.1.7
Hello.
I have encountered incorrect behaviour in monit when using the
restart command for a monitored service.
The problem is that after executing the stop command for the service,
monit does not wait for the stop command to complete before running the
start command. Instead, it monitors the pidfile and, as soon as the
pidfile disappears, calls the start command. This behaviour is wrong
because the stop command might do something else besides stopping the
service.
Timeline of events:
- User issues the restart command for service X
- monit receives the restart command
- monit calls the stop script
- (stop script running, pidfile exists) monit verifies the pidfile
- (stop script running, pidfile is gone) monit sees the pidfile is gone
- (stop script running) monit calls the start script
- (stop script running) the start script detects that the stop script
is still running and refuses to run
- (stop script running) monit keeps waiting for the pidfile (according
to "with timeout" setting)
- stop script finishes
- monit keeps waiting for the pidfile
- timeout is reached, monit gives up on start script
- time passes according to daemon check interval
- monit detects it's time to check for the service
- the service is down, monit calls the start script
- start script is running, pidfile is created
- monit is happy
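The race in the timeline above can be reproduced with a small
simulation. This is only an illustrative sketch, not monit's code: the
function name, the tick-based clock and the parameters are all
assumptions made for the example. It compares "start as soon as the
pidfile is gone" with "start only after the stop script has exited".

```python
def simulate_restart(pidfile_removed_at, stop_exits_at, wait_for_stop_exit):
    """Simulate one restart on a discrete clock (hypothetical model).

    Returns (tick at which the start script is called,
             whether the start script agrees to run).
    """
    tick = 0
    while True:
        pidfile_gone = tick >= pidfile_removed_at
        stop_finished = tick >= stop_exits_at
        # Behaviour described in this report: react to the pidfile only.
        # Proposed behaviour: react to the stop script's exit.
        ready = stop_finished if wait_for_stop_exit else pidfile_gone
        if ready:
            # The start script refuses to run while the stop script
            # is still active, so it succeeds only if stop has finished.
            return tick, stop_finished
        tick += 1


if __name__ == "__main__":
    # Pidfile disappears at tick 1, stop script exits at tick 3.
    print(simulate_restart(1, 3, wait_for_stop_exit=False))
    print(simulate_restart(1, 3, wait_for_stop_exit=True))
```

With the pidfile removed before the stop script exits, the first call
invokes start too early and start refuses to run; the second call
invokes start later, and it succeeds.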
The server is running Gentoo. I have tried monit 5.1.1 (marked as
stable) as well as monit 5.2.2 (marked as testing).
The correct behaviour, in my opinion, is that monit should:
- either wait for the stop command to finish and check that it has
returned exit code 0 (all ok), and only then run the start script -
REASON: maybe the stop command does something else besides stopping the
process, and it's unsafe to presume the process should be started
regardless of the success of the stop command.
- or monit should accept a keyword in the monitrc config file that
forces monit to wait for the stop command and check its return code (if
the keyword is not specified, the current default behaviour applies).
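The first option can be sketched as follows. This is a minimal
illustration of the proposed semantics, not monit's implementation: the
function name restart and its parameters are hypothetical, and the
default timeout of 60 seconds simply mirrors the "with timeout 60
seconds" setting in the configuration below.

```python
import subprocess
import sys


def restart(stop_cmd, start_cmd, timeout=60):
    """Run stop_cmd to completion; run start_cmd only if stop exited 0.

    Hypothetical helper illustrating the proposed behaviour.
    subprocess.run raises subprocess.TimeoutExpired if a command
    takes longer than `timeout` seconds.
    """
    stop = subprocess.run(stop_cmd, timeout=timeout)
    if stop.returncode != 0:
        # Stop failed: do not presume it is safe to start the service.
        return False
    start = subprocess.run(start_cmd, timeout=timeout)
    return start.returncode == 0


if __name__ == "__main__":
    # Stand-ins for the real init scripts, e.g.
    # ["/etc/init.d/X", "stop"] and ["/etc/init.d/X", "start"].
    succeed = [sys.executable, "-c", "pass"]
    fail = [sys.executable, "-c", "raise SystemExit(1)"]
    print(restart(succeed, succeed))
    print(restart(fail, succeed))
```

The key point is that the start command is issued only after the stop
command itself has exited, regardless of when the pidfile disappears.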
Example of configuration:
- /etc/monitrc:
check process X
with pidfile "/var/run/X/X_eth1.pid"
start program = "/etc/init.d/X start" with timeout 60 seconds
stop program = "/etc/init.d/X stop" with timeout 60 seconds
if 4 restarts within 5 cycles then timeout
if totalmem > 250 Mb then alert
if children > 255 for 5 cycles then stop
if cpu usage > 95% for 3 cycles then restart
group X
- monit log in verbose mode:
2010-12-28T14:08:49.926555+02:00 myserver monit[13661]: restart service
'X' on user request
2010-12-28T14:08:49.926555+02:00 myserver monit[13661]: monit daemon at
13661 awakened
2010-12-28T14:08:49.926555+02:00 myserver monit[13661]: Awakened by User
defined signal 1
2010-12-28T14:08:49.926555+02:00 myserver monit[13661]: monit: Cannot
open proc file /proc/15007/stat -- No such file or directory
2010-12-28T14:08:49.926555+02:00 myserver monit[13661]: system statistic
error -- cannot read /proc/15007/stat
2010-12-28T14:08:49.936555+02:00 myserver monit[13661]: 'X' trying to
restart
2010-12-28T14:08:49.936555+02:00 myserver monit[13661]: Monitoring
disabled -- service X
2010-12-28T14:08:49.936555+02:00 myserver monit[13661]: 'X' stop:
/etc/init.d/X
2010-12-28T14:08:50.946555+02:00 myserver monit[13661]: monit: pidfile
'/var/run/X/X_eth1.pid' does not exist
2010-12-28T14:08:50.946555+02:00 myserver monit[13661]: monit: pidfile
'/var/run/X/X_eth1.pid' does not exist
2010-12-28T14:08:50.946555+02:00 myserver monit[13661]: monit: pidfile
'/var/run/X/X_eth1.pid' does not exist
2010-12-28T14:08:50.946555+02:00 myserver monit[13661]: 'X' start:
/etc/init.d/X
2010-12-28T14:08:50.946555+02:00 myserver monit[13661]: monit: pidfile
'/var/run/X/X_eth1.pid' does not exist
[skip repeated lines]
2010-12-28T14:09:50.536555+02:00 myserver monit[13661]: monit: pidfile
'/var/run/X/X_eth1.pid' does not exist
2010-12-28T14:09:50.536555+02:00 myserver monit[13661]: 'X' failed to start
2010-12-28T14:09:50.536555+02:00 myserver monit[13661]:
-------------------------------------------------------------------------------
2010-12-28T14:09:50.536555+02:00 myserver monit[13661]:
/usr/bin/monit [0x8056216]
2010-12-28T14:09:50.536555+02:00 myserver monit[13661]:
-------------------------------------------------------------------------------
2010-12-28T14:09:50.536555+02:00 myserver monit[13661]: Execution failed
notification is sent to [EDITED_EMAIL_ADDRESS]
2010-12-28T14:09:50.586555+02:00 myserver monit[13661]: Monitoring
enabled -- service X
2010-12-28T14:09:50.586555+02:00 myserver monit[13661]: 'X' restart
action done
2010-12-28T14:09:50.586555+02:00 myserver monit[13661]: Action done
notification is sent to [EDITED_EMAIL_ADDRESS]
2010-12-28T14:09:50.626555+02:00 myserver monit[13661]: 'X' check
skipped -- service already handled in a dependency chain
2010-12-28T14:10:50.636555+02:00 myserver monit[13661]: monit: pidfile
'/var/run/X/X_eth1.pid' does not exist
2010-12-28T14:10:50.636555+02:00 myserver monit[13661]: 'X' process is
not running
2010-12-28T14:10:50.636555+02:00 myserver monit[13661]:
-------------------------------------------------------------------------------
2010-12-28T14:10:50.636555+02:00 myserver monit[13661]:
/usr/bin/monit [0x8056216]
2010-12-28T14:10:50.636555+02:00 myserver monit[13661]:
-------------------------------------------------------------------------------
2010-12-28T14:10:50.636555+02:00 myserver monit[13661]: Does not exist
notification is sent to [EDITED_EMAIL_ADDRESS]
2010-12-28T14:10:50.676555+02:00 myserver monit[13661]: 'X' trying to
restart
2010-12-28T14:10:50.676555+02:00 myserver monit[13661]: Monitoring
disabled -- service X
2010-12-28T14:10:50.676555+02:00 myserver monit[13661]: monit: pidfile
'/var/run/X/X_eth1.pid' does not exist
2010-12-28T14:10:50.676555+02:00 myserver monit[13661]: monit: pidfile
'/var/run/X/X_eth1.pid' does not exist
2010-12-28T14:10:50.676555+02:00 myserver monit[13661]: 'X' start:
/etc/init.d/X
2010-12-28T14:10:50.676555+02:00 myserver monit[13661]: monit: pidfile
'/var/run/X/X_eth1.pid' does not exist
[skip repeated lines]
2010-12-28T14:11:04.816555+02:00 myserver monit[13661]: monit: pidfile
'/var/run/X/X_eth1.pid' does not exist
2010-12-28T14:11:05.826555+02:00 myserver monit[13661]: 'X' started
2010-12-28T14:11:05.826555+02:00 myserver monit[13661]: Execution
succeeded notification is sent to [EDITED_EMAIL_ADDRESS]
2010-12-28T14:11:05.866555+02:00 myserver monit[13661]: Monitoring
enabled -- service X
2010-12-28T14:12:05.876555+02:00 myserver monit[13661]: 'X' process is
running with pid 15609
2010-12-28T14:12:05.876555+02:00 myserver monit[13661]: Exists
notification is sent [EDITED_EMAIL_ADDRESS]
Thank you for your time.
Valentin Avram