monit-general

Re: monit ignoring configuration? a bug?


From: Martin Pala
Subject: Re: monit ignoring configuration? a bug?
Date: Tue, 12 Mar 2013 20:47:26 +0100

Hi,

the described behaviour is normal - the error rate (here, "for 10 cycles") 
applies only to entering the error state. Once the service is in error, the 
check triggers the action on every cycle that fails, until the service 
recovers. It was meant for fast restart retries. On the other hand, practice 
shows that this is confusing and that the rule should wait the same number of 
cycles before triggering the action again - we'll modify the test to wait 
again before retrying the action.

The workaround is to use a wrapper script in the exec action instead of 
executing pkill directly. The wrapper can use a status file to prevent retries 
that are too fast - on start, it verifies that the file's timestamp is older 
than, let's say, 5 minutes. If so, it runs pkill and "touch"es the file to 
update the timestamp. If the timestamp is newer than 5 minutes, it takes no 
action.
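
A minimal sketch of such a wrapper follows. It is only an illustration of the 
idea above, not a tested production script: the STAMP path, the MIN_AGE 
variable and the function names are made up for this example, and the 
"stat -c %Y" call assumes GNU coreutils (Linux).

```shell
#!/bin/sh
# Hypothetical throttling wrapper: runs the command given as arguments
# at most once per MIN_AGE seconds, using the mtime of a status file.
# STAMP and MIN_AGE are illustrative names, not Monit settings.
STAMP="${STAMP:-/var/run/monit-throttle.stamp}"
MIN_AGE="${MIN_AGE:-300}"    # 5 minutes, as suggested above

# Succeeds if the status file is missing or at least MIN_AGE seconds old.
stamp_is_stale() {
    [ -e "$STAMP" ] || return 0
    now=$(date +%s)
    # GNU stat; if it fails, treat the stamp as stale so the action still runs.
    mtime=$(stat -c %Y "$STAMP") || return 0
    [ $((now - mtime)) -ge "$MIN_AGE" ]
}

# Runs "$@" and refreshes the timestamp, or does nothing if called too soon.
run_throttled() {
    if stamp_is_stale; then
        "$@"
        touch "$STAMP"
    fi
    return 0
}

# Only act when a command was passed on the command line.
if [ "$#" -gt 0 ]; then
    run_throttled "$@"
fi
```

Assuming the script is installed as, say, /usr/local/bin/throttled-exec.sh (a 
hypothetical path), the exec line in the configuration would become:

    then exec "/usr/local/bin/throttled-exec.sh /usr/bin/pkill -9 java"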

Regards,
Martin


On Mar 7, 2013, at 7:03 PM, Hagis <address@hidden> wrote:

> 
> Hi,
> 
> I've been working with monit for a year now and found unexpected behaviour
> that I wanted to verify with you guys.
> 
> My configuration file:
> set daemon 45 with start delay 30
> set logfile syslog facility log_daemon
> set httpd port 2812
> allow admin:admin # Web interface username:password
> 
> check process ACS matching "org.jboss.Main"
>        start program = "/etc/init.d/acs start"
>        stop program = "/etc/init.d/acs stop"
>        if 3 restarts within 3 cycles
>                then timeout
> 
> check host localhost with address localhost
>        if failed port 8080 protocol http
>        and request '/hc/' with timeout 10 seconds for 10 cycles
>                then exec "/usr/bin/pkill -9 java"
>        depends on ACS
> 
> When monit performs the health check, it does perform it for 10 cycles
> before running the "exec kill", but afterwards (after the first kill) it
> just ignores this configuration: it performs just 1 health check, and if
> that fails (and it always will, because the default 30-second start timeout
> isn't enough for the process to start), it gets into an infinite loop of:
> 1. start process.
> 2. health check failed.
> 3. kill the process.
> 
> From log:
> myserver monit[7253]: 'localhost' failed protocol test [HTTP] at
> INET[localhost:8080/hc/] via TCP -- HTTP: Error receiving data -- Resource
> temporarily unavailable
> myserver monit[7253]: 'localhost' failed protocol test [HTTP] at
> INET[localhost:8080/hc/] via TCP -- HTTP: Error receiving data -- Resource
> temporarily unavailable
> myserver monit[7253]: 'localhost' failed protocol test [HTTP] at
> INET[localhost:8080/hc/] via TCP -- HTTP: Error receiving data -- Resource
> temporarily unavailable
> myserver monit[7253]: 'localhost' failed protocol test [HTTP] at
> INET[localhost:8080/hc/] via TCP -- HTTP: Error receiving data -- Resource
> temporarily unavailable
> myserver monit[7253]: 'localhost' failed protocol test [HTTP] at
> INET[localhost:8080/hc/] via TCP -- HTTP: Error receiving data -- Resource
> temporarily unavailable
> myserver monit[7253]: 'localhost' failed protocol test [HTTP] at
> INET[localhost:8080/hc/] via TCP -- HTTP: Error receiving data -- Resource
> temporarily unavailable
> myserver monit[7253]: 'localhost' failed protocol test [HTTP] at
> INET[localhost:8080/hc/] via TCP -- HTTP: Error receiving data -- Resource
> temporarily unavailable
> myserver monit[7253]: 'localhost' failed protocol test [HTTP] at
> INET[localhost:8080/hc/] via TCP -- HTTP: Error receiving data -- Resource
> temporarily unavailable
> myserver monit[7253]: 'localhost' failed protocol test [HTTP] at
> INET[localhost:8080/hc/] via TCP -- HTTP: Error receiving data -- Resource
> temporarily unavailable
> myserver monit[7253]: 'localhost' failed protocol test [HTTP] at
> INET[localhost:8080/hc/] via TCP -- HTTP: Error receiving data -- Resource
> temporarily unavailable
> myserver monit[32287]: 'ACS' exec: /usr/bin/pkill
> myserver monit[32287]: 'ACS' process is not running
> myserver monit[32287]: 'ACS' trying to restart
> myserver monit[32287]: 'ACS' start: /etc/init.d/acs
> myserver monit[32287]: 'ACS' process is running with pid 8604
> myserver monit[32287]: 'ACS' failed, cannot open a connection to
> INET[localhost:8080/hc/] via
> TCP
> myserver monit[32287]: 'ACS' exec: /usr/bin/pkill
> myserver monit[32287]: 'ACS' process is not running
> myserver monit[32287]: 'ACS' trying to restart
> myserver monit[32287]: 'ACS' start: /etc/init.d/acs
> myserver monit[32287]: 'ACS' process is running with pid 9256
> myserver monit[32287]: 'ACS' failed, cannot open a connection to
> INET[localhost:8080/hc/] via
> TCP
> 
> 
> Can you please review this issue? Is it a bug, or just a misconfiguration?
> 
> Thanks!



