RE: [monit-dev] Startup problems for a monit service
From: Aaron Scamehorn
Subject: RE: [monit-dev] Startup problems for a monit service
Date: Fri, 21 Aug 2009 07:38:38 -0500
Hi Martin,
Thanks for clearing that up. I was confused about the start delay setting.
I did run monit w/ the -v option, and it does appear to be the behavior that you describe.
It looks like the pidfile is not updated within 30 seconds, and monit thinks it failed.
I'll move the timeout option to the daemon line as you recommend.
Thanks for your help.
Aaron
-----Original Message-----
From: address@hidden on behalf of Martin Pala
Sent: Thu 8/20/2009 3:43 PM
To: The monit developer list
Subject: Re: [monit-dev] Startup problems for a monit service
Hi Aaron,
Regarding the start delay: this is an option of the "set daemon" statement
and sets the start delay of monit itself, i.e. when monit starts, it waits
60s before starting service verification. You can set the start
timeout this way:
start program = "/cogcap/ccts/bin/mdService start" with timeout 60 seconds
Regarding the "process is not running" message: it is possible that
if your process was slow to start, it didn't update the pidfile
within 30s, so monit thought that it hadn't started (which is true at
that point in time). Setting the start timeout should fix the
problem. For debugging, the -v option will provide more info.
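
For illustration, a minimal sketch of how the check might look with the timeout moved onto the start program line. It reuses the paths and pidfile from the config quoted below; keeping "start delay 60" on the daemon line is optional and only postpones monit's own first verification cycle, and 60 seconds is an assumed value that should exceed the service's real startup time.

```
# Sketch only: values are taken from the quoted config, the 60s timeout is an assumption
set daemon 10 with start delay 60   # delay applies to monit itself, not to the service
check process mdService
  with pidfile "/cogcap/ccts/var/run/mdService.magneto.pid"
  # give the service up to 60 seconds to write its pidfile before declaring failure
  start program = "/cogcap/ccts/bin/mdService start" with timeout 60 seconds
  stop program = "/cogcap/ccts/bin/mdService stop"
```

Running monit with -v after the change should show whether the pidfile now appears within the allowed window.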
Martin
On Aug 19, 2009, at 2:43 PM, Aaron Scamehorn wrote:
> Hello,
>
> This is monit version 5.0
> I'm having difficulty with one of our applications that I have monit
> set up to monitor.
>
> The pertinent config is below:
> # Monit Config file for Magneto
> set daemon 10 with start delay 60 # Poll at 10-second intervals
> set statefile /tmp/monit.state
> check process mdService
>   with pidfile "/cogcap/ccts/var/run/mdService.magneto.pid"
>   start program = "/cogcap/ccts/bin/mdService start"
>   stop program = "/cogcap/ccts/bin/mdService stop"
>   if 10 restarts within 11 cycles then timeout
>   if mem > 256 Mb then alert
>   if cpu usage > 95% for 11 cycles then restart
>   #if failed port 9998 then restart
>   group base
> It is a slow-to-start app, which is why I've commented out the port
> monitoring and added the start delay of 60.
>
>
> From /var/log/messages, I can see monit tries to start the process
> at 00:15:01, and at 00:15:31 issues a failed to start.
>
> At 00:15:47 I see monit now tries a restart, and a fail at 00:16:17.
>
> This repeats 2 more times.
>
> What ends up happening is I'm left with 4 processes running, because
> none of them actually failed to start.
>
> So, first question is why does monit issue the first failure after
> only 30 seconds if my start delay is 60?
> How does monit determine that the startup was a failure? I'm
> certain that the pid file is in place and contains the correct pid.
>
> I guess my next step is to run monit -v?
>
> Any help would be appreciated.
>
> Thanks,
> Aaron
>
>
> Aug 19 00:15:01 magneto monit[7908]: 'mdService' start: /cogcap/ccts/bin/mdService
> Aug 19 00:15:31 magneto monit[7908]: 'mdService' failed to start
> Aug 19 00:15:36 magneto monit[7908]: 'mdService' start action done
> Aug 19 00:15:47 magneto monit[7908]: 'mdService' process is not running
> Aug 19 00:15:47 magneto monit[7908]: 'mdService' trying to restart
> Aug 19 00:15:47 magneto monit[7908]: 'mdService' start: /cogcap/ccts/bin/mdService
> Aug 19 00:16:17 magneto monit[7908]: 'mdService' failed to start
> Aug 19 00:16:27 magneto monit[7908]: 'mdService' process is not running
> Aug 19 00:16:27 magneto monit[7908]: 'mdService' trying to restart
> Aug 19 00:16:27 magneto monit[7908]: 'mdService' start: /cogcap/ccts/bin/mdService
> Aug 19 00:16:58 magneto monit[7908]: 'mdService' failed to start
> Aug 19 00:17:08 magneto monit[7908]: 'mdService' process is not running
> Aug 19 00:17:08 magneto monit[7908]: 'mdService' trying to restart
> Aug 19 00:17:08 magneto monit[7908]: 'mdService' start: /cogcap/ccts/bin/mdService
> Aug 19 00:17:38 magneto monit[7908]: 'mdService' failed to start
> Aug 19 00:17:48 magneto monit[7908]: 'mdService' process is not running
> Aug 19 00:17:48 magneto monit[7908]: 'mdService' trying to restart
> Aug 19 00:17:48 magneto monit[7908]: 'mdService' start: /cogcap/ccts/bin/mdService
> Aug 19 00:18:05 magneto monit[7908]: 'mdService' started
> Aug 19 00:18:15 magneto monit[7908]: 'mdService' process is running with pid 24160