Re: [monit] Problem with monit's "not monitoring" status


From: David Bristow
Subject: Re: [monit] Problem with monit's "not monitoring" status
Date: Mon, 1 Mar 2010 19:07:10 -0500

Is the fix from 5.1.1 also included in the latest monit version?  If so, we will
upgrade to the latest release.

On Wed, Feb 24, 2010 at 6:56 PM, Martin Pala <address@hidden> wrote:
> Thanks for the data.
>
> It seems to me that the problem could be that the service start was requested
> before the service had managed to stop, and the start flag was reset after the
> stop, so the service stayed in unmonitored mode (as a result of the stop). To
> confirm this, however, there should be an additional log entry about the stop
> program result (either "stopped" or "failed to stop", depending on the result).
>
> 1.) Are you able to reproduce the issue at will?
>
> 2.) Please upgrade to monit-5.1.1. It contains the following fix, which could
> play a role, since it seems that the pending stop was woken up by the start:
> --8<--
>  * Fixed #27784: wait_start/wait_stop can advance too quickly.
>  Thanks to Randy Puro for report.
> --8<--
>
> (you can get monit-5.1.1 here: 
> http://www.mmonit.com/monit/dist/monit-5.1.1.tar.gz)
>
>
> ... I'll try to replicate the problem in parallel.
>
> Best regards,
> Martin
>
>
> On Feb 24, 2010, at 4:20 PM, David Bristow wrote:
>
>> Here is a copy of the configuration for backgroundrb:
>>
>> check process backgroundrb with pidfile /home/rails/ideeli/qa/current/tmp/pids/backgroundrb_8888.pid
>>  group backgroundrb
>>  start program = "/usr/local/bin/backgroundrb_wrapper start qa /home/rails/ideeli/qa/current/tmp/pids/backgroundrb_8888.pid" with timeout 40 seconds
>>  stop program = "/usr/local/bin/backgroundrb_wrapper stop qa /home/rails/ideeli/qa/current/tmp/pids/backgroundrb_8888.pid" with timeout 60 seconds
>>  if memory > 240 Mb then restart
>>
>> There are no more interesting things in the logs at around this time.
>> Nothing related to backgroundrb, at least.
>>
>> On Mon, Feb 22, 2010 at 4:56 PM, Martin Pala <address@hidden> wrote:
>>> Hi David,
>>>
>>> The service is unmonitored on stop. Starting the service enables monitoring
>>> again, so an unmonitored service is not expected after a start.
>>>
>>> It seems to me that your 'backgroundrb' service has no "start program =
>>> ..." in your monit config file. If "start program" were defined, monit
>>> should log a message similar to "'backgroundrb' stop:
>>> /usr/local/bin/backgroundrb_wrapper", but with the word "start" instead of
>>> "stop". That message is missing from the log, so either it was logged after
>>> 11:45:32 (which is likely if start is defined), or the start program is not
>>> defined and thus the service was not started. Maybe the check timed out (I
>>> don't know your configuration, so I cannot say), or maybe somebody stopped it
>>> again.
>>>
>>> Could you please provide the full monit configuration for the 'backgroundrb'
>>> service and the rest of the debug log between 11:44:48 and 12:08:33?
>>>
>>> Are you able to reproduce the issue at will? I tried to replicate the
>>> problem, but it works fine for me.
>>>
>>> Best regards,
>>> Martin
>>>
>>>
>>>
>>> On Feb 22, 2010, at 3:02 PM, David Bristow wrote:
>>>
>>>> We are having trouble with certain services managed by monit that do
>>>> not restart as they should after being shut down and then started up
>>>> again.
>>>>
>>>> For example, we use backgroundrb.  Someone shut it down for an update
>>>> and started it up again afterwards.  Here is a sample section of
>>>> monit.log that shows what was happening at the time:
>>>>
>>>> [EST Feb 19 11:44:48] debug    : stop service 'backgroundrb' on user request
>>>> [EST Feb 19 11:44:48] info     : monit daemon at 19023 awakened
>>>> [EST Feb 19 11:45:10] error    : 'syslog-ng' failed to start
>>>> [EST Feb 19 11:45:10] info     : 'backgroundrb' stop: /usr/local/bin/backgroundrb_wrapper
>>>> [EST Feb 19 11:45:19] debug    : start service 'backgroundrb' on user request
>>>> [EST Feb 19 11:45:19] info     : monit daemon at 19023 awakened
>>>> [EST Feb 19 11:45:31] info     : 'backgroundrb' start action done
>>>> [EST Feb 19 11:45:32] info     : Awakened by User defined signal 1
>>>>
>>>> And at 12:09 PM, this is the "monit status" output for backgroundrb:
>>>>
>>>> Process 'backgroundrb'
>>>>  status                            not monitored
>>>>  monitoring status                 not monitored
>>>>  data collected                    Fri Feb 19 12:08:33 2010
>>>>
>>>> Why does this happen?  We are using monit 5.0.3.
>>>>
>>>> --
>>>> David Bristow <address@hidden>
>>>>
>>>>
>>>> --
>>>> To unsubscribe:
>>>> http://lists.nongnu.org/mailman/listinfo/monit-general
>>
>>
>>
>> --
>> David Bristow <address@hidden>



-- 
David Bristow <address@hidden>



