From: Martin Pala
Subject: Re: 4.0 showstopper?
Date: Wed, 17 Sep 2003 22:30:51 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030908 Debian/1.4-4
Jan-Henrik Haukeland wrote:
> Martin Pala <address@hidden> writes:
>
>> So, thanks to Hauk the problem with monit crashing was solved :)
>
> It was teamwork; if you had not analyzed this I would not have thought
> about sendmail :-)
>
>> The second (non-critical) reported problem remains:
>>
>> --SNIP--
>> - synchronize the main and wait_start threads so that the main thread
>>   does not check a service which is in the wait_start stage. This is a
>>   standalone problem: monit can try to start the service in parallel
>>   without really waiting for the service to start.
>> --SNIP--
>>
>> It is not dangerous now that sendmail() has been fixed, but it is not
>> correct. I think we should skip the service check while the service is
>> in the wait_start stage. I can look at it if you agree.
>
> Since wait_start() only waits for Run.polltime and there should only be
> unique services in the monitrc file, I think the possibility of monit
> starting a service in parallel is microscopic, unless I missed something?
You are right, but I think this race condition can occur when:

- more than one monitored service uses the same start method, or
- more than one test inside a monitored service uses the same start method.

The first case (more than one service using the same start method) looks like a configuration error (a dependency can be used instead), but the second case can occur in practice, for example:
    check process myprocess with pidfile /var/run/myprocess.pid
          start program = "/etc/init.d/myprocess start"
          stop program = "/etc/init.d/myprocess stop"
          if failed port 80 then restart
          if failed port 443 then restart

Monit will test all ports regardless of the individual results. If the first test fails, monit calls the stop and start methods via the restart event and continues testing immediately. In the (special) case that the start method is slow, it can collide with the second test, which triggers a restart event too, so the two restart (stop and start) sequences will "fight" and the result is unpredictable.
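One way to avoid the collision described above is to run all of a service's tests first, issue at most one restart per cycle, and skip the service while that restart is still pending. The sketch below is only an illustration under stated assumptions: service_t, its restart_pending flag, validate_ports() and restart_completed() are hypothetical, the port results are passed in as an array of booleans, and the real stop/start calls are replaced by a counter. It is single-threaded; if the flag were shared with another thread it would additionally need atomic access or a mutex.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical action chosen for a service after one validation cycle. */
typedef enum { ACTION_NONE, ACTION_RESTART } action_t;

/* Hypothetical per-service state; not monit's real data structures. */
typedef struct {
    const char *name;
    bool        restart_pending; /* set while a stop/start is in flight */
    int         restarts;        /* how many restarts were actually issued */
} service_t;

/* Evaluate every port test first and only then decide on one action, so
 * two failing tests in the same cycle produce a single restart instead of
 * two overlapping stop/start sequences. */
action_t validate_ports(service_t *s, const bool port_ok[], size_t nports)
{
    assert(s != NULL);
    if (s->restart_pending)
        return ACTION_NONE;          /* previous restart not finished: skip */

    bool any_failed = false;
    for (size_t i = 0; i < nports; i++)
        if (!port_ok[i])
            any_failed = true;       /* keep testing, only remember failure */

    if (!any_failed)
        return ACTION_NONE;

    s->restart_pending = true;       /* cleared by restart_completed() */
    s->restarts++;                   /* stand-in for calling stop + start */
    return ACTION_RESTART;
}

/* Called once the start program has finished and the service is up. */
void restart_completed(service_t *s)
{
    s->restart_pending = false;
}
```

With the monitrc example above, a cycle in which both port 80 and port 443 fail would yield one ACTION_RESTART, and the next cycle would be skipped until restart_completed() is called.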
This is just theory based on reading the code; maybe I'm wrong.

Martin