Hi Martin,
Thanks for the detailed description...
I've attached my monitrc file. Obviously the executable (logClient) is
an in-house exe, but that shouldn't matter, should it?
I wonder if it is due to the amount of time it takes for the exe to
update its pid file...
It seems to have something to do with wait_start starting its own
thread to wait?
The scenario is rather simple. I can reproduce by stopping the service,
then issuing a monit <service> start via the CLI.
If this is not enough detail, or I can help out more, please let me
know.
Thanks,
Aaron
-----Original Message-----
From: address@hidden
[mailto:address@hidden On
Behalf Of Martin Pala
Sent: Wednesday, December 06, 2006 8:50 AM
To: The monit developer list
Subject: Re: <service> start Generates email noise
I have looked into it ...
I will first explain how it works in monit 4.8.2:
Two threads come into play:
- http thread
- monitoring thread
The http thread processes the actions requested by the user (posted
using either the CLI or the HTML interface). The action to be done is
scheduled in http/cervlet.c:handle_action() by setting the s->doaction
flag for the appropriate service. When there is no action scheduled, the
s->doaction flag is set to ACTION_IGNORE (in p.y during service
initialization, or in validate.c after the action was handled). In
addition, Run.doaction is set to TRUE to signal that there is some
scheduled action in the service tree. The main monitoring thread is then
woken up by the http thread to speed up the action handling.
The main thread then checks in validate.c:validate() whether the
Run.doaction flag is set, since user actions take priority. If it is
set, it walks the service tree and for each service performs the
scheduled s->doaction using control_service() and then resets the
s->doaction flag to ACTION_IGNORE. This is all done under mutex and
signal protection, so it cannot be interrupted and no race condition can
occur. The only thread which can call control_service and physically
start/restart/etc. the service is the main thread. control_service also
sets the s->visited flag.
The second service loop is then evaluated: monit walks the service
tree and, for each service, locks the mutex and blocks signals. If the
service was not already handled in the same cycle (the s->visited flag
is compared in check_skip), it checks the s->doaction flag again (to
improve the response time for services that had an action scheduled
between the first and second loop of the same cycle). If the flag is
set, it performs the action; otherwise it checks the service.
The design is similar to signal handling. The http thread just sets the
flag, whereas the monitoring thread handles the action. From a
theoretical point of view, I think no race condition can occur.
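To make the flow above concrete, here is a minimal sketch of the flag handoff and the two loops. It is not monit's code: Service, schedule_action, handle_pending, run_cycle, run_doaction and the counters are hypothetical stand-ins for s->doaction, Run.doaction, s->visited and control_service(), and signal blocking is left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for monit's types; names are illustrative only. */
typedef enum { ACTION_IGNORE, ACTION_START, ACTION_STOP, ACTION_RESTART } Action;

typedef struct {
    const char *name;
    Action doaction;   /* action requested by the http thread */
    bool visited;      /* already handled in the current cycle? */
    int actions_run;   /* counters so the behaviour can be observed */
    int checks_run;
} Service;

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static bool run_doaction = false;   /* plays the role of Run.doaction */

/* http thread side: only records the request, never touches the service. */
void schedule_action(Service *s, Action a) {
    assert(s != NULL);
    pthread_mutex_lock(&lock);
    s->doaction = a;
    run_doaction = true;
    pthread_mutex_unlock(&lock);
}

/* Monitoring thread side: perform and clear a pending action, if any. */
static bool handle_pending(Service *s) {
    bool handled = false;
    pthread_mutex_lock(&lock);
    if (s->doaction != ACTION_IGNORE) {
        s->actions_run++;            /* stands in for control_service() */
        s->doaction = ACTION_IGNORE;
        s->visited = true;
        handled = true;
    }
    pthread_mutex_unlock(&lock);
    return handled;
}

/* One monitoring cycle: user actions first, then the regular checks. */
void run_cycle(Service *services, size_t n) {
    for (size_t i = 0; i < n; i++)
        services[i].visited = false;

    pthread_mutex_lock(&lock);
    bool pending = run_doaction;
    run_doaction = false;
    pthread_mutex_unlock(&lock);

    /* First loop: only entered when some action was scheduled. */
    if (pending)
        for (size_t i = 0; i < n; i++)
            handle_pending(&services[i]);

    /* Second loop: skip services handled above, re-check the flag,
       and fall back to a normal check otherwise. */
    for (size_t i = 0; i < n; i++) {
        if (services[i].visited)
            continue;
        if (!handle_pending(&services[i]))
            services[i].checks_run++;   /* stands in for the service check */
    }
}
```

In this sketch a service that had an action performed in a cycle is not also checked in that same cycle, because visited makes the second loop skip it.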
I tried to reproduce the problem (official monit-4.8.2 release) without
success.
Can you prepare a simple monit configuration and a procedure for
reproducing the problem?
Thanks,
Martin
Aaron Scamehorn wrote:
Hi Martin,
Actually I think you've now got one thread doing an ACTION_START, and
another doing an ACTION_RESTART on the exact same service.
It is the ACTION_RESTART that is generating what I perceived to be
extraneous emails.
It looks like the do_wakeupcall that you added to
http/cervlet.c:handle_action() is the culprit. Without it, I don't get
the ACTION_RESTART problem.
Of course you need this now, or else it takes Poll Time to actually
respond to the HTTP events, which is what you were trying to speed up in
the first place.
Here is the log output, with a bunch of extra messages, including
pthread_t.
3086927552 [CST Dec 1 14:53:25] debug : 'data_dir' filesystem flags has not changed since last cycle
3086927552 [CST Dec 1 14:53:25] debug : 'data_dir' space usage check passed [current space usage=10.6%]
3086924720 [CST Dec 1 14:53:26] info : monit daemon at 24175 awakened
3086927552 [CST Dec 1 14:53:26] info : Awakened by User defined signal 1
3086927552 [CST Dec 1 14:53:26] debug : control_service: ACTION_START for 'LogClient'
3086927552 [CST Dec 1 14:53:26] debug : control_service: ACTION_START Util_isProcessRunning for 'LogClient'
3086927552 [CST Dec 1 14:53:26] debug : 'LogClient' Error testing process id [24220] -- No such process
3086927552 [CST Dec 1 14:53:26] debug : do_start: Util_isProcessRunning for 'LogClient'
3086927552 [CST Dec 1 14:53:26] debug : 'LogClient' Error testing process id [24220] -- No such process
3086927552 [CST Dec 1 14:53:26] info : 'LogClient' start: /cogcap/ccts/bin/logclnt
3086927552 [CST Dec 1 14:53:26] debug : 'LogClient' Error testing process id [24220] -- No such process
3086927552 [CST Dec 1 14:53:26] debug : Monitoring enabled -- service LogClient
3086927552 [CST Dec 1 14:53:26] debug : check_process: calling Util_isProcessRunning for 'LogClient'
3086927552 [CST Dec 1 14:53:26] debug : 'LogClient' Error testing process id [24220] -- No such process
3086927552 [CST Dec 1 14:53:26] error : 'LogClient' process is not running
3086927552 [CST Dec 1 14:53:26] debug : Does not exist notification is NOT sent to address@hidden
3086927552 [CST Dec 1 14:53:26] debug : Does not exist notification is sent to address@hidden
3076434864 [CST Dec 1 14:53:26] debug : static void* wait_start for 'LogClient'
3076434864 [CST Dec 1 14:53:26] debug : 1) wait_start: calling Util_isProcessRunning for 'LogClient', max_tries= 29
3076434864 [CST Dec 1 14:53:26] debug : 'LogClient' Error testing process id [24220] -- No such process
3086927552 [CST Dec 1 14:53:26] debug : control_service: ACTION_RESTART for 'LogClient'
3086927552 [CST Dec 1 14:53:26] info : 'LogClient' trying to restart
3086927552 [CST Dec 1 14:53:26] debug : Monitoring disabled -- service LogClient (stop)
3086927552 [CST Dec 1 14:53:26] debug : do_stop: Util_isProcessRunning for 'LogClient'
3086927552 [CST Dec 1 14:53:26] debug : 'LogClient' Error testing process id [24220] -- No such process
3086927552 [CST Dec 1 14:53:26] debug : 'data_dir' filesystem flags has not changed since last cycle
3086927552 [CST Dec 1 14:53:26] debug : 'data_dir' space usage check passed [current space usage=10.6%]
3076434864 [CST Dec 1 14:53:27] debug : 1) wait_start: calling Util_isProcessRunning for 'LogClient', max_tries= 28
3076434864 [CST Dec 1 14:53:27] debug : 2) wait_start: calling Util_isProcessRunning for 'LogClient'
3086927552 [CST Dec 1 14:53:56] debug : check_process: calling Util_isProcessRunning for 'LogClient'
3086927552 [CST Dec 1 14:53:56] info : 'LogClient' process is running with pid 24375
3086927552 [CST Dec 1 14:53:56] debug : Exists notification is NOT sent to address@hidden
3086927552 [CST Dec 1 14:53:56] debug : Exists notification is sent to address@hidden
3086927552 [CST Dec 1 14:53:56] debug : 'LogClient' zombie check passed [status_flag=0000]
3086927552 [CST Dec 1 14:53:56] debug : 'LogClient' loadavg(5min) check passed [current loadavg(5min)=0.2]
3086927552 [CST Dec 1 14:53:56] debug : 'LogClient' cpu usage check passed [current cpu usage=0.0%]
3086927552 [CST Dec 1 14:53:56] debug : 'LogClient' mem amount check passed [current mem amount=2764kB]
3086927552 [CST Dec 1 14:53:56] debug : 'data_dir' filesystem flags has not changed since last cycle
3086927552 [CST Dec 1 14:53:56] debug : 'data_dir' space usage check passed [current space usage=10.6%]
-----Original Message-----
From: address@hidden
[mailto:address@hidden On
Behalf Of Martin Pala
Sent: Thursday, November 30, 2006 4:20 PM
To: The monit developer list
Subject: Re: <service> start Generates email noise
Hello,
this behavior isn't a bug. The 'nonexist' event type has positive and
negative variants:
Does not exist (positive 'nonexist')
vs.
Exists (negative 'nonexist')
The alert statement can only filter on the general event type, not on
the particular polarity (there is no 'exist' option).
=> when you have registered the 'nonexist' event, you should get two
alerts informing you about the beginning and the end of the problem.
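The filtering rule can be pictured with a short sketch. This is not monit's implementation: the Event struct, the EVENT_* bit values, EVENT_OTHER and alert_matches are hypothetical names chosen for illustration. It only shows that a filter keyed on the event type matches both polarities.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event-type bits; not monit's actual definitions. */
enum {
    EVENT_NONEXIST   = 1 << 0,
    EVENT_EXEC       = 1 << 1,
    EVENT_CONNECTION = 1 << 2,
    EVENT_OTHER      = 1 << 3   /* placeholder for any unsubscribed type */
};

typedef struct {
    unsigned type;   /* general event type, e.g. EVENT_NONEXIST */
    bool failed;     /* true: problem began ("Does not exist");
                        false: problem ended ("Exists") */
} Event;

/* The subscription is a mask of event types. The polarity (e.failed)
   is deliberately not consulted, so both variants are delivered. */
bool alert_matches(unsigned subscribed_mask, Event e) {
    assert(e.type != 0);
    return (subscribed_mask & e.type) != 0;
}
```

With a mask of EVENT_NONEXIST | EVENT_EXEC | EVENT_CONNECTION, both the "Does not exist" and the "Exists" event match in this sketch, while an event of an unsubscribed type does not.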
Martin
Aaron Scamehorn wrote:
Hello,
Between versions 4.8 and 4.8.2, the following bug has been introduced:
When we issue a monit <service> start command, we get a "Does not exist"
email and a corresponding "Exists" email.
Here is the debug output showing this behavior in 4.8.2:
'LogClient' Error testing process id [11034] -- No such process
'LogClient' Error testing process id [11034] -- No such process
'LogClient' start: /cogcap/ccts/bin/logclnt
'LogClient' Error testing process id [11034] -- No such process
Monitoring enabled -- service LogClient
'LogClient' Error testing process id [11034] -- No such process
'LogClient' process is not running
Does not exist notification is sent to address@hidden
'LogClient' Error testing process id [11034] -- No such process
'LogClient' trying to restart
Monitoring disabled -- service LogClient (stop)
'LogClient' Error testing process id [11034] -- No such process
'LogClient' process is running with pid 11189
Exists notification is sent to address@hidden
'LogClient' zombie check passed [status_flag=0000]
'LogClient' loadavg(5min) check passed [current loadavg(5min)=0.2]
'LogClient' cpu usage check passed [current cpu usage=0.0%]
'LogClient' mem amount check passed [current mem amount=2776kB]
Under version 4.8, we don't get the annoying "Does not exist" and
corresponding "Exists" emails:
'LogClient' Error testing process id [10970] -- No such process
'LogClient' Error testing process id [10970] -- No such process
'LogClient' start: /cogcap/ccts/bin/logclnt
'LogClient' Error testing process id [10970] -- No such process
Monitoring enabled -- service LogClient
'LogClient' Error testing process id [10970] -- No such process
'LogClient' Error testing process id [10970] -- No such process
'LogClient' zombie check passed [status_flag=0000]
'LogClient' loadavg(5min) check passed [current loadavg(5min)=0.1]
'LogClient' cpu usage check passed [current cpu usage=0.0%]
'LogClient' mem amount check passed [current mem amount=2776kB]
Additionally, in our config file, we have the following set:
set alert address@hidden only on { nonexist, exec, connection }
We shouldn't be getting an "Exists" email under any circumstance, should
we?
Thanks,
Aaron
------------------------------------------------------------------------
_______________________________________________
monit-dev mailing list
address@hidden
http://lists.nongnu.org/mailman/listinfo/monit-dev