monit-dev
Re: event engine patch update


From: Martin Pala
Subject: Re: event engine patch update
Date: Sun, 28 Mar 2004 18:00:34 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040312 Debian/1.6-3

Jan-Henrik Haukeland wrote:
Martin Pala <address@hidden> writes:


Yes :)

Attached is a version for the current CVS sources. In addition to the
last mentioned patch, it fixes a compile-time warning about the
deprecated use of a cast expression as an lvalue in net.c (not related
to the event engine refactoring).
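
For readers who have not met the "cast expression as lvalue" warning: older GCC releases accepted an assignment to a cast as an extension and later deprecated it. The sketch below is not taken from net.c; `advance` is a hypothetical helper that only illustrates the usual portable rewrite.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from monit's net.c): move a generic pointer
 * forward by n bytes.
 *
 * Deprecated form, relying on the old GCC extension:
 *     (char *)buf += n;
 */
static void *advance(void *buf, size_t n) {
  assert(buf != NULL);
  char *p = buf;  /* convert once into a properly typed pointer */
  p += n;         /* the arithmetic now happens on a real lvalue */
  return p;
}
```

The fix is purely syntactic: the pointer is converted into a typed local first, so no cast appears on the left-hand side of an assignment.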

I don't know of any refactoring-related bugs currently - it seems to
work well (but it still needs further testing). I think the patch is
ready to be checked in.


Well, that was a big patch! I have just browsed through it and
concentrated the browsing around the datastructure changes in
monitor.h and read event.c/h and validate.c. The first impression is
that this looks like an improvement and refactors away the weaknesses
in the original implementation.

As far as I can see, you refactor out the internal event handling in
validate.c and centralize it in event.c, this is good. Event handling
in validate.c is basically reduced to posting an event for any and all
tests in validate, which is also good since you now have one single
unified interface into the event machinery from validate, i.e.
Event_post() and not the ugly internal flag settings we had before.

Events posted via Event_post() are put on an event list and handled
based on certain conditions in event.c:handle_event().

Did I get it right?

Yes


I think this patch is an improvement and I'm +1 for checking this into
CVS. But first I have a few comments and questions

1) Have you run monit with this patch through valgrind? To me it looks
like events are added to the list but not dequeued and that there is a
massive memory leak here. Event_free() is not called at all in the code.

The event queue is per service: each service has its own list of related events.

Monit needs to know whether the state changed. The state depends on the result of a rule's test, so it is tied directly to that particular rule. We therefore need to keep the result of each rule's test and compare it with the new value in the next cycle. This is done via the events list.

The list is empty until a FAILED event occurs. In that case monit adds the corresponding event to the list, if it is not there already. PASSED events are ignored until the first FAILED event occurs (this way monit will not flood you with tens of "up" events at startup), so we start watching for state changes after the first failure.

Events are kept in the list until monit is stopped or reloaded. They must not be deleted, because we need to handle the case where the error ratio exceeds the timeout limit (too many failed->passed->failed->passed->... state changes in a given timeframe). This also makes it easy to implement triggers in the future: you can have several error levels and take a custom action depending on the error ratio.

So, let's say an event (FAILED or PASSED) of a given type (EVENT_NONEXIST/EVENT_SPACE/EVENT_CONNECTION/...) occurred. Monit will try to find an event in the list with the same origin (i.e. produced by the same rule) and the same type identification.

The rule which produced the event is uniquely identified by the 'action' parameter. 'action' points to the address of an EventAction object, which defines the per-rule custom actions to take when a FAILED or PASSED event occurs. Thus when monit posts the event, the event inherits and shares the EventAction via its memory address.

If monit finds the same event, it compares the state. If the state changed, Event_post sets the new state and the state_changed flag. If the event was not found and it is a FAILED event, monit adds it to the list. Control then passes to handle_event.
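
The lookup described above could be sketched roughly as follows. This is a simplified illustration and not the code from the patch: the struct layouts, the `STATE_*` constants and the name `post_event` are assumptions made for the example, and it is also an assumption that a newly added FAILED event starts with state_changed set.

```c
#include <assert.h>
#include <stdlib.h>

enum { STATE_PASSED = 0, STATE_FAILED = 1 };  /* assumed encoding */

/* Stand-in for the per-rule EventAction object; only its address matters. */
typedef struct EventAction { int unused; } EventAction;

typedef struct Event {
  int id;                     /* event type, e.g. EVENT_CONNECTION */
  int state;                  /* STATE_PASSED or STATE_FAILED */
  int state_changed;          /* set when the state differs from last cycle */
  const EventAction *action;  /* identifies the rule that produced it */
  struct Event *next;
} Event;

typedef struct Service { Event *eventlist; } Service;  /* one list per service */

/* Returns the list entry for (action, id), or NULL when a PASSED event is
 * ignored because no failure was recorded yet (or when allocation fails). */
static Event *post_event(Service *s, int id, int state,
                         const EventAction *action) {
  assert(s != NULL && action != NULL);
  Event *e;
  for (e = s->eventlist; e != NULL; e = e->next) {
    if (e->action == action && e->id == id) {  /* same rule, same type */
      if (e->state != state) {
        e->state = state;
        e->state_changed = 1;
      }
      return e;
    }
  }
  if (state != STATE_FAILED)
    return NULL;              /* PASSED before any failure: ignore */
  e = calloc(1, sizeof *e);
  if (e == NULL)
    return NULL;
  e->id = id;
  e->state = state;
  e->state_changed = 1;       /* assumption: first failure counts as a change */
  e->action = action;
  e->next = s->eventlist;     /* entries stay listed until stop/reload */
  s->eventlist = e;
  return e;
}
```

Comparing the `action` pointer rather than its contents mirrors the description above: the address of the shared EventAction object is what identifies the rule.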

handle_event calls handle_action with the per-rule failed or passed state handler (based on the event's polarity, FAILED or PASSED). The state_changed flag is reset after the event was handled, so we can compare the result of the next cycle and decide whether it is different or equal.

handle_action calls handle_alert regardless of event type, but only on a state change, so the alert is delivered only once per state change. Thus as soon as the service fails you will receive a FAILED alert, and as soon as the service has recovered you will receive a PASSED alert (the event description is type dependent of course :)

Every occurrence of a failed event is handled; only the first occurrence of a passed event is handled. The service can change state from failed->passed->failed->passed->... and monit keeps monitoring and handling events all the time, until the timeout occurs (if one is defined).

handle_action takes care of all state handler actions (ACTION_ALERT/ACTION_EXEC/ACTION_RESTART/ACTION_START/ACTION_STOP/ACTION_UNMONITOR).

Attached is a simplified picture of the new monit event engine.


2) In validate.c, the report parameter has been bugging me for a while.
Can you please get rid of it by calling Event_post() in the lower
function? :-) E.g. check_uid() and check_gid() can just call
Event_post() directly. This makes the code simpler as well.

Good suggestion :)


3) Please use the "one-true bracket style" in the code i.e.
         if(x) {
             blabla
         }

   and also break lines at 79 column width. Apropos I found out that
   it is possible to use 'C-c .' to set the C indentation style (to
   gnu) in (X)emacs, which is the style we use.

OK. In event.c I used the style which I prefer; I hope it will not interfere too much.


4) It would be nice (but not required) if you could give a short
"formalized" description of the new event machinery and data
structure. For example a drawing and/or text. After all this is a
pretty important part of the code and it would be good to have an
overview :)


The new data structure encapsulates the objects and is based on object relationships and inheritance/sharing. I hope the above text and picture will help. The picture doesn't show all objects and their names (for example, an Action_T object optionally contains a Command_T object in the case of the ACTION_EXEC type); details can be found in monitor.h


Cheers,
Martin



JPEG image

