From: Martin Pala
Subject: Re: monit ./control.c ./event.c ./event.h ./l.l ./m...
Date: Thu, 28 Aug 2003 00:13:14 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030714 Debian/1.4-2

Jan-Henrik Haukeland wrote:
> Martin Pala <address@hidden> writes:
>
> > - fix checksum, gid, uid, permission tests to not timeout after error
> >   occurrence (this way it will behave more consistently - an immediate
> >   timeout can be caused by the unmonitor action; for other cases a
> >   modified timeout statement should fit)
>
> I'm not sure you should change the timeout statement. As I said
> before, it is for process (re)starts, and other events are not very
> interesting in this context. For instance:
>
>   if 1 checksum event within 1 cycle then timeout
>
> is uninteresting because either you want checksum to unmonitor once
> or you want checksum to report always. Likewise with other events
> except (re)starts.

What I expect is that all tests behave consistently for the same actions. For example, if you use:

  if failed host www.tildeslash.com port 80 protocol http then alert

and

  if failed checksum then alert

you will get different behavior: the first case will send alerts indefinitely until it is restricted by a timeout statement. The second case will send only one alert, but it won't disable monitoring. What is worse, the original checksum is overwritten with the current (bad) value. In the web interface you will see the erroneous checksum shown as the associated checksum (the original, correct checksum is forgotten). This affects the uid, gid and permission tests too.

I think it would be better:

- to keep the original associated value (checksum/uid/gid/permission)
- to provide consistent behavior for all 'alert' action instances

The first point is clear. The second has two possibilities:

1.) support only one timeout statement instance:

  IF number EVENTS WITHIN number CYCLES THEN TIMEOUT

In that case it will be pretty simple: all executive events (such as restart, timestamp, checksum, gid, uid, permission, etc.) increment the event counter in every cycle in which they fail. As soon as the counter overflows its limit, the service is timed out (i.e. unmonitored). The advantage is simplicity, but there is no difference between events: you can only set a common, shared limit.

2.) allow a timeout statement to be specified for each event type (multi-instance statement):

  IF number event WITHIN number CYCLES THEN TIMEOUT

... where event is one of {CHECKSUM|GID|UID|RESTART|TIMESTAMP|SIZE|etc.}

If you want to, you can set different timeout limits for each event type. The advantage is that you can choose a standalone limit for each service, and you don't have to limit a specific event type if you don't want to (which is a rare case, I think).
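
To make the difference between the two possibilities concrete, here is a rough sketch in Python rather than monit's C. Everything in it is an assumption made for illustration: the names `TimeoutPolicy`, `ANY` and `record` are hypothetical and not monit code, failures are counted in a sliding window of cycles, and the timeout is taken to fire as soon as the count reaches the limit.

```python
from collections import defaultdict, deque

ANY = "*"  # hypothetical key: one limit shared by all event types (possibility 1)


class TimeoutPolicy:
    """Illustrative model of 'IF n <event> WITHIN m CYCLES THEN TIMEOUT'."""

    def __init__(self, limits):
        # limits maps an event name (or ANY) to a (limit, cycles) pair.
        self.limits = limits
        self.history = defaultdict(deque)  # bucket -> cycles in which a failure occurred
        self.monitored = True

    def record(self, cycle, event):
        """Record one failed event; return whether the service is still monitored."""
        if not self.monitored:
            return False
        # Possibility 1 puts every event in one shared bucket,
        # possibility 2 keeps a separate bucket per event type.
        key = ANY if ANY in self.limits else event
        if key not in self.limits:
            return True  # no limit configured for this event type
        limit, cycles = self.limits[key]
        seen = self.history[key]
        seen.append(cycle)
        # Forget failures that have fallen out of the cycle window.
        while seen and seen[0] <= cycle - cycles:
            seen.popleft()
        if len(seen) >= limit:
            self.monitored = False  # timed out, i.e. unmonitored
        return self.monitored
```

With `TimeoutPolicy({ANY: (3, 5)})` a checksum, a uid and a restart failure in three consecutive cycles together time the service out, because they share one counter. With `TimeoutPolicy({"checksum": (1, 1), "restart": (3, 5)})` a single checksum failure times it out immediately, restarts get their own window, and a uid failure is never counted because no limit is set for it.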
I think this way the behavior will be consistent enough. Both solutions are possible. What do you think?
Martin