Re: monit ./control.c ./event.c ./event.h ./l.l ./m...

monit-dev

From:	Martin Pala
Subject:	Re: monit ./control.c ./event.c ./event.h ./l.l ./m...
Date:	Thu, 28 Aug 2003 00:13:14 +0200
User-agent:	Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030714 Debian/1.4-2

Jan-Henrik Haukeland wrote:

Martin Pala <address@hidden> writes:

        - fix checksum, gid, uid, permission tests to not timeout after error
        occurrence (this way it will behave more consistently - immediate timeout
        can be caused by unmonitor action, for other cases modified timeout
        statement should fit)


I'm not sure you should change the timeout statement. As I said
before, it is for process (re)starts and other events are not very
interesting in this context. For instance:

if 1 checksum event within 1 cycle then timeout

Is uninteresting, because, either you want checksum to unmonitor once
or you want checksum to report always. Likewise with other events
except (re)starts.

What I expect is that all tests behave consistently for the same actions. For example, if you use:


if failed host www.tildeslash.com port 80 protocol http then alert

and

if failed checksum then alert

you will receive different behavior - the first case will keep sending alerts until it is restricted by a timeout statement. The second case will send only one alert, but it won't disable monitoring. What is worse is that the original checksum is rewritten to the actual (bad) value. In the web interface you will see the erroneous checksum shown as the associated checksum (the original correct checksum is forgotten). This affects the uid, gid and permission tests too.


I think it will be better:

- to keep original associated value (checksum/uid/gid/permission)
- to provide consistent behavior for all 'alert' action instances
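
To illustrate the first point, here is a minimal sketch in Python (not monit code; the class and attribute names are hypothetical, and MD5 is used only for illustration). The expected value is set once and never overwritten, so a bad file keeps failing on every cycle instead of being silently accepted:

```python
import hashlib


class ChecksumTest:
    """Hypothetical checksum test that never overwrites its expected value."""

    def __init__(self, expected):
        self.expected = expected  # associated (known-good) checksum, kept as-is
        self.last_seen = None     # most recently observed checksum, for display only

    def check(self, data):
        """Return True if the data still matches the expected checksum."""
        self.last_seen = hashlib.md5(data).hexdigest()
        return self.last_seen == self.expected


good = b"original contents"
test = ChecksumTest(hashlib.md5(good).hexdigest())

# One good cycle followed by two cycles with a modified file.
results = [test.check(good), test.check(b"tampered"), test.check(b"tampered")]
```

Because `expected` is untouched, the second bad cycle fails as well, and the web interface could show both the associated and the currently observed value.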

The first point is clear - the second has two possible solutions:

1.) support only one timeout statement instance:

 IF number EVENTS WITHIN number CYCLES THEN TIMEOUT

In that case it would be pretty simple - all executive events (such as restart, timestamp, checksum, gid, uid, permission, etc.) would increment the event counter whenever they fail (each cycle). As soon as the counter overflows the configured limit, the service is timed out (alias unmonitored). The advantage is simplicity, but there is no distinction between events - you can only set a common/shared limit.
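
A rough sketch of this shared counter in Python (not monit code; the names are hypothetical, and treating "within N cycles" as a sliding window and "overflow" as reaching the limit are both assumptions):

```python
from collections import deque


class SharedTimeout:
    """Hypothetical shared counter: every failed test counts toward one limit."""

    def __init__(self, max_events, within_cycles):
        self.max_events = max_events
        self.within_cycles = within_cycles
        self.failures = deque()  # cycle numbers of recent failures, any event type
        self.monitored = True

    def record_cycle(self, cycle, failed_events):
        """Record the names of the tests that failed in this cycle."""
        if not self.monitored:
            return
        self.failures.extend([cycle] * len(failed_events))
        # Forget failures that fell out of the window (assumed sliding window).
        while self.failures and self.failures[0] <= cycle - self.within_cycles:
            self.failures.popleft()
        if len(self.failures) >= self.max_events:
            self.monitored = False  # timed out, alias unmonitored


svc = SharedTimeout(max_events=3, within_cycles=5)
svc.record_cycle(1, ["checksum"])
svc.record_cycle(2, [])
still_monitored = svc.monitored      # only one failure so far
svc.record_cycle(3, ["uid", "gid"])  # three failures within five cycles
```

Note that the checksum, uid and gid failures all land in the same counter, which is exactly the limitation mentioned above.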

2.) allow specification of a timeout statement for each event type (multi-instance statement):


 IF number event WITHIN number CYCLES THEN TIMEOUT

... where event is one of {CHECKSUM|GID|UID|RESTART|TIMESTAMP|SIZE|etc.}

If you want to, you can set different timeout limits for each event type. The advantage is that you can choose a standalone limit for each service, and you don't need to limit a specific event type if you don't want to (which is a rare case, I think).
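
The per-event variant could look roughly like this in Python (again not monit code; the names are hypothetical and the sliding-window reading of "within N cycles" is an assumption). Event types without a configured limit never cause a timeout:

```python
from collections import defaultdict, deque


class PerEventTimeout:
    """Hypothetical per-event-type limits: {"EVENT": (max_events, within_cycles)}."""

    def __init__(self, limits):
        self.limits = limits
        self.failures = defaultdict(deque)  # event type -> recent failure cycles
        self.monitored = True

    def record_failure(self, cycle, event):
        """Count one failure of the given event type in the given cycle."""
        if not self.monitored or event not in self.limits:
            return  # no limit configured for this event type
        max_events, within_cycles = self.limits[event]
        window = self.failures[event]
        window.append(cycle)
        # Forget failures that fell out of the window (assumed sliding window).
        while window and window[0] <= cycle - within_cycles:
            window.popleft()
        if len(window) >= max_events:
            self.monitored = False  # timed out, alias unmonitored


svc = PerEventTimeout({"RESTART": (3, 5), "CHECKSUM": (1, 1)})
svc.record_failure(1, "UID")        # ignored: no UID limit configured
svc.record_failure(1, "RESTART")
svc.record_failure(2, "RESTART")
before_checksum = svc.monitored     # two restarts are below the limit of three
svc.record_failure(3, "CHECKSUM")   # a single checksum failure is enough
```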

I think this way the behavior will be consistent enough. Both solutions are possible. What do you think?


Martin
