monit-general

check processes occasionally fails to find processes


From: Adrian Bridgett
Subject: check processes occasionally fails to find processes
Date: Tue, 09 Apr 2013 09:55:10 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130308 Thunderbird/17.0.4

I'm seeing false negatives on a process check and wonder what I'm missing (I can't find anything on Google or in the mailing lists).

Grasping at straws, I wonder whether there's a limit on the number of processes that are searched: this is a Hadoop node, and it's possible for there to be a thousand processes (no doubt each with a Java command line half a page long :-)).

This is on Ubuntu with monit 5.3.2:

[BST Apr  7 04:27:57] error    : 'hadoop-tasktracker' process is not running
[BST Apr  7 04:27:57] error    : monit: Start or stop method not defined -- process hadoop-tasktracker
[BST Apr  7 04:29:57] info     : 'hadoop-tasktracker' process is running with pid 2143
[BST Apr  8 12:10:19] error    : 'hadoop-tasktracker' process is not running
[BST Apr  8 12:10:19] error    : monit: Start or stop method not defined -- process hadoop-tasktracker
[BST Apr  8 12:12:19] info     : 'hadoop-tasktracker' process is running with pid 2143
[BST Apr  9 00:42:28] error    : 'hadoop-tasktracker' process is not running
[BST Apr  9 00:42:28] error    : monit: Start or stop method not defined -- process hadoop-tasktracker
[BST Apr  9 00:44:28] info     : 'hadoop-tasktracker' process is running with pid 2143

monit conf is:
check process hadoop-tasktracker matching "java.*proc_tasktracker"

(We used to use a pid file, but that had problems, so we switched to a pattern-matching process check.)

The processes have been running for several weeks - they've definitely not been restarted.

address@hidden:/var/log# monit procmatch "java.*proc_tasktracker"
List of processes matching pattern "java.*proc_tasktracker":
------------------------------------------
su mapred -s /usr/lib/jvm/java-6-sun/bin/java -- -Dproc_tasktracker -Xmx2000m -Dhadoop.log.dir=/var/log/hadoop-0.20-mapreduce -Dhadoop.log.file=hadoop-hadoop-tasktracker-bl-cassoop-p08.log -Dhadoop.home.dir=/usr/lib/hadoop-0.20-mapreduce -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,DRFA -Djava.library.path=/usr/lib/hadoop-0.20-mapreduce/lib/native/Linux-amd64-64 -Dhadoop.policy.file=hadoop-policy.xml -classpath /etc/hadoop/conf:/usr/lib/jvm/java-6-sun/lib/tools.jar:/usr/lib/hadoop-0.20-mapreduce:/usr/lib/hadoop-0.20-mapreduce/hadoop-core-2.0.0-mr1-cdh4.1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjrt-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanuti
java -Dproc_tasktracker -Xmx2000m -Dhadoop.log.dir=/var/log/hadoop-0.20-mapreduce -Dhadoop.log.file=hadoop-hadoop-tasktracker-bl-cassoop-p08.log -Dhadoop.home.dir=/usr/lib/hadoop-0.20-mapreduce -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,DRFA -Djava.library.path=/usr/lib/hadoop-0.20-mapreduce/lib/native/Linux-amd64-64 -Dhadoop.policy.file=hadoop-policy.xml -classpath /etc/hadoop/conf:/usr/lib/jvm/java-6-sun/lib/tools.jar:/usr/lib/hadoop-0.20-mapreduce:/usr/lib/hadoop-0.20-mapreduce/hadoop-core-2.0.0-mr1-cdh4.1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjrt-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop-0.20-mapreduce/
------------------------------------------
Total matches: 2
WARNING: multiple processes matched the pattern. The check is FIRST-MATCH based, please refine the pattern
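
To illustrate why two processes match, here is a rough Python sketch that applies the same pattern to abbreviated versions of the two command lines listed above (the long -classpath values are left out). It uses Python's `re` module as a stand-in for monit's matcher, which is an assumption, and the anchored variant `^java .*proc_tasktracker` is only a hypothetical refinement that would need to be confirmed with `monit procmatch`. It is not meant as an explanation of the intermittent "not running" errors.

```python
import re

# Abbreviated command lines taken from the procmatch output
# (the long -classpath values are omitted for brevity).
cmdlines = [
    "su mapred -s /usr/lib/jvm/java-6-sun/bin/java -- -Dproc_tasktracker -Xmx2000m",
    "java -Dproc_tasktracker -Xmx2000m",
]


def matching(pattern, lines):
    """Return the command lines in which the pattern is found anywhere."""
    rx = re.compile(pattern)
    return [line for line in lines if rx.search(line)]


# The pattern from the monit configuration: unanchored, so the "java" in
# the su wrapper's /usr/lib/jvm/java-6-sun/bin/java path also satisfies it.
current = matching(r"java.*proc_tasktracker", cmdlines)

# Hypothetical refinement: require the command line to start with "java ".
anchored = matching(r"^java .*proc_tasktracker", cmdlines)

print(len(current))  # number of command lines matched by the current pattern
print(anchored)      # command lines matched by the anchored pattern
```

In this sketch the unanchored pattern matches both the `su` wrapper and the JVM itself, while the anchored one matches only the line that begins with `java `.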


Thanks for any help/advice,

Adrian


