check processes occasionally fails to find processes
From: Adrian Bridgett
Subject: check processes occasionally fails to find processes
Date: Tue, 09 Apr 2013 09:55:10 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130308 Thunderbird/17.0.4
I'm seeing false negatives on a process check and wonder what I'm missing
(I can't find anything on Google or in the mailing lists).
Grabbing at straws, I wonder whether there's a limit on the number of
processes that are searched. This is a hadoop node, and it's possible
for there to be a thousand processes (no doubt each with a java command
line half a page long :-)).
This is on Ubuntu with monit 5.3.2:
[BST Apr 7 04:27:57] error : 'hadoop-tasktracker' process is not running
[BST Apr 7 04:27:57] error : monit: Start or stop method not defined -- process hadoop-tasktracker
[BST Apr 7 04:29:57] info : 'hadoop-tasktracker' process is running with pid 2143
[BST Apr 8 12:10:19] error : 'hadoop-tasktracker' process is not running
[BST Apr 8 12:10:19] error : monit: Start or stop method not defined -- process hadoop-tasktracker
[BST Apr 8 12:12:19] info : 'hadoop-tasktracker' process is running with pid 2143
[BST Apr 9 00:42:28] error : 'hadoop-tasktracker' process is not running
[BST Apr 9 00:42:28] error : monit: Start or stop method not defined -- process hadoop-tasktracker
[BST Apr 9 00:44:28] info : 'hadoop-tasktracker' process is running with pid 2143
monit conf is:
check process hadoop-tasktracker matching "java.*proc_tasktracker"
(We used to use a pid file, but that had problems, so we switched to a
pattern-matching process check.)
The processes have been running for several weeks - they've definitely
not been restarted.
address@hidden:/var/log# monit procmatch "java.*proc_tasktracker"
List of processes matching pattern "java.*proc_tasktracker":
------------------------------------------
su mapred -s /usr/lib/jvm/java-6-sun/bin/java -- -Dproc_tasktracker
-Xmx2000m -Dhadoop.log.dir=/var/log/hadoop-0.20-mapreduce
-Dhadoop.log.file=hadoop-hadoop-tasktracker-bl-cassoop-p08.log
-Dhadoop.home.dir=/usr/lib/hadoop-0.20-mapreduce -Dhadoop.id.str=hadoop
-Dhadoop.root.logger=INFO,DRFA
-Djava.library.path=/usr/lib/hadoop-0.20-mapreduce/lib/native/Linux-amd64-64
-Dhadoop.policy.file=hadoop-policy.xml -classpath
/etc/hadoop/conf:/usr/lib/jvm/java-6-sun/lib/tools.jar:/usr/lib/hadoop-0.20-mapreduce:/usr/lib/hadoop-0.20-mapreduce/hadoop-core-2.0.0-mr1-cdh4.1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjrt-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanuti
java -Dproc_tasktracker -Xmx2000m
-Dhadoop.log.dir=/var/log/hadoop-0.20-mapreduce
-Dhadoop.log.file=hadoop-hadoop-tasktracker-bl-cassoop-p08.log
-Dhadoop.home.dir=/usr/lib/hadoop-0.20-mapreduce -Dhadoop.id.str=hadoop
-Dhadoop.root.logger=INFO,DRFA
-Djava.library.path=/usr/lib/hadoop-0.20-mapreduce/lib/native/Linux-amd64-64
-Dhadoop.policy.file=hadoop-policy.xml -classpath
/etc/hadoop/conf:/usr/lib/jvm/java-6-sun/lib/tools.jar:/usr/lib/hadoop-0.20-mapreduce:/usr/lib/hadoop-0.20-mapreduce/hadoop-core-2.0.0-mr1-cdh4.1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjrt-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop-0.20-mapreduce/
------------------------------------------
Total matches: 2
WARNING: multiple processes matched the pattern. The check is
FIRST-MATCH based, please refine the pattern
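A minimal sketch of what the warning above is pointing at, using Python's
re module as a stand-in for monit's matcher. The two command lines are
abbreviated from the procmatch output (the -classpath tails are dropped),
and the "^java" anchor is only a hypothetical refinement: whether monit
sees the command line in exactly this form is an assumption, and none of
this is known to explain the intermittent "not running" reports.

```python
import re

# Abbreviated command lines from the procmatch output above
# (the long -classpath tails are omitted here for brevity).
cmdlines = [
    "su mapred -s /usr/lib/jvm/java-6-sun/bin/java -- -Dproc_tasktracker -Xmx2000m",
    "java -Dproc_tasktracker -Xmx2000m",
]


def matching(pattern, lines):
    """Return the command lines in which the pattern is found anywhere."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]


# Pattern from the monit conf: found in both the su wrapper and the JVM.
current = matching(r"java.*proc_tasktracker", cmdlines)

# Hypothetical refinement: require the command line to start with "java".
anchored = matching(r"^java.*proc_tasktracker", cmdlines)

print(len(current))   # number of command lines the current pattern matches
print(len(anchored))  # number of command lines the anchored pattern matches
```

With the current pattern, the su wrapper matches as well because its
argument list contains the path to the java binary followed by
-Dproc_tasktracker. Anchoring at the start of the line leaves only the JVM.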
Thanks for any help/advice,
Adrian