After upgrading the LDAP server to iDS 5.2 (Sun ONE/iPlanet), monit is no
longer able to read process data for ldap ("failed to get process data"
error). Monit is watching two services:
---monitrc---
set daemon 120
set logfile syslog
set mailserver 192.168.100.106
set mail-format { from: address@hidden }
set httpd port 2812 and
    address 192.168.100.132
    allow user:password
check sshd with pidfile /var/run/sshd.pid
    start program "/etc/init.d/sshd start"
    stop program "/etc/init.d/sshd stop"
    host 192.168.100.132 port 22 protocol ssh
check ldap with pidfile /usr/iplanet/ldap1/slapd-ldap1/logs/pid
    timeout(5, 5)
    host 192.168.100.132 port 389 protocol ldap3
    mode passive
---monitrc---
address@hidden # monit -v validate
(...)
-------------------------------------------------------------------------------
'sshd' is running with pid 568
'sshd' zombie check passed [status_flag=0000]
'sshd' check_process_state() passed.
'sshd' succeeded connecting to INET[192.168.100.132:22]
'sshd' succeeded testing protocol [SSH] at INET[192.168.100.132:22]
'ldap' is running with pid 10340
'ldap' failed to get process data
'ldap' succeeded connecting to INET[192.168.100.132:389]
'ldap' succeeded testing protocol [LDAP3] at INET[192.168.100.132:389]
address@hidden # ps -Leaf | grep monit | grep -v grep
root 16093 1 1 5 0 10:16:31 ? 0:00 monit
root 16093 1 2 5 0 10:16:31 ? 0:00 monit
root 16093 1 3 5 0 10:16:31 ? 0:00 monit
root 16093 1 5 5 0 10:16:31 ? 0:00 monit
root 16093 1 6 5 0 10:16:31 ? 0:00 monit
address@hidden # lsof | grep monit
monit 16093 root cwd VDIR 85,0 512 2 /
monit 16093 root txt VREG 85,0 476792 1430633 /usr/bin/monit
monit 16093 root txt VREG 85,0 44844 34341 /usr/lib/nss_files.so.1
monit 16093 root txt VREG 85,0 191996 33969 /usr/lib/libthread.so.1
monit 16093 root txt VREG 85,0 1157924 34367 /usr/lib/libc.so.1
monit 16093 root txt VREG 85,0 908044 34018 /usr/lib/libnsl.so.1
monit 16093 root txt VREG 85,0 24968 33978 /usr/lib/libmp.so.2
monit 16093 root txt VREG 85,0 4848 433665 /usr/platform/sun4u-us3/lib/libc_psr.so.1
monit 16093 root txt VREG 85,0 70864 34293 /usr/lib/libsocket.so.1
monit 16093 root txt VREG 85,0 382600 34056 /usr/lib/libresolv.so.2
monit 16093 root txt VREG 85,0 38904 34374 /usr/lib/libpthread.so.1
monit 16093 root txt VREG 85,0 5292 33958 /usr/lib/libdl.so.1
monit 16093 root txt VREG 85,0 238776 33970 /usr/lib/ld.so.1
monit 16093 root 0u VCHR 13,2 0t0 1847319 /devices/pseudo/address@hidden:null
monit 16093 root 1u VCHR 13,2 0t0 1847319 /devices/pseudo/address@hidden:null
monit 16093 root 2u VCHR 13,2 0t0 1847319 /devices/pseudo/address@hidden:null
monit 16093 root 3w VCHR 21,0 0t0 1847315 /devices/pseudo/address@hidden:conslog->LOG
monit 16093 root 4r PSTA 294,0 1990 /proc/10340/status
monit 16093 root 5u IPv4 0x30001ac22f8 0t0 TCP ldap1:2812 (LISTEN)
monit 16093 root 7r PSTA 294,0 1990 /proc/10340/status
monit 16093 root 8r PSTA 294,0 1990 /proc/10340/status
monit 16093 root 9r PSTA 294,0 1990 /proc/10340/status
monit 16093 root 10r PSTA 294,0 1990 /proc/10340/status
monit 16093 root 11r PSTA 294,0 1990 /proc/10340/status
monit 16093 root 12r PSTA 294,0 1990 /proc/10340/status
After a few hours monit consumes all of its file descriptors (monit
has a 255 fd limit on our system) by opening /proc/10340/status (250
times), and further problems caused by the lack of free fds begin
(monitoring generally fails from then on).
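For anyone who wants to watch the descriptor count from inside a process
rather than through lsof, here is a rough C sketch. It is only an
illustration and not monit code: count_open_fds() is a hypothetical helper,
and the 65536 cap on the scan is an arbitrary assumption for systems where
the soft limit is unlimited or very large.

```c
#include <fcntl.h>
#include <sys/resource.h>

/* Hypothetical helper (not part of monit): count the descriptors that are
 * currently open in this process by probing every number below the soft
 * RLIMIT_NOFILE limit. Returns -1 if the limit cannot be queried. */
long count_open_fds(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;

    /* Assumed cap so the scan stays bounded when the limit is huge. */
    long max = 65536;
    if (rl.rlim_cur != RLIM_INFINITY && rl.rlim_cur < 65536)
        max = (long)rl.rlim_cur;

    long open_count = 0;
    for (long fd = 0; fd < max; fd++) {
        /* F_GETFD fails with EBADF when fd is not an open descriptor. */
        if (fcntl((int)fd, F_GETFD) != -1)
            open_count++;
    }
    return open_count;
}
```

Logging this value once per poll cycle should show it climbing steadily if
a descriptor is being leaked on each check.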
address@hidden # ps -elf | grep 10340 | grep -v grep
8 R iplanet 10340 1 26 79 20 ? 202788 Jul 03 ? 1054:25 ./ns-slapd -D /usr/iplanet/ldap1/sl
address@hidden # ls -l /proc/10340/status
-r-------- 1 iplanet giplan 1232 Jul 3 07:23 /proc/10340/status
address@hidden # cat /proc/10340/status
cat: input error on /proc/10340/status: Value too large for defined data type
address@hidden # truss -r all -w all -f monit validate
(...)
16372: open("/proc/10340/status", O_RDONLY) = 3
16372: read(3, 0xFFBEE7F8, 4095) Err#79 EOVERFLOW
16372: fstat(-1, 0xFFBEEAC8) Err#9 EBADF
(...)
The complete truss output is in the attachment.
It is a little strange; it may be caused by the 64-bit support in
iDS 5.2 (it was a 32-bit program before this version).
There seem to be two problems:
1.) monit isn't able to read the status of 64-bit processes
2.) there is some loop in the proc code that leaks file descriptors
in monit
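To illustrate the second point: in the truss output the descriptor returned
by open() is never closed after read() fails. The C sketch below shows the
kind of pattern that would avoid such a leak. It is not monit's actual
code; read_status_file() is a hypothetical name, and I am only assuming
that the real leak sits on the read() error path.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not monit's actual code: read up to len bytes from
 * path into buf. Returns the number of bytes read, or -1 with errno set.
 * The descriptor is closed on every path, including a failed read(). */
ssize_t read_status_file(const char *path, void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    ssize_t n;
    do {
        n = read(fd, buf, len);
    } while (n < 0 && errno == EINTR);   /* retry if interrupted */

    int saved_errno = errno;   /* close() may overwrite errno */
    close(fd);
    errno = saved_errno;
    return n;
}
```

A caller that gets -1 can then look at errno: EOVERFLOW, as seen in the
truss output above, would point at the first problem, while the descriptor
is released in either case.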
I tried the current CVS version too; both problems remain there as well.
Martin
------------------------------------------------------------------------
_______________________________________________
monit-dev mailing list
address@hidden
http://mail.nongnu.org/mailman/listinfo/monit-dev