Re: [Freeipmi-users] problems with bmc-watchdog


From: Al Chu
Subject: Re: [Freeipmi-users] problems with bmc-watchdog
Date: Wed, 05 May 2010 14:32:19 -0700

Hey Dave,

Nothing you're doing seems wrong.  I just tried 0.8.6.beta0 on one
of my systems (a fresh RPM build), and it seemed to run fine.

I have a couple of guesses.  Possibly there is some combination of
default settings that your BMC doesn't like, and it is getting confused.
Or possibly the BMC is too slow, so my initial setup of the watchdog
timer hasn't "taken" yet when the daemon checks it, and thus the daemon
concludes:

"[May 05 14:52:22]: timer stopped by another process"

Let's try some tests.  Could you run bmc-watchdog "by hand" to make sure
it is working correctly?  By "by hand", I mean running something like:

bmc-watchdog --get (see what the current watchdog settings are)
bmc-watchdog --set ... (with the same options as the daemon, except the
reset interval '-e 60')
bmc-watchdog --get (see that things are set)
bmc-watchdog --start
bmc-watchdog --get (make sure things changed and the timer is running)
bmc-watchdog --get (make sure the timer is counting down)
bmc-watchdog --reset
bmc-watchdog --get (make sure the timer has reset)

(and you probably want to run bmc-watchdog --stop at the end)
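A minimal sketch of the sequence above as a script, so the same steps can be replayed identically on both the Solaris and GNU/Linux machines. The variables WD, WD_SET_OPTS and WD_PAUSE and the function names are hypothetical helpers for this sketch, not FreeIPMI options; WD_SET_OPTS has to be filled in with the daemon options minus the reset interval. It stops at the first failing step (so a failed --set is never followed by --start) but always finishes with --stop, since an armed timer will reset the machine when it expires.

```shell
#!/usr/bin/env bash
# Sketch only: replays the manual bmc-watchdog check in a fixed order.
# WD, WD_SET_OPTS and WD_PAUSE are hypothetical variables for this
# script; they are not FreeIPMI settings.
WD=${WD:-bmc-watchdog}          # command to run
WD_SET_OPTS=${WD_SET_OPTS:-}    # daemon options minus the reset interval (-e 60)
WD_PAUSE=${WD_PAUSE:-2}         # seconds between the two back-to-back --get calls

# Print a banner, then run one bmc-watchdog step.
step() {
  echo "== $WD $*"
  "$WD" "$@"
}

watchdog_manual_check() {
  local rc
  # Stop at the first failing step: starting the timer after a failed
  # --set could arm it with unknown settings.
  # shellcheck disable=SC2086  # WD_SET_OPTS is split into words on purpose
  step --get &&                  # current settings
    step --set $WD_SET_OPTS &&   # same options as the daemon, minus -e
    step --get &&                # did the settings take?
    step --start &&
    step --get &&                # timer should now be running
    sleep "$WD_PAUSE" &&
    step --get &&                # countdown should have dropped
    step --reset &&
    step --get                   # countdown should be back near the start
  rc=$?
  step --stop                    # always disarm, even after a failure
  return "$rc"
}
```

For example, with the sketch saved as wd-check.sh (a hypothetical name), something like `WD_SET_OPTS='...' bash -c '. ./wd-check.sh && watchdog_manual_check'`, where `...` stands for your daemon options without `-e 60`, prints each step followed by its --get output for comparison.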

This should help us isolate the problem.  If the above works, then maybe
there is a timing issue with your BMC that we need to work around.  I'm a
little perplexed as to why it would work with the openipmi driver.
Possibly that driver is more generous with packet timeouts and the like.
Or maybe the openipmi driver's own watchdog code has done something to
massage the BMC that I'm unaware of.
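To look at the timing guess more directly, a sketch like the one below starts the timer and polls --get until it reports the timer running, which would show whether a slow BMC briefly still reports "stopped" right after a start. It assumes the timer has already been configured with --set, and it assumes the --get output contains a line such as "Timer: Running" (as opposed to "Timer Use: ..."); check that against the actual output of your version before trusting the result. WD and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: how soon after --start does --get report the timer running?
# ASSUMPTION: `bmc-watchdog --get` prints a line of the form
# "Timer:   Running" / "Timer:   Stopped"; adjust the awk pattern if your
# version words it differently. WD is a hypothetical variable, not a
# FreeIPMI setting.
WD=${WD:-bmc-watchdog}

# Print the word after "Timer:" in the --get output (ignores "Timer Use:").
timer_state() {
  "$WD" --get | awk -F: '/^Timer:/ { gsub(/[ \t]+/, "", $2); print $2 }'
}

# Start the timer, poll up to $1 times, $2 seconds apart, then stop it.
# Returns 0 if the timer was seen running, 1 otherwise.
probe_start_latency() {
  local tries="${1:-5}" delay="${2:-1}" i=1 state rc=1
  # If --start errors out, stop anyway in case the BMC acted on it.
  "$WD" --start || { "$WD" --stop > /dev/null; return 1; }
  while [ "$i" -le "$tries" ]; do
    state=$(timer_state)
    echo "check $i: timer ${state:-unknown}"
    if [ "$state" = "Running" ]; then
      rc=0
      break
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  "$WD" --stop > /dev/null    # never leave the timer armed after probing
  return "$rc"
}
```

If the first check reports Stopped but a later one reports Running, that would be consistent with the BMC being slow to act on the new settings.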

Al

On Wed, 2010-05-05 at 07:27 -0700, Dave Love wrote:
> I'm trying to get bmc-watchdog working on Solaris (Sun ELOM, x4500) and
> failing.  However, I'm seeing similar problems on GNU/Linux (RHEL 5,
> also ELOM, but x4200).
> 
> First off, on GNU/Linux bmc-watchdog won't work (`Get Cmd: BMC busy')
> unless I specify the openipmi driver explicitly, which I don't have to
> do with other commands like bmc-info.
> 
> What I see on both OSes when trying to start the daemon are log messages like
> 
>   [May 05 14:52:22]: starting bmc-watchdog daemon
>   [May 05 14:52:22]: timer stopped by another process
>   [May 05 14:52:22]: stopping bmc-watchdog daemon
>   
> In fact, the watchdog counter is started by the daemon and left running,
> so the system was reset after 900s before I realized what was
> happening.
> 
> In the GNU/Linux case, I just built and installed RPMs from the
> 0.8.6.beta0 source and did `service bmc-watchdog start' after setting
> the device in freeipmi.conf, assuming the defaults are sensible, so
> there's limited scope for my idiocy.  I tried initially with the example
> parameters from the man page, though, and I have actually checked the
> FAQ this time :-/.
> 
> Am I actually doing something fundamentally wrong?  If so, I think it
> needs a health warning somewhere, as the system gets reset if you start
> it going like that without realizing.  (Not meaning to
> whinge, of course.)
> 
> 
> 
-- 
Albert Chu
address@hidden
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory




