freeipmi-devel



From: Frank Steiner
Subject: Re: [Freeipmi-devel] Re: Another FreeIPMI beta w/ BMC watchdog workaround for Sun machines
Date: Wed, 07 Jul 2010 08:52:42 +0200
User-agent: Thunderbird 2.0.0.24 (X11/20100302)

Albert Chu wrote:

> Hey Frank,
> 
> This is indeed very strange.  I assume the reboots are because the timer
> eventually times out, perhaps because the resets are no longer working
> (let's say the BMC goes out to lunch).

I don't think so: in my tests I repeat the reset every second and can
always see whether it succeeds. Many resets are rejected with some kind of
error message, but they never all fail for more than one minute.

However, when I loop "bmc-watchdog -g", I get the strangest results, with
all fields showing complete nonsense, like
Initial Countdown:      6553 sec
Present Countdown:      0 sec

and a second later

Initial Countdown:      900 sec
Present Countdown:      24513 sec

and so on. The action field and others change their values as well. If the
timer simply ran down, the host would reset rather than power off. So I
guess the ILOM is so buggy that polling or resetting it is enough to
confuse it :-(
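
To make the "nonsense" readings easier to spot in a long polling loop, the
two countdown fields could be checked mechanically. The following is only a
rough sketch: the field format is taken from the output quoted above, while
the function names (parse_countdowns, check_reading) and both plausibility
rules are my own assumptions, not anything bmc-watchdog provides.

```python
import re

# Matches lines such as "Initial Countdown:      900 sec"
# (format taken from the "bmc-watchdog -g" output quoted above).
FIELD_RE = re.compile(
    r"^(Initial|Present) Countdown:[ \t]+(\d+) sec[ \t]*$", re.MULTILINE
)


def parse_countdowns(text):
    """Return a dict with the 'initial' and 'present' values found in text."""
    values = {}
    for name, seconds in FIELD_RE.findall(text):
        values[name.lower()] = int(seconds)
    return values


def check_reading(current, previous=None):
    """Return a list of reasons why one reading looks implausible."""
    if "initial" not in current or "present" not in current:
        return ["missing countdown field"]
    problems = []
    # Assumption: the present countdown never exceeds the initial countdown.
    if current["present"] > current["initial"]:
        problems.append("present countdown exceeds initial countdown")
    # Assumption: nobody reconfigures the timer between two polls.
    if previous and previous.get("initial") != current["initial"]:
        problems.append("initial countdown changed between polls")
    return problems


# The two readings quoted above, taken one second apart.
sample_a = "Initial Countdown:      6553 sec\nPresent Countdown:      0 sec\n"
sample_b = "Initial Countdown:      900 sec\nPresent Countdown:      24513 sec\n"

first = parse_countdowns(sample_a)
second = parse_countdowns(sample_b)
print(check_reading(first))
print(check_reading(second, previous=first))
```

The first reading passes both checks on its own; the second is flagged on
both counts. In a real loop the text would come from each "bmc-watchdog -g"
run instead of the hard-coded samples.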

> Does the bmc-watchdog log say anything interesting?  Normally
> it's /var/log/freeipmi/bmc-watchdog.log.

It says a lot, but nothing just before the shutdown that it hadn't
already shown before. E.g.:

[Jul 05 08:38:08]: _set_watchdog_timer_cmd: fill_cmd_set_watchdog_timer: Invalid argument
[Jul 05 08:38:18]: Get Cmd: ipmi_kcs_cmd: driver timeout
[Jul 05 08:38:22]: Get Cmd: cmd error: 2h
[Jul 05 08:38:38]: _get_watchdog_timer_cmd: fiid_obj_get: 'timeout_action': data not available
[Jul 05 08:38:38]: _set_watchdog_timer_cmd: fill_cmd_set_watchdog_timer: Invalid argument
[Jul 05 08:38:44]: Set Cmd: ipmi_kcs_cmd: driver timeout
[Jul 05 08:38:50]: Set Cmd: ipmi_kcs_cmd: internal IPMI error
[Jul 05 08:39:01]: Set Cmd: ipmi_kcs_cmd: internal IPMI error
[Jul 05 08:39:23]: _get_watchdog_timer_cmd: fiid_obj_get: 'timeout_action': data not available
[Jul 05 08:39:27]: _get_watchdog_timer_cmd: fiid_obj_get: 'initial_countdown_value': data not available
[Jul 05 08:39:35]: _get_watchdog_timer_cmd: fiid_obj_get: 'initial_countdown_value': data not available
[Jul 05 08:39:51]: Get Cmd: cmd error: 80h
[Jul 05 08:39:52]: Set Cmd: ipmi_kcs_cmd: internal IPMI error
[Jul 05 08:40:21]: Get Cmd: ipmi_kcs_cmd: driver timeout
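
Since the log is long and repetitive, counting how often each message occurs
might make it easier to see whether anything changes before a shutdown. A
minimal sketch, assuming every entry starts with "[timestamp]: " as in the
excerpt above and that a line without that prefix continues the previous
entry. The function names (join_entries, tally) are made up for
illustration.

```python
import re
from collections import Counter

# An entry starts with a bracketed timestamp, e.g. "[Jul 05 08:38:08]: ...".
ENTRY_RE = re.compile(r"^\[(?P<stamp>[^\]]+)\]: (?P<message>.*)$")


def join_entries(lines):
    """Merge wrapped lines and return a list of (timestamp, message) pairs."""
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = ENTRY_RE.match(line)
        if match:
            entries.append([match.group("stamp"), match.group("message")])
        elif entries:
            # No timestamp: treat the line as a continuation of the last entry.
            entries[-1][1] += " " + line
    return [(stamp, message) for stamp, message in entries]


def tally(entries):
    """Count how often each distinct message occurs."""
    return Counter(message for _, message in entries)


# A few entries copied from the excerpt above, including wrapped ones.
sample_log = """\
[Jul 05 08:38:08]: _set_watchdog_timer_cmd: fill_cmd_set_watchdog_timer: 
Invalid argument
[Jul 05 08:38:18]: Get Cmd: ipmi_kcs_cmd: driver timeout
[Jul 05 08:38:38]: _set_watchdog_timer_cmd: fill_cmd_set_watchdog_timer: 
Invalid argument
[Jul 05 08:38:44]: Set Cmd: ipmi_kcs_cmd: driver timeout
[Jul 05 08:38:50]: Set Cmd: ipmi_kcs_cmd: internal IPMI error
[Jul 05 08:39:01]: Set Cmd: ipmi_kcs_cmd: internal IPMI error
"""

entries = join_entries(sample_log.splitlines())
counts = tally(entries)
for message, count in counts.most_common():
    print(count, message)
```

For the real file, the lines would be read from
/var/log/freeipmi/bmc-watchdog.log instead of the embedded sample.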


Strangely enough, the watchdog responds a lot faster and more reliably when
I poll it through the network interface with "ipmitool ... bmc watchdog reset"
or "get".
It responds immediately, always with correct values, and never shuts down.

Maybe that's because I don't have any special driver loaded on Linux?
As far as I understand, the Sun driver is not available for Linux, so
I'm just using "bmc-watchdog -g" without any drivers.

cu,
Frank

-- 
Dipl.-Inform. Frank Steiner   Web:  http://www.bio.ifi.lmu.de/~steiner/
Lehrstuhl f. Bioinformatik    Mail: http://www.bio.ifi.lmu.de/~steiner/m/
LMU, Amalienstr. 17           Phone: +49 89 2180-4049
80333 Muenchen, Germany       Fax:   +49 89 2180-99-4049
* You can only understand recursion once you have understood recursion. *


