freeipmi-devel

From: Al Chu
Subject: Re: Fw: [Freeipmi-devel] ibmx3650 reboots after ipmi-sel is unable to get SEL record
Date: Mon, 26 Jan 2009 09:54:09 -0800

Hey Won,

On Sun, 2009-01-25 at 23:00 -0800, Won De Erick wrote:
> I am forwarding this to the FreeIPMI users mailing list. Hope, I can get 
> hints from you all.
> Thank you.
> 
> 
> 
> ----- Forwarded Message ----
> From: Won De Erick <address@hidden>
> To: Albert Chu <address@hidden>
> Cc: address@hidden
> Sent: Saturday, January 24, 2009 11:55:24 AM
> Subject: Re: [Freeipmi-devel] ibmx3650 reboots after ipmi-sel is unable to 
> get SEL record
> 
> Pls disregard previous email. I forgot to attach the file. :)

Did you send me the wrong debug file?  I see debug output from
ipmi-sensors, not ipmi-sel.

> Hi Al,
> 
> With IBM x3650, I  noticed that ipmi-sel is unable to get the SEL record.
> 
> # ipmi-sel --version
> IPMI Sensors [ipmi-sel-0.6.10]
> 
> # ipmi-sel > ibm3650-dsc2075-sel.txt
> ipmi_cmd_get_sel_entry: BMC busy
> ipmi-sel: unable to get SEL record
> 
> After the above, the box automatically rebooted. Is this normal?

I have never seen this behavior before, and I wouldn't consider it
"good" by any definition.  This is likely a bug in the IBM
implementation.  "BMC busy" means exactly what it says: the BMC is
busy and cannot respond to IPMI requests.  By itself that is not a
problem; for example, some other IPMI task may be hogging resources.  But
you should be able to reach the BMC eventually.  Is it
possible you have other IPMI things running in the background?
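
If you want to script around a transient "BMC busy", here is a rough
sketch of a retry wrapper.  It is only an illustration: the function
names, the attempt count and the delay are my own assumptions, and the
only thing taken from your output is the "BMC busy" text on stderr.
Given that your box rebooted after a single failure, I would keep the
number of attempts small.

```python
import subprocess
import time


def run_ipmi_sel():
    """Run ipmi-sel once and return (exit_code, stdout, stderr)."""
    proc = subprocess.run(["ipmi-sel"], capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr


def read_sel_with_retry(run=run_ipmi_sel, attempts=5, delay=2.0,
                        sleep=time.sleep):
    """Retry while the tool reports 'BMC busy' (hypothetical helper).

    `attempts` and `delay` are arbitrary defaults, not recommended values.
    """
    for attempt in range(1, attempts + 1):
        code, out, err = run()
        if "BMC busy" in err:
            # Transient condition: wait a little longer each time.
            if attempt < attempts:
                sleep(delay * attempt)
            continue
        if code != 0:
            # Any other failure is not retried.
            raise RuntimeError(f"ipmi-sel failed: {err.strip()}")
        return out
    raise TimeoutError(f"BMC still busy after {attempts} attempts")
```

Calling read_sel_with_retry() with no arguments runs the real ipmi-sel;
passing your own `run` function lets you try the logic without touching
the BMC.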

> I then cleared the SEL records, thinking that the reboot might have been 
> triggered due to a full SEL.

I think this is a reasonable guess.  It could be anything really.  

> # ipmi-sel -c
> 
> # reboot
> # ipmi-sel
> 1:OEM defined = 00 00 00 00 00 E3 25 86 80 00 00 FF 00
> # ipmi-sel
> 1:OEM defined = 00 00 00 00 00 E3 25 86 80 00 00 FF 00
> 
> # reboot
> # ipmi-sel
> 1:OEM defined = 00 00 00 00 00 E3 25 86 80 00 00 FF 00
> 2:OEM defined = 00 00 00 00 00 E3 25 86 80 00 00 FF 00
> 3:OEM defined = 02 00 00 FF 00 00 00 00 20 00 00 00 00
> 
> Then retried the previous command that caused an error.
> 
> # ipmi-sel > ibm3650-dsc2075-sel.txt
> 
> # cat ibm3650-dsc2075-sel.txt
> 1:OEM defined = 00 00 00 00 00 E3 25 86 80 00 00 FF 00
> 2:OEM defined = 00 00 00 00 00 E3 25 86 80 00 00 FF 00
> 3:OEM defined = 02 00 00 FF 00 00 00 00 20 00 00 00 00
> 
> Then the problem didn't occur anymore.
> Besides, what is the meaning of this OEM defined? I can't see any log that is 
> more specific, or something like

The system event log is allowed to store OEM defined information.  Since
the information is defined by (in this case) IBM, I have no way to
convert the hex into something like what you're used to :-(
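
What you can do without IBM's definitions is compare the raw bytes,
for instance to see which records repeat.  A minimal sketch, assuming
the "N:OEM defined = XX XX ..." line format shown in your output; the
function names are made up for illustration and nothing here decodes
the meaning of the bytes.

```python
import re
from collections import defaultdict

# Matches lines such as "1:OEM defined = 00 00 00 00 00 E3 25 86 80 00 00 FF 00"
OEM_LINE = re.compile(r"^(\d+):OEM defined = ((?:[0-9A-Fa-f]{2} ?)+)$")


def parse_oem_records(text):
    """Return {record_id: payload_bytes} for every OEM defined line."""
    records = {}
    for line in text.splitlines():
        match = OEM_LINE.match(line.strip())
        if match:
            records[int(match.group(1))] = bytes.fromhex(match.group(2))
    return records


def group_by_payload(records):
    """Group record ids by identical payload, keyed by the hex string."""
    groups = defaultdict(list)
    for record_id, payload in sorted(records.items()):
        key = " ".join(f"{b:02X}" for b in payload)
        groups[key].append(record_id)
    return dict(groups)
```

Run on the three records from your second reboot, this shows that
records 1 and 2 carry the same 13 bytes and record 3 differs.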

> 220:19-Sep-2008 14:24:56:Power Unit Sys pwr monitor:Power Off/Power Down
> 221:19-Sep-2008 14:25:16:Power Unit Sys pwr monitor:Power Off/Power Down
> 
> I've attached here the ipmi-sel debug output.
> 
> Then one side question: what are the possible reasons for the following
> log entry, obtained prior to clearing? I didn't change anything in the system.
> I just noticed that the system stopped serving and came back after 4-5
> minutes, without any other records in the SEL saying that the box hung and
> rebooted.
>
> 54:23-Jan-2009 11:28:55:System Event #0:System Reconfigured

I'm not quite sure what you're asking.  Are you asking why the above log
message occurs?  I'm not sure.  It could be for any of several
reasons.  Maybe a BIOS change or a firmware change?  The IPMI spec
doesn't really define when a "System Reconfigured" event must be
reported.  It only defines that a "System Reconfigured" event can occur;
manufacturers are free to decide which conditions cause it to be
written to the event log.

Hope I was helpful,

Al

> Thanks,
> 
> Won
> 
> 
>       
-- 
Albert Chu
address@hidden
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory




