
Re: Fw: [Freeipmi-devel] ibmx3650 reboots after ipmi-sel is unable to get SEL record


From: Won De Erick
Subject: Re: Fw: [Freeipmi-devel] ibmx3650 reboots after ipmi-sel is unable to get SEL record
Date: Mon, 26 Jan 2009 18:53:19 -0800 (PST)

----- Original Message ----

> From: Al Chu <address@hidden>
> 
> Hey Won,
> 
> On Sun, 2009-01-25 at 23:00 -0800, Won De Erick wrote:
> > I am forwarding this to the FreeIPMI users mailing list. I hope I can get
> > hints from you all.
> > Thank you.
> > 
> > 
> > 
> > ----- Forwarded Message ----
> > From: Won De Erick 
> > To: Albert Chu 
> > Cc: address@hidden
> > Sent: Saturday, January 24, 2009 11:55:24 AM
> > Subject: Re: [Freeipmi-devel] ibmx3650 reboots after ipmi-sel is unable to
> > get SEL record
> > 
> > Please disregard my previous email. I forgot to attach the file. :)
> 
> Did you send me the wrong debug file?  I see debug output from
> ipmi-sensors??
> 

I'm sorry, attached is the correct one.

> > Hi Al,
> > 
> > On the IBM x3650, I noticed that ipmi-sel is unable to get the SEL record.
> > 
> > # ipmi-sel --version
> > IPMI Sensors [ipmi-sel-0.6.10]
> > 
> > # ipmi-sel > ibm3650-dsc2075-sel.txt
> > ipmi_cmd_get_sel_entry: BMC busy
> > ipmi-sel: unable to get SEL record
> > 
> > After the above, the box automatically rebooted. Is this normal?
> 
> I have never seen this behavior before, and I wouldn't consider it
> "good" in any definition.  This is likely a bug in the IBM
> implementation.  "BMC busy" means exactly what it says: the BMC is
> busy and cannot respond to IPMI requests.  By itself that is not a
> problem; for example, some other IPMI task may be hogging resources.  But
> you should presumably be able to reach the card eventually.  Is it
> possible you have other IPMI things running in the background?
> 

bmc-watchdog (as daemon) was the only thing running in the background.
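
For reference, a minimal Python sketch of retrying the read while the BMC reports busy, to check whether the card becomes reachable eventually. It assumes the "BMC busy" message is written to stderr (it showed up on the terminal above even though stdout was redirected to a file). The helper name run_with_retry, the attempt count, and the delays are hypothetical and not part of FreeIPMI. Given the reboot described above, this is only an illustration, not a recommended workaround.

```python
import subprocess
import time


def run_with_retry(cmd, attempts=5, delay=2.0,
                   run=subprocess.run, sleep=time.sleep):
    """Run cmd, retrying with a doubling delay while stderr says 'BMC busy'.

    Returns the last completed-process object, whether or not it succeeded.
    The run and sleep parameters exist only so the logic can be exercised
    without a real BMC.
    """
    result = None
    for attempt in range(attempts):
        result = run(cmd, capture_output=True, text=True)
        if "BMC busy" not in result.stderr:
            return result
        # Wait before the next try, but not after the final attempt.
        if attempt < attempts - 1:
            sleep(delay * (2 ** attempt))
    return result


# Example use on a machine with FreeIPMI installed (not executed here):
#   result = run_with_retry(["ipmi-sel"])
#   print(result.stdout or result.stderr)
```

The caller still has to inspect the returned result, since the last attempt may also have failed with "BMC busy".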

> > I then cleared the SEL records, thinking that the reboot might have been
> > triggered due to a full SEL.
> 
> I think this is a reasonable guess.  It could be anything really.  
> 
> > # ipmi-sel -c
> > 
> > # reboot
> > # ipmi-sel
> > 1:OEM defined = 00 00 00 00 00 E3 25 86 80 00 00 FF 00
> > # ipmi-sel
> > 1:OEM defined = 00 00 00 00 00 E3 25 86 80 00 00 FF 00
> > 
> > # reboot
> > # ipmi-sel
> > 1:OEM defined = 00 00 00 00 00 E3 25 86 80 00 00 FF 00
> > 2:OEM defined = 00 00 00 00 00 E3 25 86 80 00 00 FF 00
> > 3:OEM defined = 02 00 00 FF 00 00 00 00 20 00 00 00 00
> > 
> > Then I retried the command that had caused the error.
> > 
> > # ipmi-sel > ibm3650-dsc2075-sel.txt
> > 
> > # cat ibm3650-dsc2075-sel.txt
> > 1:OEM defined = 00 00 00 00 00 E3 25 86 80 00 00 FF 00
> > 2:OEM defined = 00 00 00 00 00 E3 25 86 80 00 00 FF 00
> > 3:OEM defined = 02 00 00 FF 00 00 00 00 20 00 00 00 00
> > 
> > After that, the problem didn't occur anymore.
> > Also, what does this "OEM defined" mean? I can't see any log entry that is
> > more specific, or something like
> 
> The system event log is allowed to store OEM defined information.  Since
> the information is defined by (in this case) IBM, I have no way to
> convert the hex into something like what you're used to :-(
> 

I think this is cool. So, is it safe to assume that the system rebooted if I
see similar OEM defined info (in this case
OEM defined = 00 00 00 00 00 E3 25 86 80 00 00 FF 00)? Is there any
possibility of integrating IBM's OEM defined info in the future too? :D
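
To make it easier to see whether the same OEM payload recurs across reboots, here is a minimal Python sketch that pulls the "OEM defined" lines out of saved ipmi-sel output and groups record IDs by payload. It assumes the line format shown in this thread (record ID, a colon, "OEM defined = ", then space-separated hex bytes). The function names are hypothetical, and the sketch does not decode what the bytes mean, since that is defined by IBM.

```python
import re
from collections import defaultdict

# Matches lines such as: 1:OEM defined = 00 00 00 00 00 E3 25 86 80 00 00 FF 00
OEM_LINE = re.compile(r"^(\d+):OEM defined = ((?:[0-9A-Fa-f]{2} ?)+)$")


def parse_oem_records(text):
    """Return (record_id, payload_bytes) for every OEM defined line in text."""
    records = []
    for line in text.splitlines():
        match = OEM_LINE.match(line.strip())
        if match:
            records.append((int(match.group(1)), bytes.fromhex(match.group(2))))
    return records


def group_by_payload(records):
    """Map each distinct payload (as an uppercase hex string) to its record IDs."""
    groups = defaultdict(list)
    for record_id, payload in records:
        key = " ".join(f"{byte:02X}" for byte in payload)
        groups[key].append(record_id)
    return dict(groups)


# Example use with the file saved earlier in this thread (not executed here):
#   with open("ibm3650-dsc2075-sel.txt") as handle:
#       print(group_by_payload(parse_oem_records(handle.read())))
```

Lines that are not OEM defined records, such as the dated "Power Unit" entries, are simply skipped.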

> > 220:19-Sep-2008 14:24:56:Power Unit Sys pwr monitor:Power Off/Power Down
> > 221:19-Sep-2008 14:25:16:Power Unit Sys pwr monitor:Power Off/Power Down
> > 
> > I've attached here the ipmi-sel debug output.
> > 
> > One side question: what are the possible reasons for the following log
> > entry, obtained prior to clearing? I didn't change anything in the system.
> > I just noticed that the system stopped serving and came back after 4-5
> > minutes, without any other records in the SEL saying that the box hung and
> > rebooted.
> >
> > 54:23-Jan-2009 11:28:55:System Event #0:System Reconfigured
> 
> I'm not quite sure what you're asking.  Are you asking why the above log
> message occurs?  I'm not too sure.  It could really be for one of many
> reasons.  Maybe a BIOS change or a firmware change?  The IPMI spec
> doesn't really define when a "System Reconfigured" event must be
> reported.  It only defines that a "System Reconfigured" event can occur
> and that manufacturers are free to determine which events cause it to
> be written to the event log.
> 

You understood exactly what I meant. But aside from changes to the BIOS or BMC
firmware, I would also like to know whether the event can be reported for
changes at the OS level. I just wondered why the "System Reconfigured" event
was logged when in fact no changes were made to the BIOS firmware, the BMC
firmware, or the OS. Sorry, this question may no longer be related to
FreeIPMI, but I would like to hear your ideas.

> Hope I was helpful,
> 
> Al
> 
> > Thanks,
> > 
> > Won
> > 
> > 
> >      
> -- 
> Albert Chu
> address@hidden
> Computer Scientist
> High Performance Systems Division
> Lawrence Livermore National Laboratory

I am receiving mail delivery error(s) when sending mails to address@hidden; 
address@hidden

Thanks for the usual support and help,

Won



      




