freeipmi-devel


From: Won De Erick
Subject: Re: Fw: [Freeipmi-devel] ibmx3650 reboots after ipmi-sel is unable to get SEL record
Date: Tue, 27 Jan 2009 17:43:25 -0800 (PST)

----- Original Message ----

> From: Al Chu <address@hidden>
> 
> Hey Won,
> 
> On Mon, 2009-01-26 at 18:53 -0800, Won De Erick wrote:
> > ----- Original Message ----
> > 
> > > From: Al Chu 
> > > 
> > > Hey Won,
> > > 
> > > On Sun, 2009-01-25 at 23:00 -0800, Won De Erick wrote:
> > > > I am forwarding this to the FreeIPMI users mailing list. I hope I can
> > > > get some hints from you all.
> > > > Thank you.
> > > > 
> > > > 
> > > > 
> > > > ----- Forwarded Message ----
> > > > From: Won De Erick 
> > > > To: Albert Chu 
> > > > Cc: address@hidden
> > > > Sent: Saturday, January 24, 2009 11:55:24 AM
> > > > Subject: Re: [Freeipmi-devel] ibmx3650 reboots after ipmi-sel is unable to get SEL record
> > > > 
> > > > Pls disregard previous email. I forgot to attach the file. :)
> > > 
> > > Did you send me the wrong debug file?  I see debug output from
> > > ipmi-sensors??
> > > 
> > 
> > I'm sorry, attached is the correct one.
> 
> Seems that this has a successful ipmi-sel execution in it.  So not much
> I can debug with :-(
> 
> > 
> > > > Hi Al,
> > > > 
> > > > On the IBM x3650, I noticed that ipmi-sel is unable to get the SEL record.
> > > > 
> > > > # ipmi-sel --version
> > > > IPMI Sensors [ipmi-sel-0.6.10]
> > > > 
> > > > # ipmi-sel > ibm3650-dsc2075-sel.txt
> > > > ipmi_cmd_get_sel_entry: BMC busy
> > > > ipmi-sel: unable to get SEL record
> > > > 
> > > > After the above, the box automatically rebooted. Is this normal?
> > > 
> > > I have never seen this behavior before, and I wouldn't consider it
> > > "good" in any definition.  This is likely a bug in the IBM
> > > implementation.  "BMC busy" means exactly what it says: the BMC is
> > > busy and cannot respond to IPMI requests.  By itself, that is not a
> > > problem; for example, some other IPMI tasks may be hogging resources.
> > > But you should presumably be able to reach the card eventually.  Is it
> > > possible you have other IPMI things running in the background?
> > > 
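Because "BMC busy" is a transient condition, a client can simply pause and retry. The sketch below illustrates that pattern in Python with exponential backoff. It assumes that FreeIPMI's "BMC busy" message corresponds to the IPMI completion code C0h ("Node Busy"); `send_request` is a hypothetical callable returning a (completion code, response data) pair and is not a FreeIPMI API.

```python
import time

# IPMI completion code C0h, "Node Busy" (assumed to be what "BMC busy" reports)
IPMI_CC_NODE_BUSY = 0xC0


def call_with_busy_retry(send_request, max_attempts=5, base_delay_s=0.5,
                         sleep=time.sleep):
    """Call send_request() until it stops answering Node Busy.

    send_request is a hypothetical callable returning
    (completion_code, response_data). The pause doubles after each busy
    reply; if every attempt is busy, the last reply is returned.
    """
    cc, data = IPMI_CC_NODE_BUSY, b""
    for attempt in range(max_attempts):
        cc, data = send_request()
        if cc != IPMI_CC_NODE_BUSY:
            return cc, data
        if attempt < max_attempts - 1:
            sleep(base_delay_s * 2 ** attempt)
    return cc, data


if __name__ == "__main__":
    # Fake BMC: busy twice, then answers with completion code 00h.
    replies = iter([(0xC0, b""), (0xC0, b""), (0x00, b"sel-entry")])
    delays = []
    print(call_with_busy_retry(lambda: next(replies), sleep=delays.append))
    print(delays)
```

Injecting `sleep` keeps the example testable; with the fake BMC above, the call succeeds on the third attempt after pauses of 0.5 s and 1.0 s.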
> > 
> > bmc-watchdog (as daemon) was the only thing running in the background.
> 
> This shouldn't be enough to make IPMI *that* busy.  Here's
> a thought.  Perhaps the SEL became full, the BMC card went busy,
> and thus bmc-watchdog couldn't perform IPMI and the watchdog timed out,
> leading to a reboot??  Obviously, it depends on how you set up
> bmc-watchdog.
> 
This is my setup:
# bmc-watchdog -d -u 4 -p 0 -n -i 300 -l 0

I forgot to tell you that I am using the in-band mechanism. The IBM x3650
needs an RSA II card installed to get the BMC card working (I think this
is the built-in LAN management port that comes with the box).
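Al's hypothesis comes down to timing: the watchdog expires only if no reset reaches the BMC for a full countdown. The Python sketch below models that, assuming `-i 300` sets a 300-second countdown, that the daemon attempts a reset every `reset_period_s` seconds (a hypothetical value, not a documented bmc-watchdog default), and that every attempt fails while the BMC is busy. What actually happens at expiry depends on the timeout action configured on the BMC.

```python
def watchdog_expiry(countdown_s, reset_period_s, busy_start_s, busy_end_s,
                    horizon_s):
    """Return the time (s) at which the watchdog expires, or None.

    Hypothetical model: resets are attempted at t = 0, reset_period_s,
    2 * reset_period_s, ...; any attempt made while
    busy_start_s <= t < busy_end_s fails because the BMC is busy.
    """
    last_reset = 0
    t = 0
    while t <= horizon_s:
        deadline = last_reset + countdown_s
        if t >= deadline:
            return deadline  # countdown reached zero before a reset got through
        if not (busy_start_s <= t < busy_end_s):
            last_reset = t  # reset accepted, countdown restarts
        t += reset_period_s
    deadline = last_reset + countdown_s
    return deadline if deadline <= horizon_s else None


# A 400 s busy spell outlasts a 300 s countdown; a 200 s spell does not.
print(watchdog_expiry(300, 60, 100, 500, 1000))  # 360
print(watchdog_expiry(300, 60, 100, 300, 1000))  # None
```

In the first case the last successful reset happens at 60 s, so the countdown runs out at 360 s while the BMC is still busy; in the second, the reset at 300 s gets through in time.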

> > 
> > > > I then cleared the SEL records, thinking that the reboot might have been
> > > > triggered by a full SEL.
> > > 
> > > I think this is a reasonable guess.  It could be anything really.  
> > > 
> > > > # ipmi-sel -c
> > > > 
> > > > # reboot
> > > > # ipmi-sel
> > > > 1:OEM defined = 00 00 00 00 00 E3 25 86 80 00 00 FF 00
> > > > # ipmi-sel
> > > > 1:OEM defined = 00 00 00 00 00 E3 25 86 80 00 00 FF 00
> > > > 
> > > > # reboot
> > > > # ipmi-sel
> > > > 1:OEM defined = 00 00 00 00 00 E3 25 86 80 00 00 FF 00
> > > > 2:OEM defined = 00 00 00 00 00 E3 25 86 80 00 00 FF 00
> > > > 3:OEM defined = 02 00 00 FF 00 00 00 00 20 00 00 00 00
> > > > 
> > > > Then I retried the previous command that had caused the error.
> > > > 
> > > > # ipmi-sel > ibm3650-dsc2075-sel.txt
> > > > 
> > > > # cat ibm3650-dsc2075-sel.txt
> > > > 1:OEM defined = 00 00 00 00 00 E3 25 86 80 00 00 FF 00
> > > > 2:OEM defined = 00 00 00 00 00 E3 25 86 80 00 00 FF 00
> > > > 3:OEM defined = 02 00 00 FF 00 00 00 00 20 00 00 00 00
> > > > 
> > > > After that, the problem didn't occur anymore.
> > > > Also, what is the meaning of this "OEM defined"? I can't see any log
> > > > that is more specific, something like
> > > 
> > > The system event log is allowed to store OEM defined information.  Since
> > > the information is defined by (in this case) IBM, I have no way to
> > > convert the hex into something like what you're used to :-(
> > > 
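What can be done without IBM's documentation is to split an OEM record into the fields the IPMI spec itself defines. A SEL record is 16 bytes: a 2-byte record ID, a 1-byte record type, and 13 more bytes. The Python sketch below assumes that the 13 bytes ipmi-sel prints are exactly those trailing bytes; since the record type is not shown in the output above, both OEM layouts are tried. The function name is hypothetical, and the OEM bytes themselves stay uninterpreted.

```python
def split_oem_sel_body(record_type, body):
    """Split the 13 bytes that follow the record-type byte of a SEL record.

    Timestamped OEM records (types C0h-DFh) carry a 4-byte timestamp,
    a 3-byte manufacturer ID and 6 OEM bytes; non-timestamped OEM records
    (types E0h-FFh) carry 13 OEM bytes. Multi-byte fields are little-endian.
    """
    if len(body) != 13:
        raise ValueError("expected 13 bytes after the record type")
    if 0xC0 <= record_type <= 0xDF:
        return {
            "timestamp": int.from_bytes(body[0:4], "little"),
            "manufacturer_id": int.from_bytes(body[4:7], "little"),
            "oem_data": body[7:13],
        }
    if 0xE0 <= record_type <= 0xFF:
        return {"oem_data": body}
    raise ValueError("record type 0x%02X is not an OEM type" % record_type)


body = bytes.fromhex("00 00 00 00 00 E3 25 86 80 00 00 FF 00")
print(split_oem_sel_body(0xC0, body))  # read as a timestamped OEM record
print(split_oem_sel_body(0xE0, body))  # read as a non-timestamped OEM record
```

Either way, the meaning of the OEM bytes is defined by the vendor, which is why ipmi-sel can only print them as hex.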
> > 
> > I think this is cool. So, is it safe to assume that the system
> > rebooted if I see similar OEM defined info (in this case OEM defined
> > = 00 00 00 00 00 E3 25 86 80 00 00 FF 00)? Is there any possibility of
> > integrating IBM's OEM defined info in the future too? :D
> 
> I'd be willing to integrate any vendor's OEM defined

This is nice to know. :)

> interpretation/parsing into FreeIPMI. The problem is, I do not know how
> to interpret/parse any of their information :-(  
> 
> As a customer, you should tell your vendor support about this.  Each
> user that complains makes it more likely that they will release the
> information.
> 
> Al
> 
> > > > 220:19-Sep-2008 14:24:56:Power Unit Sys pwr monitor:Power Off/Power Down
> > > > 221:19-Sep-2008 14:25:16:Power Unit Sys pwr monitor:Power Off/Power Down
> > > > 
> > > > I've attached here the ipmi-sel debug output.
> > > > 
> > > > One side question: I want to ask about the possible reasons for the
> > > > following log entry, obtained prior to clearing. I didn't change
> > > > anything in the system. I just noticed that the system stopped serving
> > > > and came back after 4-5 minutes, without any other records in the SEL
> > > > saying that the box hung and rebooted.
> > > >
> > > > 54:23-Jan-2009 11:28:55:System Event #0:System Reconfigured
> > > 
> > > I'm not quite sure what you're asking.  Are you asking why the above log
> > > message occurs?  I'm not too sure.  It could really be for one of many
> > > reasons.  Maybe a BIOS change or a firmware change?  The IPMI spec
> > > doesn't really define when a "System Reconfigured" event must be
> > > reported.  It only defines that a "System Reconfigured" event can occur
> > > and that manufacturers are free to determine what events will make that
> > > information output to the event log.
> > > 
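For reference, "System Reconfigured" is offset 00h of the System Event sensor type (12h) in the IPMI sensor type table. The SEL record stores only that offset, in the low nibble of event data 1, and nothing about the cause, so the log entry alone cannot say what was reconfigured. The minimal Python lookup below illustrates this; the offset names are transcribed from the IPMI v2.0 sensor type table, and the function name is hypothetical.

```python
# Sensor type 12h "System Event" and its offsets (IPMI v2.0 sensor type table)
SYSTEM_EVENT_SENSOR_TYPE = 0x12
SYSTEM_EVENT_OFFSETS = {
    0x00: "System Reconfigured",
    0x01: "OEM System Boot Event",
    0x02: "Undetermined system hardware failure",
    0x03: "Entry added to Auxiliary Log",
    0x04: "PEF Action",
    0x05: "Timestamp Clock Synch",
}


def describe_system_event(sensor_type, event_data1):
    """Name a System Event offset, or return None for other sensor types.

    The offset sits in the low nibble of event data 1; the upper bits only
    describe how event data 2 and 3 are used. Nothing in the record says
    why the event was logged.
    """
    if sensor_type != SYSTEM_EVENT_SENSOR_TYPE:
        return None
    offset = event_data1 & 0x0F
    return SYSTEM_EVENT_OFFSETS.get(offset, "Reserved/unknown offset 0x%02X" % offset)


print(describe_system_event(0x12, 0x00))  # System Reconfigured
```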
> > 
> > You got exactly what I meant. But aside from changes to the BIOS
> > or BMC firmware, I also want to know whether the event can be reported
> > when there are changes at the OS level. I just wondered why the
> > "System Reconfigured" event appeared when in fact no changes were made
> > to the BIOS firmware, the BMC firmware, or the OS. Sorry, this question
> > may not be related to FreeIPMI anymore, but I just want to elicit some
> > ideas from you.
> > 
> > > Hope I was helpful,
> > > 
> > > Al
> > > 
> > > > Thanks,
> > > > 
> > > > Won
> > > > 
> > > > 
> > > >      
> > > -- 
> > > Albert Chu
> > > address@hidden
> > > Computer Scientist
> > > High Performance Systems Division
> > > Lawrence Livermore National Laboratory
> > 
> > I am receiving mail delivery error(s) when sending mail to
> > address@hidden; address@hidden
> > 
> > Thanks for the usual support and help,
> > 
> > Won
> > 



      




