
Re: [Freeipmi-users] Intel SR 1625 Sensors


From: Albert Chu
Subject: Re: [Freeipmi-users] Intel SR 1625 Sensors
Date: Thu, 07 Apr 2011 09:23:53 -0700

Hey Werner,

Cool.  I'll document this in the code as well, and I'll go ahead and
write a SEL interpretation condition for the SMI timeout too.

Al

On Thu, 2011-04-07 at 07:06 -0700, Werner Fischer wrote:
> Hey Al,
> 
> thanks for the beta1. I'll forward this to our customer and let you know
> as soon as I have feedback.
> 
> btw: below are the details I got from Intel regarding the sensors.
> --------------------------------------------------------------------
> 1) SMI Timeout:
> The BMC supports an SMI timeout sensor (sensor type OEM (F3h), event
> type Discrete (03h)) that asserts if the SMI signal has been asserted
> for more than 90 seconds. A continuously asserted SMI signal is an
> indication that the BIOS cannot service the condition that caused the
> SMI. This is usually because that condition prevents the BIOS from
> running. When an SMI timeout occurs, the BMC asserts the SMI timeout
> sensor and logs a SEL event for that sensor. The BMC will also reset the
> system.
> The normal value is deasserted; system health status = OK
> When this sensor is asserted, the system health status = fatal. 
> 
> 2) IOH Therm Trip: This sensor indicates whether the IOH has reached its
> overheating point (thermal trip point).
> The normal value is deasserted; system health status = OK
> When this sensor is asserted, the system health status = fatal. 
> 
> Both VRD Hot sensors have a fatal contribution to system health when in
> limit exceeded state.
> --------------------------------------------------------------------
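Intel's notes reduce to one rule for these discrete sensors: the normal state (offset 0) contributes OK to system health, and the abnormal state (offset 1: asserted, or limit exceeded) contributes fatal. The Python sketch below encodes that rule for the four sensors discussed in this thread, using the event/reading type codes from the verbose sensor output further down (3h for SMI Timeout and IOH Therm Trip, 5h for the VRD Hot sensors). The table, the function names, and the "unknown" result for an empty state mask are illustrative assumptions, not FreeIPMI code.

```python
# Type codes per sensor, from the verbose sensor output in this thread.
# For both generic codes, offset 0 is the normal state, offset 1 the abnormal one:
#   03h: 0 = State Deasserted,   1 = State Asserted
#   05h: 0 = Limit Not Exceeded, 1 = Limit Exceeded
SR1625_SENSORS = {
    "SMI Timeout": 0x03,
    "IOH Therm Trip": 0x03,
    "P1 VRD Hot": 0x05,
    "P2 VRD Hot": 0x05,
}

NORMAL_BIT = 1 << 0    # deasserted / limit not exceeded
ABNORMAL_BIT = 1 << 1  # asserted / limit exceeded


def sensor_health(name: str, state_mask: int) -> str:
    """Per Intel's notes: abnormal state is "fatal", normal state is "OK"."""
    if name not in SR1625_SENSORS:
        raise KeyError(f"no guidance for sensor {name!r}")
    if state_mask & ABNORMAL_BIT:
        return "fatal"
    if state_mask & NORMAL_BIT:
        return "OK"
    return "unknown"  # assumption: neither state bit was reported


def system_health(readings: dict) -> str:
    """A single fatal contribution makes the overall system health fatal."""
    statuses = [sensor_health(name, mask) for name, mask in readings.items()]
    if "fatal" in statuses:
        return "fatal"
    if statuses and all(status == "OK" for status in statuses):
        return "OK"
    return "unknown"
```

For example, an SMI Timeout reporting only bit 1 (asserted) makes `system_health` return "fatal", even if every other sensor is OK.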
> 
> best regards,
> Werner
> 
> On Wed, 2011-04-06 at 13:59 -0700, Albert Chu wrote:
> > Hey Werner,
> > 
> > I got a beta release that should handle sensor #47.
> > 
> > http://download.gluster.com/pub/freeipmi/qa-release/freeipmi-1.0.4.beta1.tar.gz
> > 
> > Al
> > 
> > On Wed, 2011-04-06 at 10:00 -0700, Albert Chu wrote:
> > > Hey Werner,
> > > 
> > > On Tue, 2011-04-05 at 23:47 -0700, Werner Fischer wrote:
> > > > Hi Al,
> > > > 
> > > > thank you for the beta.
> > > > 
> > > > Sensors 55, 56, and 59 are now recognized:
> > > > 
> > > > ID | Name             | Type                     | State    | Reading   | Units | Event
> > > > [...]
> > > > 47 | SMI Timeout      | OEM Reserved             | N/A      | N/A       | N/A   | 'OK'
> > > > [...]
> > > > 55 | P1 VRD Hot       | Temperature              | Nominal  | N/A       | N/A   | 'OK'
> > > > 56 | P2 VRD Hot       | Temperature              | Nominal  | N/A       | N/A   | 'OK'
> > > > [...]
> > > > 59 | IOH Therm Trip   | Temperature              | Nominal  | N/A       | N/A   | 'OK'
> > > > 
> > > > For sensor 47 the state is still "N/A".
> > > > 
> > > > For the SMI timeout, I assume that the unasserted state is the one that
> > > > should be nominal, because I found a note about a similar Intel
> > > > motherboard: there, Intel corrected an issue where SMI Timeout was
> > > > asserted, causing a critical event in the event log. See page 19 of
> > > > this PDF, item "5) Event Log may report SMI Timeout Assertion after
> > > > Server Power button is pressed":
> > > > http://download.intel.com/support/motherboards/server/mfsys25/sb/mfsys25_mfsys35_spec_update_feb11.pdf
> > > 
> > > Ahh, I completely misread sensor 47.  I thought it was an OEM event
> > > sensor, but it's not.  It has a normal event, the sensor type is the
> > > only thing that is OEM.  Assuming your guess about assert vs. unassert
> > > is correct (it's a reasonable guess to me), I can add this OEM support
> > > into FreeIPMI.  I'll try and get you a beta sometime later today.
> > > 
> > > Al
> > > 
> > > > But to be sure, I will ask Intel for more details on sensors 47 and 59,
> > > > as you requested. I'll let you know on the list once I have more
> > > > details.
> > > > 
> > > > Best regards,
> > > > Werner
> > > > 
> > > > 
> > > > PS: here is some more verbose output on these four sensors:
> > > > 
> > > > Record ID: 47
> > > > ID String: SMI Timeout
> > > > Sensor Type: OEM Reserved (F3h)
> > > > Sensor Number: 6
> > > > IPMB Slave Address: 10h
> > > > Sensor Owner ID: 20h
> > > > Sensor Owner LUN: 0h
> > > > Channel Number: 0h
> > > > Entity ID: system board (7)
> > > > Entity Instance: 1
> > > > Entity Instance Type: Physical Entity
> > > > Event/Reading Type Code: 3h
> > > > Sensor State: N/A
> > > > Sensor Event: 'OK'
> > > > 
> > > > Record ID: 55
> > > > ID String: P1 VRD Hot
> > > > Sensor Type: Temperature (1h)
> > > > Sensor Number: 102
> > > > IPMB Slave Address: 10h
> > > > Sensor Owner ID: 20h
> > > > Sensor Owner LUN: 0h
> > > > Channel Number: 0h
> > > > Entity ID: processor (3)
> > > > Entity Instance: 1
> > > > Entity Instance Type: Physical Entity
> > > > Event/Reading Type Code: 5h
> > > > Sensor State: Nominal
> > > > Sensor Event: 'OK'
> > > > 
> > > > Record ID: 56
> > > > ID String: P2 VRD Hot
> > > > Sensor Type: Temperature (1h)
> > > > Sensor Number: 103
> > > > IPMB Slave Address: 10h
> > > > Sensor Owner ID: 20h
> > > > Sensor Owner LUN: 0h
> > > > Channel Number: 0h
> > > > Entity ID: processor (3)
> > > > Entity Instance: 2
> > > > Entity Instance Type: Physical Entity
> > > > Event/Reading Type Code: 5h
> > > > Sensor State: Nominal
> > > > Sensor Event: 'OK'
> > > > 
> > > > Record ID: 59
> > > > ID String: IOH Therm Trip
> > > > Sensor Type: Temperature (1h)
> > > > Sensor Number: 106
> > > > IPMB Slave Address: 10h
> > > > Sensor Owner ID: 20h
> > > > Sensor Owner LUN: 0h
> > > > Channel Number: 0h
> > > > Entity ID: system board (7)
> > > > Entity Instance: 1
> > > > Entity Instance Type: Physical Entity
> > > > Event/Reading Type Code: 3h
> > > > Sensor State: Nominal
> > > > Sensor Event: 'OK'
> > > > 
> > > > On Fri, 2011-04-01 at 15:32 -0700, Albert Chu wrote:
> > > > > Hey Werner, Ben,
> > > > > 
> > > > > Here's a beta that should support those sensor interpretations.  It's
> > > > > tough for me to test w/o your motherboard in front of me, PLMK if it
> > > > > works for you.
> > > > > 
> > > > > http://download.gluster.com/pub/freeipmi/qa-release/freeipmi-1.0.4.beta0.tar.gz
> > > > > 
> > > > > Al
> > > > > 
> > > > > On Fri, 2011-04-01 at 03:51 -0700, Werner Fischer wrote:
> > > > > > Hi Al,
> > > > > > (sorry for sending this twice; I mistakenly sent my first email only
> > > > > > to you, not the list)
> > > > > > 
> > > > > > I've been on vacation for some weeks and now back again.
> > > > > > 
> > > > > > By "not detected", Benjamin meant that FreeIPMI returns a monitoring
> > > > > > status of "N/A" for those sensors (not "Nominal"). Unfortunately, we
> > > > > > forgot to send the output of "ipmimonitoring --legacy-output
> > > > > > --interpret-oem-data --quiet-cache --sdr-cache-recreate" (which is
> > > > > > used by our Nagios plugin):
> > > > > > 
> > > > > > Record ID | Sensor Name | Sensor Group | Monitoring Status | Sensor Units | Sensor Reading [...]
> > > > > > 47 | SMI Timeout | OEM Reserved | N/A | N/A | 'OK'
> > > > > > [...]
> > > > > > 55 | P1 VRD Hot | Temperature | N/A | N/A | 'OK'
> > > > > > 56 | P2 VRD Hot | Temperature | N/A | N/A | 'OK'
> > > > > > [...]
> > > > > > 59 | IOH Therm Trip | Temperature | N/A | N/A | 'OK'
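A monitoring plugin consuming this output only needs to split each row on "|" and check the Monitoring Status column. The Python sketch below is a hypothetical illustration of that, not the Nagios plugin mentioned above: it assumes the six-column order shown in the header line and uses only the flags already quoted. Running the command requires FreeIPMI and access to the BMC, so the parsing functions take plain text and can be tried on the rows above directly.

```python
import subprocess
from typing import Optional

# Flags exactly as quoted in this thread.
IPMIMONITORING_CMD = [
    "ipmimonitoring",
    "--legacy-output",
    "--interpret-oem-data",
    "--quiet-cache",
    "--sdr-cache-recreate",
]

# Assumed column order, taken from the legacy header line above.
COLUMNS = ("record_id", "name", "group", "status", "units", "reading")


def parse_legacy_row(line: str) -> Optional[dict]:
    """Split one pipe-delimited data row; return None for headers and filler."""
    fields = [field.strip() for field in line.split("|")]
    if len(fields) < len(COLUMNS) or not fields[0].isdigit():
        return None  # header line, "[...]" elision, or blank line
    row = dict(zip(COLUMNS, fields))
    row["record_id"] = int(row["record_id"])
    return row


def uninterpreted_sensors(output: str) -> list:
    """Names of sensors whose monitoring status is still "N/A"."""
    rows = (parse_legacy_row(line) for line in output.splitlines())
    return [row["name"] for row in rows if row and row["status"] == "N/A"]


def run_ipmimonitoring() -> str:
    """Run the quoted command; needs FreeIPMI installed and BMC access."""
    result = subprocess.run(
        IPMIMONITORING_CMD, capture_output=True, text=True, check=True
    )
    return result.stdout
```

Fed the rows above, `uninterpreted_sensors` would list all four sensors, which is exactly the symptom being reported here.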
> > > > > > 
> > > > > > Would it be possible for you to include information about those four
> > > > > > sensors in future versions of FreeIPMI, so that it reports a
> > > > > > monitoring status of "Nominal" when the sensor reading is 'OK' as
> > > > > > above?
> > > > > > 
> > > > > > If you need additional information from Intel about those sensors,
> > > > > > just let me know.
> > > > > > 
> > > > > > Best regards and have a nice weekend,
> > > > > > thank you,
> > > > > > Werner
> > > > > > 
> > > > > > On Wed, 2011-02-23 at 10:06 -0800, Albert Chu wrote:
> > > > > > > Hi Benjamin,
> > > > > > > 
> > > > > > > What do you mean by "not detected"?  It appears everything is fine
> > > > > > > based on the information you list below.
> > > > > > > 
> > > > > > > Do you mean these sensors are not reporting actual temperatures?
> > > > > > > While these are indeed temperature sensors (identified by the
> > > > > > > motherboard as such), they do not appear to be sensors that report
> > > > > > > a temperature reading.  They instead report an event bitmask.  The
> > > > > > > key is the "Event/Reading Type Code" field of each sensor.
> > > > > > > 
> > > > > > > Al
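To make the bitmask point concrete: for a discrete sensor, each set bit n in the state mask means offset n of the sensor's event/reading type code is active, so the "reading" is a set of state names rather than a number. The Python sketch below decodes the two generic type codes these sensors use (3h for SMI Timeout and IOH Therm Trip, 5h for the VRD Hot sensors). The offset names follow the IPMI 2.0 generic event/reading type table; the function itself is a hypothetical illustration, not part of FreeIPMI's API.

```python
# Generic event/reading type codes used by the sensors in this thread.
# Bit n of the discrete state mask corresponds to offset n.
GENERIC_OFFSETS = {
    0x03: ("State Deasserted", "State Asserted"),
    0x05: ("Limit Not Exceeded", "Limit Exceeded"),
}


def decode_discrete_state(type_code: int, state_mask: int) -> list:
    """Return the names of all offsets whose bit is set in state_mask."""
    names = GENERIC_OFFSETS.get(type_code)
    if names is None:
        raise ValueError(f"type code {type_code:#04x} not covered by this sketch")
    return [name for bit, name in enumerate(names) if state_mask & (1 << bit)]
```

For instance, `decode_discrete_state(0x05, 0b01)` returns `["Limit Not Exceeded"]`, the nominal state for a VRD Hot sensor.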
> > > > > > > 
> > > > > > > On Tue, 2011-02-22 at 23:55 -0800, Benjamin Bayer wrote:
> > > > > > > > Hello,
> > > > > > > > we have an Intel SR1625 where some sensors are not detected by
> > > > > > > > FreeIPMI version 1.0.2.beta3.
> > > > > > > >  
> > > > > > > > Thank You.
> > > > > > > > 
> > > > > > > > Regards
> > > > > > > > 
> > > > > > > > Benjamin Bayer
> > > > > > 
> > > > 
> > > > 
> 
> 
-- 
Albert Chu
address@hidden
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory



