freeipmi-devel

Re: [Freeipmi-devel] can't read tempeature info


From: Albert Chu
Subject: Re: [Freeipmi-devel] can't read tempeature info
Date: Thu, 25 Aug 2011 12:02:27 -0700

Hi Franco,

I've just put up this beta.

http://download.gluster.com/pub/freeipmi/qa-release/freeipmi-1.0.6.beta1.tar.gz

I added a workaround called 'ignorescanningdisabled', which you can use
via "-W ignorescanningdisabled".

So you don't have to type it on the command line all the time, it can
also be configured in /etc/freeipmi.conf.
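
For anyone wrapping the command in a script, here is a minimal Python sketch of how the invocation could be assembled. It uses only the options that appear in this thread (-h, -u, -p, -W). The helper name build_ipmi_sensors_argv and the host/user/password values are placeholders for illustration, not part of FreeIPMI.

```python
def build_ipmi_sensors_argv(host, user, password, workaround=None):
    """Build an ipmi-sensors argument list (hypothetical helper, not part of FreeIPMI)."""
    argv = ["ipmi-sensors", "-h", host, "-u", user, "-p", password]
    # Append the workaround only when one is requested, e.g. "ignorescanningdisabled".
    if workaround:
        argv += ["-W", workaround]
    return argv


if __name__ == "__main__":
    # Placeholder credentials, matching the anonymized ones used in this thread.
    print(" ".join(build_ipmi_sensors_argv("pc-xyz", "user", "passw",
                                           "ignorescanningdisabled")))
```

The resulting list can be handed to subprocess.run() as-is, which avoids shell quoting problems with passwords.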

Please let me know if it works for you.

Al

On Thu, 2011-08-25 at 09:55 -0700, Albert Chu wrote:
> Hi Franco,
> 
> On Thu, 2011-08-25 at 02:26 -0700, Franco Brasolin wrote:
> > Hi Albert,
> >
> > On 24/08/2011 19:42, Albert Chu wrote:
> > > Hi Franco,
> > >
> > > Well, the answer to your problem is surprisingly simple.  Your
> > > motherboard is reporting that sensor scanning is disabled, so
> > > ipmi-sensors doesn't output a reading.  One example:
> > >
> > > pc-xyz: IPMI Command Data:
> > > pc-xyz: ------------------
> > > pc-xyz: [              2Dh] = cmd[ 8b]
> > > pc-xyz: [               0h] = comp_code[ 8b]
> > > pc-xyz: [              42h] = sensor_reading[ 8b]
> > > pc-xyz: [               0h] = reserved1[ 5b]
> > > pc-xyz: [               0h] = reading_state[ 1b]
> > > pc-xyz: [               0h] = sensor_scanning[ 1b]
> > > pc-xyz: [               0h] = all_event_messages[ 1b]
> > > pc-xyz: [              C0h] = sensor_event_bitmask1[ 8b]
> > > pc-xyz: IPMI Trailer:
> > > pc-xyz: --------------
> > > pc-xyz: [              49h] = checksum2[ 8b]
> > > Sensor reading/event bitmask not available: sensor scanning disabled
> > > 2   | Temp             | Temperature              | N/A        | C     | N/A
> > >
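
To make the dump easier to read, here is a small Python sketch that decodes the status byte following sensor_reading into the same field names. The bit positions are an assumption based on the Get Sensor Reading response layout in the IPMI specification and on the order of the fields in the dump (lowest bits first); decode_status is a hypothetical helper, not a FreeIPMI function.

```python
def decode_status(status):
    """Decode the status byte that follows the reading in a Get Sensor Reading response."""
    return {
        # Bit 7: 0 means all event messages from this sensor are disabled.
        "all_event_messages": (status >> 7) & 1,
        # Bit 6: 0 means sensor scanning is disabled.
        "sensor_scanning": (status >> 6) & 1,
        # Bit 5: 1 means the reading/state is unavailable.
        "reading_state": (status >> 5) & 1,
        # Bits 4..0 are reserved.
        "reserved1": status & 0x1F,
    }


if __name__ == "__main__":
    # In the dump above all four fields are 0h, i.e. the whole byte is 0x00.
    print(decode_status(0x00))
```

With every field at 0h the board is saying "scanning disabled" while still returning a raw reading (42h), which is why the tool prints N/A.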
> > > So the question is why is your ipmitool working (or at least appears to
> > > be working)?  Because the ipmitool I see from sourceforge checks for
> > > this bit and will not output if it is detected.
> > >
> > >          } else if (!(rsp->data[1] & SCANNING_DISABLED)) {
> > >                  validread = 0;
> > >
> > > Is it possible your ipmitool is a version supplied by Dell, and Dell has
> > > hacked it to not check for this to get around this hardware issue?
> > >
> > > On the Dell PowerEdge node I have access to, via freeipmi I get:
> > >
> > > # ipmi-sensors
> > > ID | Name         | Type        | Reading    | Units | Event
> > > 1  | Temp         | Temperature | N/A        | C     | N/A
> > > 2  | Temp         | Temperature | N/A        | C     | N/A
> > > 3  | Temp         | Temperature | N/A        | C     | N/A
> > > 4  | Temp         | Temperature | N/A        | C     | N/A
> > > 5  | Ambient Temp | Temperature | 17.00      | C     | 'OK'
> > > <snip>
> > >
> > > and via ipmitool
> > >
> > > # ipmitool -I free sensor list
> > > Temp             | na         | degrees C  | na    | na        | na        | na        | 85.000    | 90.000    | na
> > > Temp             | na         | degrees C  | na    | na        | na        | na        | 85.000    | 90.000    | na
> > > Temp             | na         | degrees C  | na    | na        | na        | na        | na        | na        | na
> > > Temp             | na         | degrees C  | na    | na        | na        | na        | na        | na        | na
> > > Ambient Temp     | 17.000     | degrees C  | ok    | na        | 3.000     | 8.000     | 42.000    | 47.000    | na
> > > <snip>
> > >
> > > So the exact same output on my system.  Let's hack them both to NOT check
> > > for the sensor-scanning disabled bit.
> > >
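
Conceptually, the hack amounts to the following. This is a Python sketch only, not the actual change made to either tool. The 0x40 and 0x20 bit values are assumptions taken from the IPMI Get Sensor Reading layout, and reading_is_usable with its ignore_scanning_disabled parameter is a hypothetical name.

```python
SCANNING_ENABLED_BIT = 0x40      # assumed: bit 6, clear means scanning disabled
READING_UNAVAILABLE_BIT = 0x20   # assumed: bit 5, set means reading unavailable


def reading_is_usable(status, ignore_scanning_disabled=False):
    """Decide whether a raw sensor reading should be reported."""
    # A reading flagged as unavailable is never reported.
    if status & READING_UNAVAILABLE_BIT:
        return False
    # Normal behaviour: a cleared scanning bit invalidates the reading.
    scanning_disabled = not (status & SCANNING_ENABLED_BIT)
    if scanning_disabled and not ignore_scanning_disabled:
        return False
    return True


if __name__ == "__main__":
    for ignore in (False, True):
        print(ignore, reading_is_usable(0x00, ignore_scanning_disabled=ignore))
```

With the check skipped, a status byte of 0x00 no longer suppresses the reading, which matches the difference between the two sets of output here.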
> > > (turning on --entity-sensor-names)
> > >
> > > ID | Name                             | Type                     | Reading    | Units | Event
> > > 1  | Processor 1 Temp                 | Temperature              | -69.00     | C     | 'OK'
> > > 2  | Processor 2 Temp                 | Temperature              | -65.00     | C     | 'OK'
> > > 3  | Power Supply 1 Temp              | Temperature              | 40.00      | C     | 'OK'
> > > 4  | Power Supply 2 Temp              | Temperature              | 40.00      | C     | 'OK'
> > > 5  | System Board Ambient Temp        | Temperature              | 17.00      | C     | 'OK'
> > >
> > > Temp             | -69.000    | degrees C  | ok    | na        | na        | na        | 85.000    | 90.000    | na
> > > Temp             | -65.000    | degrees C  | ok    | na        | na        | na        | 85.000    | 90.000    | na
> > > Temp             | 40.000     | degrees C  | ok    | na        | na        | na        | na        | na        | na
> > > Temp             | 40.000     | degrees C  | ok    | na        | na        | na        | na        | na        | na
> > > Ambient Temp     | 17.000     | degrees C  | ok    | na        | 3.000     | 8.000     | 42.000    | 47.000    | na
> > >
> > > Aha!  It seems that your version of ipmitool has been altered to act
> > > differently than the normal ipmitool.
> >
> > yes, you're right
> > ipmitool was already installed (version 1.8.8), and I don't know how it
> > was installed or whether it's a specially modified version.
> > Now I have just installed the latest version, 1.8.11, and the behaviour has
> > changed, i.e. only "Ambient Temp" is displayed.
> 
> Ahh.  I did some checking, and it seems the SCANNING_DISABLED fix was
> added post-ipmitool-1.8.8.  It seems your version was just old (ipmitool
> 1.8.8 was released mid-2006).
> 
> >
> > > I can easily put in a workaround flag (e.g. -W ignorescanningbit or
> > > something) to deal with this for Dell motherboards.  However, before
> > > doing that there is an additional question to be answered.  Are the
> > > temperatures of -69, -65, 40 & 40 correct or incorrect??  At the
> > > minimum, the temperatures of -69 & -65 seem highly incorrect for
> > > processor temperatures (but as Andy said earlier, it's possible they are
> > > margin sensors, but nothing indicates that).
> >
> > I think Andy is right, please have a look at this page:
> >
> > http://comments.gmane.org/gmane.linux.hardware.dell.poweredge/25491
> >
> > and
> >
> > http://www.intel.com/design/xeon/datashts/313355.htm  section 6.3.1.1
> >
> 
> Ahhh, interesting.  I have a different motherboard w/ margin temperature
> sensors, but they report thresholds of about +5 degrees (i.e. it's gone
> "too positive").  In contrast, it seems Dell gives (what I assume to be
> the) normal threshold of 85/90C.
> 
> > > I am wondering, does Dell supply any software that you might be able to
> > > try out to see what "Dell approved" readings are?  That way we know what
> > > should be done.
> >
> > there's OMSA (OpenManage Server Administrator) available
> >   on the Dell site, but I haven't tried it
> >
> >
> > So now: is it right to modify the code so that it doesn't check for
> > disabled sensors, as you wrote before:
> >
> > ========================================================================
> >  > So the question is why is your ipmitool working (or at least appears to
> >  > be working)?  Because the ipmitool I see from sourceforge checks for
> >  > this bit and will not output if it is detected.
> >  >
> >  >          } else if (!(rsp->data[1] & SCANNING_DISABLED)) {
> >  >                  validread = 0;
> > ========================================================================
> >
> > or not?
> > And if yes, will this modification be available in one of the next freeipmi
> > releases, or do I have to make it in my own version?
> 
> Now it's clear to me that these sensors are "for real".  The fact that
> Dell's motherboard reports "scanning disabled" is a bug in their
> firmware.  I need to add a workaround to deal with the scanning disabled
> bit.  I'll try to get you a beta tar.gz sometime today (or perhaps
> it'll be morning for you when you get into work :P).
> 
> Al
> 
> >
> > thank you!
> >
> > ciao
> > Franco
> >
> >
> >
> >
> > >
> > > Al
> > >
> > >
> > > On Wed, 2011-08-24 at 01:35 -0700, Franco Brasolin wrote:
> > >> thank you Albert for your quick response,
> > >> below all the answers.
> > >> ciao
> > >> Franco
> > >>
> > > >> On 23/08/2011 19:00, Albert Chu wrote:
> > >>> Hi Franco,
> > >>>
> > >>> The fact that you're getting some negative degrees in ipmitool means
> > >>> something is probably wrong with the IPMI firmware on your mobo.
> > >>> Something is definitely not right.
> > >>>
> > >>> The first thing to try is to run ipmi-sensors w/ --bridge-sensors.  It's
> > >>> possible the sensors aren't on the main IPMI bus, so they need to be
> > >>> bridged to other devices on the motherboard.
> > >>
> > >> ipmi-sensor -h pc-xyz  -u user -p passw --bridge-sensors doesn't help
> > >>>
> > >>> As a second shot, this seems similar to a bug I saw on a HP machine.
> > >>> Could you try running with the "-W discretereading" flag w/
> > >>> ipmi-sensors.  Maybe that will fix the problem.  It would also be
> > >>> interesting to compare FreeIPMI's ipmi-sensors output to ipmitool's
> > >>> 'sensor list' output (ipmitool's assumptions are different in that code
> > >>> path).
> > >>
> > >> ipmi-sensor -h pc-xyz  -u user -p passw --W discretereading
> > > >> doesn't help either
> > >>
> > >>
> > > >> Attached are the ipmi-sensors and ipmitool sensor list outputs.
> > >>
> > >>
> > > >>> If that doesn't help, could you send me the --debug output from
> > > >>> ipmi-sensors?  I'd have to look in detail at what is actually going on
> > > >>> with this motherboard.
> > >>
> > > >> The debug output is attached as well.
> > >>>
> > >>> As a side note, you may be interested in the --entity-sensor-names
> > >>> option for ipmi-sensors.  It may make your output better for your
> > >>> motherboard.
> > >>>
> > >>> Al
> > >>>
> > >>> On Tue, 2011-08-23 at 06:47 -0700, Franco Brasolin wrote:
> > >>>> Hi all,
> > >>>> I need some help to read Temperature sensors on a Dell  PowerEdge R410
> > >>>> model name      : Intel(R) Xeon(R) CPU           E5620  @ 2.40GHz
> > >>>> # uname -a
> > >>>> Linux pc-xyz 2.6.18-238.1.1.el5 #1 SMP Wed Jan 19 11:06:36 CET 2011
> > >>>> x86_64 x86_64 x86_64 GNU/Linux
> > >>>>
> > > >>>> If I try (from another host with freeipmi 1.0.6.beta0 installed) the
> > >>>> following command:
> > >>>>
> > >>>> # ipmi-sensors -V
> > >>>> ipmi-sensors - 1.0.6.beta0
> > >>>>
> > >>>> # ipmi-sensors -h pc-xyz  -u user -p passw   | grep -i temp
> > > >>>> 1   | Temp             | Temperature              | N/A        | C     | N/A
> > > >>>> 2   | Temp             | Temperature              | N/A        | C     | N/A
> > > >>>> 3   | Temp             | Temperature              | N/A        | C     | N/A
> > > >>>> 4   | Temp             | Temperature              | N/A        | C     | N/A
> > > >>>> 5   | Ambient Temp     | Temperature              | N/A        | C     | N/A
> > > >>>> 6   | Ambient Temp     | Temperature              | N/A        | C     | N/A
> > > >>>> 7   | Temp             | Temperature              | N/A        | C     | N/A
> > > >>>> 8   | Temp             | Temperature              | N/A        | C     | N/A
> > > >>>> 9   | Temp             | Temperature              | N/A        | C     | N/A
> > > >>>> 10  | Ambient Temp     | Temperature              | 21.00      | C     | 'OK'
> > > >>>> 11  | Planar Temp      | Temperature              | N/A        | C     | N/A
> > > >>>> 65  | CPU Temp Interf  | Temperature              | N/A        | N/A   | N/A
> > > >>>> 110 | Mem Overtemp     | Memory                   | N/A        | N/A   | N/A
> > >>>>
> > > >>>> i.e. only ambient temperature is available, while if I use ipmitool:
> > >>>>
> > >>>>
> > >>>> #  ipmitool -H pc-xyz -U user -P passw  sdr type temperature
> > >>>> Temp             | 01h | ok  |  3.1 | -57 degrees C
> > >>>> Temp             | 02h | ok  |  3.2 | -63 degrees C
> > >>>> Temp             | 05h | ok  | 10.1 | 19 degrees C
> > >>>> Ambient Temp     | 07h | ok  | 10.1 | 24 degrees C
> > >>>> Temp             | 06h | ok  | 10.2 | 30 degrees C
> > >>>> Ambient Temp     | 08h | ok  | 10.2 | 27 degrees C
> > >>>> Ambient Temp     | 0Eh | ok  |  7.1 | 18 degrees C
> > >>>> Planar Temp      | 0Fh | ok  |  7.1 | 35 degrees C
> > >>>> IOH THERMTRIP    | 5Dh | ns  |  7.1 | Disabled
> > >>>> CPU Temp Interf  | 76h | ns  |  7.1 | Disabled
> > >>>> Temp             | 0Ah | ok  |  8.1 | 26 degrees C
> > >>>> Temp             | 0Bh | ok  |  8.1 | 23 degrees C
> > >>>> Temp             | 0Ch | unc |  8.1 | 44 degrees C
> > >>>>
> > >>>> I obtain much more info.
> > >>>> What am I doing wrong ??
> > >>>>
> > >>>> thank you very much for your help!
> > >>>> ciao
> > >>>> Franco
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> _______________________________________________
> > >>>> Freeipmi-devel mailing list
> > >>>> address@hidden
> > >>>> https://lists.gnu.org/mailman/listinfo/freeipmi-devel
> > >>
> >
> --
> Albert Chu
> address@hidden
> Computer Scientist
> High Performance Systems Division
> Lawrence Livermore National Laboratory
-- 
Albert Chu
address@hidden
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory




