From: Al Chu
Subject: Re: [Freeipmi-devel] Trouble w/ HP ProLiant and FreeIPMI (ipmi-sensors)
Date: Wed, 10 Oct 2007 09:30:17 -0700

As a note to other developers, I've also added some extra notes about
the -v and -vv options to the ipmi-sensors manpage in HEAD.

Al

On Wed, 2007-10-10 at 09:26 -0700, Al Chu wrote:
> Hey Gregor,
> 
> There is a subtlety here that I added extra documentation for in the
> FreeIPMI 0.5.0 manpage (I didn't backport it to 0.4.X b/c I didn't think it
> was that important, but maybe I should have).  The numbers ipmi-sensors
> lists on the left are "record ids", not sensor numbers.  If you use the
> verbose options on ipmi-sensors (-v or -vv), you can find the sensor
> numbers.  As an example on my system:
> 
> Record ID: 22
> Sensor Name: Fan5
> Group Name: Fan
> Sensor Number: 18
> Event/Reading Type Code: 1h
> 
> you can see the sensor number and record id don't match up.  
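This also explains why the ipmi-sensors numbers can exceed 255: in the SDR, the record ID is a 16-bit field, while the sensor number is a separate 8-bit field. Below is a minimal C sketch of where the two live in a raw Full, Compact or Event-Only record. It assumes the byte layout from the IPMI 2.0 SDR chapter (record ID least-significant byte first at offsets 0-1, record type at offset 3, sensor number at offset 7). The struct and function names are hypothetical, not FreeIPMI API.

```c
#include <stddef.h>
#include <stdint.h>

/* Identifiers pulled out of one raw SDR record (assumed IPMI 2.0 layout). */
struct sdr_ids {
    uint16_t record_id;     /* 16 bits: can exceed 255, e.g. 3440 */
    uint8_t  record_type;   /* 01h full, 02h compact, 03h event-only */
    uint8_t  sensor_number; /* 8 bits */
};

/* Hypothetical helper: decode record ID, record type and sensor number.
 * Returns 0 on success, -1 if the buffer is too short or the record type
 * is not one that carries a sensor number. */
int sdr_decode_ids(const uint8_t *rec, size_t len, struct sdr_ids *out)
{
    if (len < 8)
        return -1;
    out->record_type = rec[3];
    if (out->record_type != 0x01 && out->record_type != 0x02 &&
        out->record_type != 0x03)
        return -1;
    /* Record ID is stored least-significant byte first. */
    out->record_id = (uint16_t)(rec[0] | (rec[1] << 8));
    /* Offset 5 = sensor owner ID, 6 = owner LUN, 7 = sensor number. */
    out->sensor_number = rec[7];
    return 0;
}
```

With bytes `16 00 .. 01 .. 20 00 12` this yields record ID 22 and sensor number 18, matching the Fan5 example above.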
> 
> I'm not 100% sure why record ids were chosen for input/output over sensor
> numbers in ipmi-sensors (the tool was originally created by others), but
> if I had to guess, here are some possible reasons:
> 
> - some sensors don't have sensor numbers.  I notice multiple sensors w/
> sensor number 0x00 in the ipmitool output below.  I would guess those
> sensors don't have a number so they just output 0x00.
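This is my reading of the IPMI 2.0 SDR chapter: only Full (01h), Compact (02h) and Event-Only (03h) sensor records contain a sensor number field. Locator records such as the FRU Device Locator (11h) and Management Controller Device Locator (12h) have none. That would be consistent with entries like "Dynamic MC @ 20h" and "Logical FRU @00h" showing 00h in the ipmitool output below. The following is a small C sketch of that check. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* SDR record type codes (IPMI 2.0, SDR chapter). */
enum sdr_record_type {
    SDR_FULL_SENSOR    = 0x01,
    SDR_COMPACT_SENSOR = 0x02,
    SDR_EVENT_ONLY     = 0x03,
    SDR_FRU_LOCATOR    = 0x11,  /* no sensor number field */
    SDR_MC_LOCATOR     = 0x12   /* no sensor number field */
};

/* Hypothetical helper: true if records of this type carry a sensor
 * number.  For any other type, a tool printing a "sensor number"
 * column has to print some placeholder value instead. */
bool sdr_type_has_sensor_number(uint8_t record_type)
{
    switch (record_type) {
    case SDR_FULL_SENSOR:
    case SDR_COMPACT_SENSOR:
    case SDR_EVENT_ONLY:
        return true;
    default:
        return false;
    }
}
```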
> 
> - record ids increase in value, while sensor numbers need not, so
> outputting record ids looks nicer, maybe? The output order in ipmitool
> also seems to be record id based, but they just output the sensor number
> instead of the record id.
> 
> As an FYI, in case you were wondering why some sensors seem to be missing
> from ipmi-sensors: the default output does not include every sensor.
> Some are only shown with the verbose options.
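If you need the record ID to sensor number mapping for many sensors (for example, to fill in a PEF table), you can post-process the verbose output. Below is a minimal C sketch that assumes the decimal `Record ID: N` / `Sensor Number: M` line format shown in the Fan5 example above. Other FreeIPMI versions may format verbose output differently. The function and struct names are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One record ID -> sensor number pair taken from verbose output. */
struct sensor_map {
    unsigned int record_id;
    unsigned int sensor_number;
};

/* Hypothetical helper: scan text in the "Record ID: N" ... "Sensor Number: M"
 * format and store up to `max` pairs in `out`.  A record with no
 * "Sensor Number:" line is skipped.  Returns the number of pairs stored. */
size_t map_record_ids(const char *text, struct sensor_map *out, size_t max)
{
    size_t count = 0;
    int have_record = 0;
    unsigned int record_id = 0;
    const char *line = text;

    while (line != NULL && *line != '\0' && count < max) {
        const char *next = strchr(line, '\n');
        unsigned int value;

        if (sscanf(line, "Record ID: %u", &value) == 1) {
            record_id = value;          /* start of a new record */
            have_record = 1;
        } else if (have_record &&
                   sscanf(line, "Sensor Number: %u", &value) == 1) {
            out[count].record_id = record_id;
            out[count].sensor_number = value;
            count++;
            have_record = 0;
        }
        line = (next != NULL) ? next + 1 : NULL;
    }
    return count;
}
```

You would feed it the captured text of an `ipmi-sensors -v` run. Lines such as "Sensor Name:" or "Group Name:" are simply ignored.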
> 
> Hope that helps clarify things.
> 
> Al
> 
> On Wed, 2007-10-10 at 11:06 +0200, Gregor Dschung wrote:
> > Hey Al,
> > 
> > Hmm... now I'm really confused. I thought the sensor ID had to be 8
> > bits long?
> > 
> > Also, I'm confused about the different sensor IDs I'm getting with
> > ipmi-sensors (0.4.6.beta2) and `ipmitool sdr elist` (1.8.6). Sure,
> > ipmitool gives me the sensor ID in hex and ipmi-sensors in decimal,
> > but shouldn't the converted values be the same?
> > I would like to set up a PEF table, but for that I'll need the right
> > sensor IDs :-/
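Converting the ipmitool column is simple. For example, CPU Fan 1 is listed as 31h, which is 49. However, as explained above, ipmi-sensors prints record IDs (400 for CPU Fan 1), not sensor numbers, so the two columns are not expected to agree after conversion. Below is a small C sketch of the conversion, assuming ipmitool's `NNh` format. The helper name is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper: convert an ipmitool "sdr elist" sensor number
 * column such as "31h" or "FEh" to an integer.
 * Returns -1 unless the text is hex digits followed by 'h'/'H'
 * and the value fits in 8 bits. */
int parse_ipmitool_sensor_number(const char *s)
{
    char *end;
    unsigned long v = strtoul(s, &end, 16);

    if (end == s || (*end != 'h' && *end != 'H') || v > 0xFF)
        return -1;
    return (int)v;
}
```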
> > 
> > Example 1:
> > 
> > p300slg01:/usr/local/src # ipmitool -H gtseval-ipmi -U ADMIN -a sdr elist all
> > Password:
> > Hewlett-Packard  | 00h | ok  |  0.0 | Dynamic MC @ 20h
> > ACPI State       | 20h | ok  |  0.0 | S0/G0: working
> > System Reset     | 21h | ok  |  0.0 |
> > POST Error       | 01h | ns  |  0.0 | Disabled
> > Memory ECC       | 02h | ns  |  0.0 | Disabled
> > PCI Error        | 03h | ns  |  0.0 | Disabled
> > Fan Error        | 04h | ns  |  0.0 | Disabled
> > Watchdog         | FEh | ns  |  0.0 | Disabled
> > CPU Fan 1        | 31h | ok  |  0.0 | 9592.33 RPM
> > CPU Fan 2        | 32h | ok  |  0.0 | 10426.44 RPM
> > CPU Fan 3        | 33h | ok  |  0.0 | 9992.01 RPM
> > CPU Fan 4        | 34h | ok  |  0.0 | 10900.37 RPM
> > CPU Fan 5        | 35h | ok  |  0.0 | 9592.33 RPM
> > CPU Fan 6        | 3Ch | ok  |  0.0 | 10900.37 RPM
> > CPU Fan 7        | 3Dh | ok  |  0.0 | 9992.01 RPM
> > CPU Fan 8        | 3Eh | ok  |  0.0 | 10426.44 RPM
> > CPU Fan 9        | 3Fh | ok  |  0.0 | 9592.33 RPM
> > CPU Fan 10       | 40h | ok  |  0.0 | 10426.44 RPM
> > System Fan 1     | 41h | ok  |  0.0 | 9992.01 RPM
> > System Fan 2     | 42h | ok  |  0.0 | 10900.37 RPM
> > CPU0 Vcore       | 3Ah | ok  |  3.0 | 1.10 Volts
> > CPU1 Vcore       | 3Bh | ns  |  3.1 | No Reading
> > Standby 5V       | 37h | ok  |  0.0 | 4.97 Volts
> > System 5V        | 36h | ok  |  0.0 | 4.85 Volts
> > System 3.3V      | 38h | ok  |  0.0 | 3.23 Volts
> > 3V CMOS Sense    | 39h | ok  |  0.0 | 3.03 Volts
> > CPU0 Therm Diode | 43h | ns  |  3.0 | Disabled
> > CPU1 Therm Diode | 44h | ns  |  3.1 | Disabled
> > CPU0 ThermDiode2 | 52h | ns  |  3.0 | Disabled
> > CPU1 ThermDiode2 | 53h | ns  |  3.1 | Disabled
> > AMB Temp         | 48h | ok  |  0.0 | 29 degrees C
> > MultiBit ECC ER  | 4Ah | ok  |  0.0 | State Deasserted
> > VDD Power Fail   | 4Ch | ok  |  0.0 | State Deasserted
> > Reset            | 4Dh | ok  |  0.0 | State Deasserted
> > Identify         | 4Eh | ok  |  0.0 | State Deasserted
> > NMI              | 50h | ok  |  0.0 | State Deasserted
> > CPU0 Therm-Trip  | 55h | ok  |  3.0 | State Deasserted
> > CPU1 Therm-Trip  | 56h | ns  |  3.1 | No Reading
> > CPU0 IERR        | 57h | ok  |  3.0 | State Deasserted
> > CPU1 IERR        | 58h | ns  |  3.1 | No Reading
> > CPU0 Prochot     | 59h | ok  |  3.0 | Limit Not Exceeded
> > CPU1 Prochot     | 5Ah | ns  |  3.1 | No Reading
> > CPU0 SocketOcc   | 5Bh | ok  |  3.0 | Device Present
> > CPU1 SocketOcc   | 5Ch | ok  |  3.1 | Device Absent
> > CPU0 Dmn 0 Temp  | 86h | ok  |  3.0 | 45 degrees C
> > CPU1 Dmn 0 Temp  | 89h | ns  |  3.1 | No Reading
> > CPU0 Dmn 1 Temp  | 8Ch | ok  |  3.0 | 45 degrees C
> > CPU1 Dmn 1 Temp  | 8Fh | ns  |  3.1 | No Reading
> > FRU0             | 00h | ns  |  0.0 | Logical FRU @00h
> > ----------
> > p300slg01:/usr/local/src # ipmi-sensors -h gtseval-ipmi -u ADMIN -P
> > Password:
> > 64: ACPI State (ACPI Power State): [S0/G0 "working"]
> > 112: System Reset (Module/Board): [OK]
> > 160: POST Error (System Firmware): [Unknown]
> > 208: Memory ECC (Memory): [Unknown]
> > 256: PCI Error (Critical Interrupt): [Unknown]
> > 304: Fan Error (Cooling Device): [Unknown]
> > 352: Watchdog (Watchdog 2): [Unknown]
> > 400: CPU Fan 1 (Fan): 9992.01 RPM (NA/3475.48): [OK]
> > 464: CPU Fan 2 (Fan): 10426.44 RPM (NA/3475.48): [OK]
> > 528: CPU Fan 3 (Fan): 9992.01 RPM (NA/3475.48): [OK]
> > 592: CPU Fan 4 (Fan): 10900.37 RPM (NA/3475.48): [OK]
> > 656: CPU Fan 5 (Fan): 9592.33 RPM (NA/3475.48): [OK]
> > 720: CPU Fan 6 (Fan): 10900.37 RPM (NA/3475.48): [OK]
> > 784: CPU Fan 7 (Fan): 10426.44 RPM (NA/3475.48): [OK]
> > 848: CPU Fan 8 (Fan): 10426.44 RPM (NA/3475.48): [OK]
> > 912: CPU Fan 9 (Fan): 9992.01 RPM (NA/3475.48): [OK]
> > 976: CPU Fan 10 (Fan): 10426.44 RPM (NA/3475.48): [OK]
> > 1040: System Fan 1 (Fan): 9992.01 RPM (NA/3475.48): [OK]
> > 1104: System Fan 2 (Fan): 10900.37 RPM (NA/3475.48): [OK]
> > 1168: CPU0 Vcore (Voltage): 1.10 V (0.40/1.70): [OK]
> > 1232: CPU1 Vcore (Voltage): 0.80 V (0.40/1.70): [OK]
> > 1296: Standby 5V (Voltage): 4.97 V (4.26/5.79): [OK]
> > 1360: System 5V (Voltage): 4.85 V (4.26/5.79): [OK]
> > 1424: System 3.3V (Voltage): 3.23 V (2.82/3.85): [OK]
> > 1488: 3V CMOS Sense (Voltage): 3.03 V (2.62/NA): [OK]
> > 1680: CPU0 Therm Diode (Temperature): 42.00 C (10.00/80.00): [OK]
> > 1744: CPU1 Therm Diode (Temperature): 42.00 C (10.00/80.00): [OK]
> > 1808: CPU0 ThermDiode2 (Temperature): 42.00 C (10.00/80.00): [OK]
> > 1872: CPU1 ThermDiode2 (Temperature): 42.00 C (10.00/80.00): [OK]
> > 1936: AMB Temp (Temperature): 29.00 C (10.00/50.00): [OK]
> > 2064: MultiBit ECC ER (Module/Board): [State Deasserted]
> > 2112: VDD Power Fail (Power Supply): [State Deasserted]
> > 2160: Reset (Button): [State Deasserted]
> > 2208: Identify (Button): [State Deasserted]
> > 2304: NMI (Button): [State Deasserted]
> > 2352: CPU0 Therm-Trip (Processor): [State Deasserted]
> > 2400: CPU1 Therm-Trip (Processor): [State Deasserted]
> > 2448: CPU0 IERR (Processor): [State Deasserted]
> > 2496: CPU1 IERR (Processor): [State Deasserted]
> > 2544: CPU0 Prochot (Temperature): [Limit Not Exceeded]
> > 2592: CPU1 Prochot (Temperature): [Limit Not Exceeded]
> > 2640: CPU0 SocketOcc (Processor): [Device Inserted/Device Present]
> > 2688: CPU1 SocketOcc (Processor): [Device Removed/Device Absent]
> > 2736: CPU0 Dmn 0 Temp (Temperature): 45.00 C (NA/85.00): [OK]
> > 2864: CPU1 Dmn 0 Temp (Temperature): 45.00 C (NA/85.00): [OK]
> > 3248: CPU0 Dmn 1 Temp (Temperature): 45.00 C (NA/85.00): [OK]
> > 3440: CPU1 Dmn 1 Temp (Temperature): 45.00 C (NA/85.00): [OK]
> >
> > Example 2:
> > p300slg01:/usr/local/src # ipmitool -H gts00-ipmi -U ADMIN -a sdr elist all
> > Password:
> > pef              | FDh | ns  | 46.1 | Event-Only
> > watchdog         | FEh | ns  | 46.1 | Event-Only
> > KIM BMC          | 00h | ok  |  0.0 | Dynamic MC @ 20h
> > PLTFRM SECURITY  | FCh | ns  |  0.0 | Event-Only
> > CPU Temp 1       | 00h | ok  |  3.0 | 22 degrees C
> > CPU Temp 2       | 01h | ok  |  3.0 | 21 degrees C
> > CPU Temp 3       | 02h | ns  |  3.1 | No Reading
> > CPU Temp 4       | 03h | ns  |  3.1 | No Reading
> > Sys Temp         | 04h | ok  |  7.0 | 36 degrees C
> > CPU1 Vcore       | 05h | ok  |  3.0 | 1.19 Volts
> > CPU2 Vcore       | 06h | ok  |  3.1 | 1.21 Volts
> > 3.3V             | 07h | ok  |  7.0 | 3.34 Volts
> > 5V               | 08h | ok  |  7.0 | 4.99 Volts
> > 12V              | 09h | ok  |  7.0 | 11.52 Volts
> > -12V             | 0Ah | ok  |  7.0 | -12.30 Volts
> > 1.5V             | 0Bh | ok  |  7.0 | 1.47 Volts
> > 5VSB             | 0Ch | ok  |  7.0 | 4.92 Volts
> > VBAT             | 0Dh | ok  |  7.0 | 3.31 Volts
> > Fan1             | 0Eh | ok  |  7.0 | 4400 RPM
> > Fan2             | 0Fh | lnr |  7.0 | 0 RPM
> > Fan3             | 10h | ok  |  7.0 | 4400 RPM
> > Fan4             | 11h | lnr |  7.0 | 0 RPM
> > Fan5             | 12h | lnr |  7.0 | 0 RPM
> > Fan6             | 13h | lnr |  7.0 | 0 RPM
> > Fan7/CPU1        | 14h | lnr |  3.0 | 0 RPM
> > Fan8/CPU2        | 15h | lnr |  3.0 | 0 RPM
> > Intrusion        | 44h | lnc | 23.1 | 0 unspecified
> > Power Supply     | 16h | ok  | 10.0 | 0 unspecified
> > CPU0 Internal E  | 17h | ok  |  3.0 | 0 unspecified
> > CPU1 Internal E  | 18h | ok  |  3.1 | 0 unspecified
> > CPU Overheat     | 19h | ok  |  3.0 | 0 unspecified
> > Thermal Trip0    | 1Ah | ok  |  3.0 | 0 unspecified
> > Thermal Trip1    | 1Bh | ok  |  3.1 | 0 unspecified
> > BIOS             | 00h | ok  |  0.0 |
> > --------
> > p300slg01:/usr/local/src # ipmi-sensors -h gts00-ipmi -u ADMIN -P
> > Password:
> > 4: CPU Temp 1 (Temperature): 22.00 C (NA/78.00): [OK]
> > 5: CPU Temp 2 (Temperature): 21.00 C (NA/78.00): [OK]
> > 6: CPU Temp 3 (Temperature): 0.00 C (NA/78.00): [OK]
> > 7: CPU Temp 4 (Temperature): 0.00 C (NA/78.00): [OK]
> > 8: Sys Temp (Temperature): 36.00 C (NA/78.00): [OK]
> > 9: CPU1 Vcore (Voltage): 1.20 V (1.06/1.63): [OK]
> > 10: CPU2 Vcore (Voltage): 1.21 V (1.06/1.63): [OK]
> > 11: 3.3V (Voltage): 3.34 V (2.93/3.66): [OK]
> > 12: 5V (Voltage): 4.99 V (4.44/5.54): [OK]
> > 13: 12V (Voltage): 11.52 V (10.56/13.44): [OK]
> > 14: -12V (Voltage): -12.30 V (-10.59/-13.40): [OK]
> > 15: 1.5V (Voltage): 1.47 V (1.31/1.68): [OK]
> > 16: 5VSB (Voltage): 4.92 V (4.44/5.54): [OK]
> > 17: VBAT (Voltage): 3.31 V (2.93/3.66): [OK]
> > 18: Fan1 (Fan): 4400.00 RPM (300.00/NA): [OK]
> > 19: Fan2 (Fan): 0.00 RPM (300.00/NA): [At or Below (<=) Lower Non-Recoverable Threshold]
> > 20: Fan3 (Fan): 4300.00 RPM (300.00/NA): [OK]
> > 21: Fan4 (Fan): 0.00 RPM (300.00/NA): [At or Below (<=) Lower Non-Recoverable Threshold]
> > 22: Fan5 (Fan): 0.00 RPM (300.00/NA): [At or Below (<=) Lower Non-Recoverable Threshold]
> > 23: Fan6 (Fan): 0.00 RPM (300.00/NA): [At or Below (<=) Lower Non-Recoverable Threshold]
> > 24: Fan7/CPU1 (Fan): 0.00 RPM (300.00/NA): [At or Below (<=) Lower Non-Recoverable Threshold]
> > 25: Fan8/CPU2 (Fan): 0.00 RPM (300.00/NA): [At or Below (<=) Lower Non-Recoverable Threshold]
> > 26: Intrusion (Platform Chassis Intrusion): [General Chassis Intrusion]
> > 27: Power Supply (Power Supply): [OK]
> > 28: CPU0 Internal E (Module/Board): [OK]
> > 29: CPU1 Internal E (Module/Board): [OK]
> > 30: CPU Overheat (Module/Board): [OK]
> > 31: Thermal Trip0 (Module/Board): [OK]
> > 32: Thermal Trip1 (Module/Board): [OK]
> > 33: BIOS (System Firmware): [Unknown]
> > 
> > 
> > I hope I've just forgotten something and that it's not a new bug.
> > 
> > Regards,
> > Gregor
> > 
> > 
> > Gregor Dschung wrote:
> > > Hey Al,
> > >
> > > whoa!!!
> > >
> > > THAT is open source :). We've been mailing back and forth for perhaps a
> > > week (I guess it would have taken only about three days if we had both
> > > worked in the same timezone ;) ). And now the issue seems to be solved:
> > > -----------
> > > p300slg01:/usr/local/src # ipmi-sensors -h gtseval-ipmi -u admin -P
> > > Password:
> > > 64: ACPI State (ACPI Power State): [S0/G0 "working"]
> > > 112: System Reset (Module/Board): [OK]
> > > 160: POST Error (System Firmware): [Unknown]
> > > 208: Memory ECC (Memory): [Unknown]
> > > 256: PCI Error (Critical Interrupt): [Unknown]
> > > 304: Fan Error (Cooling Device): [Unknown]
> > > 352: Watchdog (Watchdog 2): [Unknown]
> > > 400: CPU Fan 1 (Fan): 9992.01 RPM (NA/3475.48): [OK]
> > > 464: CPU Fan 2 (Fan): 10426.44 RPM (NA/3475.48): [OK]
> > > 528: CPU Fan 3 (Fan): 9992.01 RPM (NA/3475.48): [OK]
> > > 592: CPU Fan 4 (Fan): 10426.44 RPM (NA/3475.48): [OK]
> > > 656: CPU Fan 5 (Fan): 9592.33 RPM (NA/3475.48): [OK]
> > > 720: CPU Fan 6 (Fan): 10900.37 RPM (NA/3475.48): [OK]
> > > 784: CPU Fan 7 (Fan): 9992.01 RPM (NA/3475.48): [OK]
> > > 848: CPU Fan 8 (Fan): 10900.37 RPM (NA/3475.48): [OK]
> > > 912: CPU Fan 9 (Fan): 9992.01 RPM (NA/3475.48): [OK]
> > > 976: CPU Fan 10 (Fan): 10426.44 RPM (NA/3475.48): [OK]
> > > 1040: System Fan 1 (Fan): 9592.33 RPM (NA/3475.48): [OK]
> > > 1104: System Fan 2 (Fan): 10900.37 RPM (NA/3475.48): [OK]
> > > 1168: CPU0 Vcore (Voltage): 1.11 V (0.40/1.70): [OK]
> > > 1232: CPU1 Vcore (Voltage): 0.80 V (0.40/1.70): [OK]
> > > 1296: Standby 5V (Voltage): 4.97 V (4.26/5.79): [OK]
> > > 1360: System 5V (Voltage): 4.85 V (4.26/5.79): [OK]
> > > 1424: System 3.3V (Voltage): 3.23 V (2.82/3.85): [OK]
> > > 1488: 3V CMOS Sense (Voltage): 3.03 V (2.62/NA): [OK]
> > > 1680: CPU0 Therm Diode (Temperature): 42.00 C (10.00/80.00): [OK]
> > > 1744: CPU1 Therm Diode (Temperature): 42.00 C (10.00/80.00): [OK]
> > > 1808: CPU0 ThermDiode2 (Temperature): 42.00 C (10.00/80.00): [OK]
> > > 1872: CPU1 ThermDiode2 (Temperature): 42.00 C (10.00/80.00): [OK]
> > > 1936: AMB Temp (Temperature): 29.00 C (10.00/50.00): [OK]
> > > 2064: MultiBit ECC ER (Module/Board): [State Deasserted]
> > > 2112: VDD Power Fail (Power Supply): [State Deasserted]
> > > 2160: Reset (Button): [State Deasserted]
> > > 2208: Identify (Button): [State Deasserted]
> > > 2304: NMI (Button): [State Deasserted]
> > > 2352: CPU0 Therm-Trip (Processor): [State Deasserted]
> > > 2400: CPU1 Therm-Trip (Processor): [State Deasserted]
> > > 2448: CPU0 IERR (Processor): [State Deasserted]
> > > 2496: CPU1 IERR (Processor): [State Deasserted]
> > > 2544: CPU0 Prochot (Temperature): [Limit Not Exceeded]
> > > 2592: CPU1 Prochot (Temperature): [Limit Not Exceeded]
> > > 2640: CPU0 SocketOcc (Processor): [Device Inserted/Device Present]
> > > 2688: CPU1 SocketOcc (Processor): [Device Removed/Device Absent]
> > > 2736: CPU0 Dmn 0 Temp (Temperature): 45.00 C (NA/85.00): [OK]
> > > 2864: CPU1 Dmn 0 Temp (Temperature): 45.00 C (NA/85.00): [OK]
> > > 3248: CPU0 Dmn 1 Temp (Temperature): 45.00 C (NA/85.00): [OK]
> > > 3440: CPU1 Dmn 1 Temp (Temperature): 45.00 C (NA/85.00): [OK]
> > > -------------
> > >
> > > Thanks a lot for your help.
> > >
> > > Regards,
> > > Gregor
> > >
> > >
> > > Albert Chu wrote:
> > >> Hey Gregor,
> > >>
> > >> Doh!  I forgot a patch.  Here's the likely next FreeIPMI 0.4.6 release :-)
> > >>
> > >> PLMK if it works.
> > >>
> > >> Thanks,
> > >> Al
> > >>
> > >>> Hey Gregor,
> > >>>
> > >>> Attached are two tar.gz files.  One is a likely candidate for the
> > >>> FreeIPMI 0.4.6 release and the other is a test tar.gz for debug info if
> > >>> something new goes wrong :-)
> > >>>
> > >>> PLMK how it works out.  Thanks for all the debug help.
> > >>>
> > >>> Al
> > >>>
> > >>> On Tue, 2007-10-09 at 17:25 +0200, Gregor Dschung wrote:
> > >>>> Hey Al,
> > >>>>
> > >>>> here is the SDR cache. 'sdr-cache-p300slg01.10.136.17.128' is the file
> > >>>> for gtseval-ipmi; 'sdr-cache-p300slg01.10.136.17.170' is another cache
> > >>>> file, from an ipmi-sensors run that works fine.
> > >>>>
> > >>>> I'm using FreeIPMI on a system with SUSE 10.1.
> > >>>> ---------
> > >>>> p300slg01:/usr/local/src # uname -a
> > >>>> Linux p300slg01 2.6.16.27-0.9-smp #1 SMP Tue Feb 13 09:35:18 UTC 2007
> > >>>> i686 i686 i386 GNU/Linux
> > >>>> ---------
> > >>>>
> > >>>> In your test4 code, I had to change the following lines to compile
> > >>>> without errors:
> > >>>> common/src/pstdout.c
> > >>>> -243: fprintf(stderr, "Default stack size = %li bytes \n", mystacksize);
> > >>>> +243: fprintf(stderr, "Default stack size = %li bytes \n", (long)mystacksize);
> > >>>> +501: va_list vacpy;
> > >>>> ---------
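Some background on the two changes, with assumptions labeled. The cast in the first change matters because `%li` expects a `long`. If `mystacksize` is a `size_t` (likely, since `pthread_attr_getstacksize()` returns a `size_t`, but the thread doesn't show its declaration), it is not a `long` on i386, and the compiler rejects or warns about the mismatch. The second change declares a `va_list` copy. Per the C standard, once a `va_list` has been passed to a function that consumes it (such as `vsnprintf`), it must not be reused, so C99 code takes a copy with `va_copy` when it needs two passes. The code below is a generic sketch of that idiom, not the actual pstdout.c code. `format_alloc` is a hypothetical name.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: printf-style formatting into a malloc'd buffer.
 * The arguments are walked twice (once to measure, once to format),
 * so the measuring pass uses a va_copy'd copy.  Caller frees the result. */
char *format_alloc(const char *fmt, ...)
{
    va_list ap, vacpy;
    char *buf;
    int len;

    va_start(ap, fmt);
    va_copy(vacpy, ap);
    len = vsnprintf(NULL, 0, fmt, vacpy);   /* measuring pass on the copy */
    va_end(vacpy);

    if (len < 0) {
        va_end(ap);
        return NULL;
    }
    buf = malloc((size_t)len + 1);
    if (buf != NULL)
        vsnprintf(buf, (size_t)len + 1, fmt, ap);  /* real pass on the original */
    va_end(ap);
    return buf;
}
```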
> > >>>>
> > >>>> I've tested FreeIPMI locally again. I was wrong; it crashes, too. I
> > >>>> guess I was confusing it with IPMItool, which runs fine locally but gives
> > >>>> warnings over the network. I don't know whether it helps you:
> > >>>> Locally:
> > >>>> address@hidden:~/ipmi/usr/bin> ./ipmitool -I open sensor
> > >>>> ACPI State       | 0x1        | discrete   | 0x0180| na        | na        | na        | na        | na        | na
> > >>>> System Reset     | 0x0        | discrete   | 0x0080| na        | na        | na        | na        | na        | na
> > >>>> POST Error       | na         | discrete   | na    | na        | na        | na        | na        | na        | na
> > >>>> Memory ECC       | na         | discrete   | na    | na        | na        | na        | na        | na        | na
> > >>>> PCI Error        | na         | discrete   | na    | na        | na        | na        | na        | na        | na
> > >>>> Fan Error        | na         | discrete   | na    | na        | na        | na        | na        | na        | na
> > >>>> Watchdog         | na         | discrete   | na    | na        | na        | na        | na        | na        | na
> > >>>> CPU Fan 1        | 9992.006   | RPM        | ok    | na        | na        | na        | 3996.803  | 3475.480  | na
> > >>>> CPU Fan 2        | 10426.441  | RPM        | ok    | na        | na        | na        | 3996.803  | 3475.480  | na
> > >>>> CPU Fan 3        | 9992.006   | RPM        | ok    | na        | na        | na        | 3996.803  | 3475.480  | na
> > >>>> CPU Fan 4        | 10426.441  | RPM        | ok    | na        | na        | na        | 3996.803  | 3475.480  | na
> > >>>> CPU Fan 5        | 9223.391   | RPM        | ok    | na        | na        | na        | 3996.803  | 3475.480  | na
> > >>>> CPU Fan 6        | 10900.371  | RPM        | ok    | na        | na        | na        | 3996.803  | 3475.480  | na
> > >>>> CPU Fan 7        | 9992.006   | RPM        | ok    | na        | na        | na        | 3996.803  | 3475.480  | na
> > >>>> CPU Fan 8        | 10900.371  | RPM        | ok    | na        | na        | na        | 3996.803  | 3475.480  | na
> > >>>> CPU Fan 9        | 9992.006   | RPM        | ok    | na        | na        | na        | 3996.803  | 3475.480  | na
> > >>>> CPU Fan 10       | 10426.441  | RPM        | ok    | na        | na        | na        | 3996.803  | 3475.480  | na
> > >>>> System Fan 1     | 9992.006   | RPM        | ok    | na        | na        | na        | 3996.803  | 3475.480  | na
> > >>>> System Fan 2     | 10900.371  | RPM        | ok    | na        | na        | na        | 3996.803  | 3475.480  | na
> > >>>> CPU0 Vcore       | 1.107      | Volts      | ok    | na        | 0.402     | 0.500     | 1.597     | 1.695     | na
> > >>>> CPU1 Vcore       | na         | Volts      | na    | na        | 0.402     | 0.500     | 1.597     | 1.695     | na
> > >>>> Standby 5V       | 4.969      | Volts      | ok    | na        | 4.263     | 4.528     | 5.527     | 5.792     | na
> > >>>> System 5V        | 4.851      | Volts      | ok    | na        | 4.263     | 4.528     | 5.527     | 5.792     | na
> > >>>> System 3.3V      | 3.234      | Volts      | ok    | na        | 2.822     | 2.999     | 3.675     | 3.851     | na
> > >>>> 3V CMOS Sense    | 3.028      | Volts      | ok    | na        | 2.617     | 2.781     | na        | na        | na
> > >>>> CPU0 Therm Diode | na         | degrees C  | na    | na        | 10.000    | na        | 68.000    | 80.000    | 95.000
> > >>>> CPU1 Therm Diode | na         | degrees C  | na    | na        | 10.000    | na        | 68.000    | 80.000    | 95.000
> > >>>> CPU0 ThermDiode2 | na         | degrees C  | na    | na        | 10.000    | na        | 68.000    | 80.000    | 95.000
> > >>>> CPU1 ThermDiode2 | na         | degrees C  | na    | na        | 10.000    | na        | 68.000    | 80.000    | 95.000
> > >>>> AMB Temp         | 29.000     | degrees C  | ok    | na        | 10.000    | na        | 30.000    | 45.000    | na
> > >>>> MultiBit ECC ER  | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na
> > >>>> VDD Power Fail   | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na
> > >>>> Reset            | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na
> > >>>> Identify         | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na
> > >>>> NMI              | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na
> > >>>> CPU0 Therm-Trip  | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na
> > >>>> CPU1 Therm-Trip  | na         | discrete   | na    | na        | na        | na        | na        | na        | na
> > >>>> CPU0 IERR        | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na
> > >>>> CPU1 IERR        | na         | discrete   | na    | na        | na        | na        | na        | na        | na
> > >>>> CPU0 Prochot     | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na
> > >>>> CPU1 Prochot     | na         | discrete   | na    | na        | na        | na        | na        | na        | na
> > >>>> CPU0 SocketOcc   | 0x1        | discrete   | 0x0280| na        | na        | na        | na        | na        | na
> > >>>> CPU1 SocketOcc   | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na
> > >>>> CPU0 Dmn 0 Temp  | 45.000     | degrees C  | ok    | na        | na        | na        | na        | 85.000    | 95.000
> > >>>> CPU1 Dmn 0 Temp  | na         | degrees C  | na    | na        | na        | na        | na        | 85.000    | 95.000
> > >>>> CPU0 Dmn 1 Temp  | 46.000     | degrees C  | ok    | na        | na        | na        | na        | 85.000    | 95.000
> > >>>> CPU1 Dmn 1 Temp  | na         | degrees C  | na    | na        | na        | na        | na        | 85.000    | 95.000
> > >>>>
> > >>>> Over an RMCP+ session:
> > >>>> [...]
> > >>>> System Reset     | 0x0        | discrete   | 0x0080| na        | na        | na        | na        | na        | na
> > >>>> Error reading sensor POST Error (#01)
> > >>>> Error reading sensor Memory ECC (#02)
> > >>>> Error reading sensor PCI Error (#03)
> > >>>> Error reading sensor Fan Error (#04)
> > >>>> Watchdog         | na         | discrete   | na    | na        | na        | na        | na        | na        | na
> > >>>> CPU Fan 1        | 9992.006   | RPM        | ok    | na        | na        | na        | 3996.803  | 3475.480  | na
> > >>>> [...]
> > >>>> [...]
> > >>>>
> > >>>> The elided lines ([...]) are the same as in the local output.
> > >>>> -----------
> > >>>>
> > >>>> I've also run ipmi-sensors from an x86_64 machine against gtseval-ipmi,
> > >>>> and it crashes with the same error (second attachment).
> > >>>>
> > >>>> So... Enough debugging for today.
> > >>>>
> > >>>> Have a nice day,
> > >>>> Gregor
> > >>>>
> > >>>> Al Chu wrote:
> > >>>>> Hey Gregor,
> > >>>>>
> > >>>>> Although it's unlikely to be your problem, I saw one other potential issue.
> > >>>>> So I added a fix in this slightly newer tar.gz.
> > >>>>>
> > >>>>> Thanks,
> > >>>>> Al
> > >>>>>
> > >>>>> On Mon, 2007-10-08 at 11:51 -0700, Al Chu wrote:
> > >>>>>> Hey Gregor,
> > >>>>>>
> > >>>>>> Here's another tar.gz.  Could you run ./configure with --enable-debug
> > >>>>>> and run with --debug again?  The gdb output confirms the line I believed
> > >>>>>> was causing the problem, but I still can't quite figure out how the
> > >>>>>> corruption is happening.  So I put in a lot more printfs.
> > >>>>>>
> > >>>>>> I do have at least two other suspicions that depend on your system.  So
> > >>>>>> could you also send me the SDR from ~/.freeipmi/sdr-cache/ for me to
> > >>>>>> analyze, and tell me which Linux you are running on the i386 box?
> > >>>>>> I'm wondering if you have an older distribution (b/c it's i386) with
> > >>>>>> slightly different threads behavior that I'm not handling properly.
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>> Al
> > >>>>>>
> > >>>>>>
> > >>>>>> On Sun, 2007-10-07 at 12:12 +0200, Gregor Dschung wrote:
> > >>>>>>> Hi Al,
> > >>>>>>>
> > >>>>>>> I've attached the output of the --debug run and the backtrace again.
> > >>>>>>> It was the first time I've used gdb, so I hope I understood the
> > >>>>>>> tutorials :)
> > >>>>>>>
> > >>>>>>> At the moment I can't run ipmi-sensors locally, because I'm not root
> > >>>>>>> on "gtseval" (the host of gtseval-ipmi) and I have to wait until I get
> > >>>>>>> rw-rights for /dev/ipmi0 again. And it's the weekend ;)
> > >>>>>>>
> > >>>>>>> You are right, I'm running IPMItool and FreeIPMI on an i386. gtseval
> > >>>>>>> is a 64-bit system, so perhaps this is the reason it doesn't crash
> > >>>>>>> locally.
> > >>>>>>>
> > >>>>>>> Have a nice Sunday,
> > >>>>>>> Gregor
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>> Hey Gregor,
> > >>>>>>>>
> > >>>>>>>> Can't see anything suspicious in the code.  Here's another tar.gz to
> > >>>>>>>> which I added a whole bunch of extra printfs to try and give me more
> > >>>>>>>> information.  Could you run it again (./configure --enable-debug and
> > >>>>>>>> run ipmi-sensors with --debug again)?  Also, you mentioned that
> > >>>>>>>> ipmi-sensors completes locally without issue.  Is the number of
> > >>>>>>>> sensors listed below (ending w/ CPU1 Dmn 1 Temp) the same as the
> > >>>>>>>> number of sensors listed when you run locally?
> > >>>>>>>> Also, is this crash producing a core dump?  Could you run gdb
> > >>>>>>>> against the core and get a backtrace?  That'd be a lot of help too.
> > >>>>>>>>
> > >>>>>>>> Thanks for helping me look into this,
> > >>>>>>>>
> > >>>>>>>> Al
> > >>>>>>>>
> > >>>>>>>>> Hi Al,
> > >>>>>>>>>
> > >>>>>>>>> thanks for your fast answer.
> > >>>>>>>>>
> > >>>>>>>>> I've tested your test version and it seems to be on the right track.
> > >>>>>>>>> It still crashes, but now I get sensor data :) :
> > >>>>>>>>>
> > >>>>>>>>> [...]
> > >>>>>>>>>
> > >>>>>>>> --
> > >>>>>>>> Albert Chu
> > >>>>>>>> address@hidden
> > >>>>>>>> 925-422-5311
> > >>>>>>>> Computer Scientist
> > >>>>>>>> High Performance Systems Division
> > >>>>>>>> Lawrence Livermore National Laboratory
> > >>>>>>>>
> > >>>
> > >
> > 
> > 