
Re: [Freeipmi-users] ipmi-sensors/monitoring problems


From: Al Chu
Subject: Re: [Freeipmi-users] ipmi-sensors/monitoring problems
Date: Sat, 13 Sep 2008 15:39:37 -0400

Hey Kevin,

Just to make sure, you're using the newest FreeIPMI, 0.6.7?  There have
been a number of corner-case fixes in the last 4 or 5 minor releases.

On Sat, 2008-09-13 at 16:25 -0500, Kevin Day wrote:
> Hey, FreeIPMI guys! I'm trying to get sensor monitoring going on a  
> wide range of hardware running FreeBSD. It's mostly just worked, with  
> a couple of exceptions.
> 
> The first problem is with some older Dell 2650 servers. This is how  
> they appear in dmidecode:
> 
> System Information
>          Manufacturer: Dell Computer Corporation
>          Product Name: PowerEdge 2650
> 
> IPMI Device Information
>          Interface Type: SMIC (Server Management Interface Chip)
>          Specification Version: 1.0
>          I2C Slave Address: 0x10
>          NV Storage Device: Not Present
>          Base Address: 0x000000000000ECF4 (I/O)
>          Register Spacing: Successive Byte Boundaries
> 
> 
> Some commands seem to kinda work like ipmi-sel, ipmi-fru (the  
> exception being the date):
> 
> # ipmi-fru -v
> FRU Inventory Device ID: 0x00
> 
>    FRU Board Info Area Manufacturing Date/Time: 05/22/10 - 17:36:00
>    FRU Board Manufacturer: Dell Inc.
>    FRU Board Product Name: Dell Remote Access Controller
>    FRU Board Part Number: A03
> 
>    FRU Product Manufacturer Name: Dell Inc.
>    FRU Product Product Name: Dell Remote Access Controller
>    FRU Product Part/Model Number: RAC V1.0
>    FRU Product Version Type: 3.12
> 
>    FRU Management Access Record Length Incorrect: 20

This is odd.  The record length simply does not match what the record is
supposed to have.  It's possible it's a corner case in my parsing or an
issue with their motherboard (I've already seen a few motherboards with
non-compliant FRU data that I have to work around).  Could you send me
the output of ipmi-fru with --debug?

> 
> but sensors don't seem to, giving  
> "ipmi_cmd_get_sensor_reading_discrete: bad completion code: request  
> data/parameter invalid" at everything I do. I can provide full logs or  
> any debugging if it's okay to post anything that long here. "ipmitool  
> sensor list" seems to work perfectly, oddly enough. Where do I start  
> to try to figure out what's wrong?

Looks like the motherboard is reporting an error that I am not working
around/handling properly in ipmi-sensors/ipmimonitoring.  I think the
reason ipmitool works is because it ignores errors on sensors and
outputs all remaining sensors it can.  Could you send me the --debug
output of ipmi-sensors?
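
To illustrate the "ignore errors and keep going" behavior described above,
here is a minimal Python sketch.  It is not ipmitool's or FreeIPMI's actual
code: read_all_sensors, SensorReadError, fake_read_sensor and the record IDs
1-3 are all hypothetical names and values made up for the example.

```python
class SensorReadError(Exception):
    """Raised by the (hypothetical) read function when the BMC returns an error."""


def read_all_sensors(record_ids, read_sensor):
    """Read every sensor, collecting failures instead of stopping at the first one."""
    readings = {}
    failures = {}
    for record_id in record_ids:
        try:
            readings[record_id] = read_sensor(record_id)
        except SensorReadError as err:
            # Remember the error, then move on to the remaining sensors.
            failures[record_id] = str(err)
    return readings, failures


def fake_read_sensor(record_id):
    """Stand-in for a real sensor read; record 2 always fails (assumption)."""
    if record_id == 2:
        raise SensorReadError("request data/parameter invalid")
    return "OK"


readings, failures = read_all_sensors([1, 2, 3], fake_read_sensor)
print(readings)
print(failures)
```

With this approach one bad sensor costs a single line of output rather than
everything after it, which would explain a complete-looking sensor list.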

> 
> 
> The other issue I'm having is on a much newer HP Proliant DL185 G5. It  
> appears as:
> 
> System Information
>          Manufacturer: HP
>          Product Name: ProLiant DL185 G5
> 
> IPMI Device Information
>          Interface Type: KCS (Keyboard Control Style)
>          Specification Version: 2.0
>          I2C Slave Address: 0x10
>          NV Storage Device: Not Present
>          Base Address: 0x0000000000000CA2 (I/O)
>          Register Spacing: Successive Byte Boundaries
> 
> ipmi-sensors itself seems to work okay at the beginning, but errors  
> out at the end:
> 
> 64: POST Error (System Firmware): [Unknown]
> 112: Memory ECC (Memory): [Unknown]
> 160: ACPI State (ACPI Power State): [S0/G0 "working"]
> 208: System Reset (Module/Board): [OK]
> 256: SYSTEM FAN 1 (Fan): 6435.01 RPM (0.00/1000.40): [OK]
> 320: SYSTEM FAN 2 (Fan): 6265.66 RPM (0.00/1000.40): [OK]
> 384: SYSTEM FAN 3 (Fan): 6265.66 RPM (0.00/1000.40): [OK]
> 448: SYSTEM FAN 4 (Fan): 6265.66 RPM (0.00/1000.40): [OK]
> 512: Rear HDD Opt Fan (Fan): 1904.76 RPM (0.00/1000.40): [OK]
> 576: System 12V (Voltage): 12.26 V (NA/NA): [OK]
> 640: System 5V (Voltage): 5.17 V (NA/NA): [OK]
> 704: System AUX 5V (Voltage): 5.19 V (NA/NA): [OK]
> 768: System 3.3V (Voltage): 3.37 V (NA/NA): [OK]
> 832: System AUX 3.3V (Voltage): 3.34 V (NA/NA): [OK]
> 896: CPU0 Vcore (Voltage): 1.39 V (NA/NA): [OK]
> 960: CPU1 Vcore (Voltage): 1.33 V (NA/NA): [OK]
> 1024: CPU0 Mem Vcore (Voltage): 1.81 V (NA/NA): [OK]
> 1088: CPU1 Mem Vcore (Voltage): 1.80 V (NA/NA): [OK]
> 1152: CPU0 MEM VTT (Voltage): 0.94 V (NA/NA): [OK]
> 1216: CPU1 MEM VTT (Voltage): 0.92 V (NA/NA): [OK]
> 1280: NB  SB Vcore (Voltage): 1.23 V (NA/NA): [OK]
> 1344: CPU0 Diode (Temperature): 33.00 C (NA/85.00): [OK]
> 1408: CPU1 Diode (Temperature): 36.50 C (NA/85.00): [OK]
> 1472: Power Ambient (Temperature): 4.00 C (NA/45.00): [OK]
> 1536: Rear Ambient (Temperature): 7.00 C (NA/45.00): [OK]
> 1600: SB HTX Ambient (Temperature): 0.00 C (NA/45.00): [OK]
> 1664: NB Ambient (Temperature): 0.00 C (NA/45.00): [OK]
> 1728: Front Panel Temp (Temperature): 13.50 C (NA/45.00): [OK]
> 1792: Therm-Trip0 (Processor): [State Deasserted]
> 1840: CPU0 Prochot (Temperature): [Limit Not Exceeded]
> 1888: CPU1 Prochot (Temperature): [Limit Not Exceeded]
> 1936: CPU Socket 0 (Processor): [Device Inserted/Device Present]
> 1984: CPU Socket 1 (Processor): [Device Inserted/Device Present]
> 2032: PS1 Present (Power Supply): [Device Inserted/Device Present]
> 2080: PS2 Present (Power Supply): [Device Removed/Device Absent]
> 2128: PS1 Status (Power Supply): [Performance Met]
> 2176: PS2 Status (Power Supply): [Performance Met]
> 2224: Red PS Present (Power Unit): [Device Inserted/Device Present]
> 2272: PS Redundancy (FRU Sensor): [Redundancy Lost]
> 2416: Identify (Button): [State Deasserted]
> ipmi_cmd_get_sensor_reading_discrete: bad completion code: request  
> data/parameter invalid

Again, looks like the motherboard is reporting an error that I am not
working around/handling properly.  Could you send me the --debug output?

> 
> but, ipmimonitoring gets fixated on record 832:
> 
> Record_ID | Sensor Name | Sensor Group | Monitoring Status| Sensor  
> Units | Sensor Reading
> 256 | SYSTEM FAN 1 | Fan | Nominal | RPM | 6435.006435
> 320 | SYSTEM FAN 2 | Fan | Nominal | RPM | 6265.664160
> 384 | SYSTEM FAN 3 | Fan | Nominal | RPM | 6105.006105
> 448 | SYSTEM FAN 4 | Fan | Nominal | RPM | 6435.006435
> 512 | Rear HDD Opt Fan | Fan | Nominal | RPM | 1904.761905
> 576 | System 12V | Voltage | Nominal | V | 12.264000
> 640 | System 5V | Voltage | Nominal | V | 5.171400
> 704 | System AUX 5V | Voltage | Nominal | V | 5.194800
> 768 | System 3.3V | Voltage | Nominal | V | 3.372600
> 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800
> 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800
> 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800
> 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800
> 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800
> 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800
> 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800
> 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800
> 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800
> 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800
> 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800
> 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800
> 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800
> 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800
> 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800
> 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800
> 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800
> 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800
> 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800
> 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800
> 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800
> 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800
> 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800
> 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800
> 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800
> 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800
> 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800
> 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800
> 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800
> 832 | System AUX 3.3V | Voltage | Nominal | V | 3.341800

Is this looping forever or does it complete?  This one's a little more
fishy.  Looks like I am accidentally storing incorrect data.  Could you
send me the --debug output?

> Are either of these known problems? If not, what can I do to help?

Not known problems to me.  Most likely the motherboard is
reporting some strange error that I need to handle/work around.  For
example, ipmi-sensors/ipmimonitoring will report a sensor reading as
"Unknown" on a "cannot read sensor" or "bmc busy" and similar error
codes.  We just need to find out what error code those motherboards are
reporting and handle it properly.
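
A rough Python sketch of that kind of handling follows.  It is an
illustration only, not the FreeIPMI implementation: format_reading and
UNREADABLE_CODES are hypothetical names, and which completion codes belong
in the "Unknown" set is an assumption (finding the right ones for these
boards is exactly what the --debug output is for).  The numeric values
themselves are generic completion codes from the IPMI specification.

```python
COMPLETED_OK = 0x00

# Generic IPMI completion codes (values from the IPMI specification).
NODE_BUSY = 0xC0            # "Node Busy"
RECORD_NOT_PRESENT = 0xCB   # "Requested sensor, data, or record not present"
INVALID_DATA_FIELD = 0xCC   # "Invalid data field in Request"

# Assumption for this sketch: these codes mean "no reading available",
# so the sensor is shown as Unknown instead of aborting the whole run.
UNREADABLE_CODES = {NODE_BUSY, RECORD_NOT_PRESENT, INVALID_DATA_FIELD}


def format_reading(completion_code, value=None):
    """Turn a completion code plus optional value into a display string."""
    if completion_code == COMPLETED_OK:
        return f"[{value}]"
    if completion_code in UNREADABLE_CODES:
        return "[Unknown]"
    # Anything else is still treated as a hard error.
    raise RuntimeError(f"bad completion code: 0x{completion_code:02X}")


print(format_reading(COMPLETED_OK, "OK"))
print(format_reading(NODE_BUSY))
```

Codes outside the set still raise, so a genuinely unexpected error is not
silently hidden.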

Thanks,
Al

> -- Kevin
-- 
Albert Chu
address@hidden
925-422-5311
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory




