From: Al Chu
Subject: Re: [Freeipmi-users] PS Status showing Unrecognized Event after upgrading a supermicro X8 mobo ipmimodule to fw 3.12
Date: Mon, 17 Feb 2014 10:11:27 -0800

Hi Ingard,

The link you found confirms that this is an OEM extension on the
motherboard.

You can accomplish the same thing with FreeIPMI's ipmi-raw tool.  With
ipmi-raw you have to specify a LUN as the first argument (just 0x00
here), and the output will look a tad different, but the basics are the
same.  So something like

> ipmi-raw 0x00 0x06 0x52 0x07 0x78 0x01 0x78

I'm unsure whether you'll have to adjust the slave addresses for your
motherboard.
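
For reference, the bytes in the command above can be read as an IPMI
Master Write-Read request (NetFn 0x06, command 0x52).  Here is a small
Python sketch that assembles the same ipmi-raw argument list from named
fields.  It assumes the standard request layout (bus byte, slave
address, read count, then the data to write); the helper name and its
parameters are made up for illustration, and 0x07 and 0x78 are simply
the values from the command above, not verified for any particular
board.

```python
NETFN_APP = 0x06              # Application network function
CMD_MASTER_WRITE_READ = 0x52  # Master Write-Read command


def master_write_read_args(bus, slave_addr, read_count, write_data=(), lun=0x00):
    """Build ipmi-raw arguments for a Master Write-Read request.

    Hypothetical helper for illustration; it only formats bytes and
    does not talk to a BMC.
    """
    fields = [lun, NETFN_APP, CMD_MASTER_WRITE_READ, bus, slave_addr, read_count]
    fields.extend(write_data)
    for value in fields:
        if not 0 <= value <= 0xFF:
            raise ValueError(f"not a single byte: {value!r}")
    # ipmi-raw expects the LUN first, then the NetFn, then the command bytes.
    return ["ipmi-raw"] + [f"0x{value:02x}" for value in fields]


# Values taken from the command above; they may differ on other boards.
args = master_write_read_args(bus=0x07, slave_addr=0x78, read_count=0x01,
                              write_data=[0x78])
print(" ".join(args))
```

Keeping the bus and slave address as separate parameters makes it easy
to try other values once Supermicro says what they should be.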

I'd be glad to add this to ipmi-oem so there's a proper command-line
option for it, but I'm not entirely sure how they arrived at these slave
addresses or what they would be for your motherboard.  Can you ask
Supermicro whether the addresses are fixed across many of their
motherboards?  Same question for the bus number.  Any additional
information from Supermicro that explains these values, rather than
leaving them as seemingly random numbers, would help.

Al

P.S.  The motherboard appears to calculate FRU checksums incorrectly, so
you can try ipmi-fru with "-W skipchecks" to skip the checksum checks.
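
In case it helps to see what those checks are: as I understand the FRU
storage format, each info area ends with a "zero checksum" byte chosen
so that all bytes of the area sum to 0 modulo 256.  A rough Python
sketch of that check follows.  The function names and the sample bytes
are made up for illustration; they are not taken from your board or
from ipmi-fru's source.

```python
def fru_zero_checksum(body):
    """Return the checksum byte that makes body + checksum sum to 0 mod 256."""
    return (-sum(body)) & 0xFF


def fru_area_checksum_ok(area):
    """True if every byte of the area, checksum included, sums to 0 mod 256."""
    return (sum(area) & 0xFF) == 0


# Made-up sample area contents (not real FRU data).
body = bytes([0x01, 0x02, 0x19, 0x53, 0x4D])
good_area = body + bytes([fru_zero_checksum(body)])
bad_area = body + bytes([0x00])  # deliberately wrong checksum byte

print(fru_area_checksum_ok(good_area))
print(fru_area_checksum_ok(bad_area))
```

A board that writes the wrong final byte would fail this kind of check
even if the rest of the area is readable, which is why skipping the
check can still give useful output.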

On Mon, 2014-02-17 at 13:02 +0100, Ingard Mevåg wrote:
> Hi again Al and thanks for your help so far!
> 
> 
> After a lot of googling I've found this post which might have some of
> the information:
> http://www.tummy.com/articles/supermicro-ipmi-nagios-check/
> 
> 
> 
> Below is the output of the cmds you suggested:
> 
> 
> address@hidden:~# ipmi-fru
> FRU Inventory Device: BMC FRU (ID 00h)
> 
> 
>   FRU Error: board info area checksum invalid
> 
> 
>   FRU Error: product info area checksum invalid
> address@hidden:~# ipmi-fru --bridge-fru
> FRU Inventory Device: BMC FRU (ID 00h)
> 
> 
>   FRU Error: board info area checksum invalid
> 
> 
>   FRU Error: product info area checksum invalid
> address@hidden:~# ipmi-sensors --bridge-sensors
> ID   | Name          | Type              | Reading    | Units | Event
> 4    | System Temp   | Temperature       | 30.00      | C     | 'OK'
> 71   | CPU Temp      | OEM Reserved      | N/A        | N/A   | 'OEM Event = 0000h'
> 138  | FAN 1         | Fan               | N/A        | RPM   | N/A
> 205  | FAN 2         | Fan               | 1695.00    | RPM   | 'OK'
> 272  | FAN 3         | Fan               | 4470.00    | RPM   | 'OK'
> 339  | FAN 4         | Fan               | N/A        | RPM   | N/A
> 406  | FAN 5         | Fan               | N/A        | RPM   | N/A
> 473  | CPU Vcore     | Voltage           | 0.86       | V     | 'OK'
> 540  | +3.3VCC       | Voltage           | 3.31       | V     | 'OK'
> 607  | +12 V         | Voltage           | 12.19      | V     | 'OK'
> 674  | CPU DIMM      | Voltage           | 1.54       | V     | 'OK'
> 741  | +5 V          | Voltage           | 5.15       | V     | 'OK'
> 808  | -12 V         | Voltage           | -12.48     | V     | 'OK'
> 875  | VBAT          | Voltage           | 3.25       | V     | 'OK'
> 942  | +3.3VSB       | Voltage           | 3.30       | V     | 'OK'
> 1009 | AVCC          | Voltage           | 3.31       | V     | 'OK'
> 1076 | Chassis Intru | Physical Security | N/A        | N/A   | 'OK'
> 1143 | PS Status     | Power Supply      | N/A        | N/A   |
> 'Unrecognized Event = 0100h' 'Unrecognized Event = 0200h'
> 'Unrecognized Event = 0400h' 'Unrecognized Event = 0800h'
> 'Unrecognized Event = 1000h' 'Unrecognized Event = 2000h'
> 'Unrecognized Event = 4000h'
> address@hidden:~# ipmi-sensors --shared-sensors
> ID   | Name          | Type              | Reading    | Units | Event
> 4    | System Temp   | Temperature       | 30.00      | C     | 'OK'
> 71   | CPU Temp      | OEM Reserved      | N/A        | N/A   | 'OEM Event = 0000h'
> 138  | FAN 1         | Fan               | N/A        | RPM   | N/A
> 205  | FAN 2         | Fan               | 1695.00    | RPM   | 'OK'
> 272  | FAN 3         | Fan               | 4655.00    | RPM   | 'OK'
> 339  | FAN 4         | Fan               | N/A        | RPM   | N/A
> 406  | FAN 5         | Fan               | N/A        | RPM   | N/A
> 473  | CPU Vcore     | Voltage           | 0.86       | V     | 'OK'
> 540  | +3.3VCC       | Voltage           | 3.31       | V     | 'OK'
> 607  | +12 V         | Voltage           | 12.19      | V     | 'OK'
> 674  | CPU DIMM      | Voltage           | 1.54       | V     | 'OK'
> 741  | +5 V          | Voltage           | 5.15       | V     | 'OK'
> 808  | -12 V         | Voltage           | -12.29     | V     | 'OK'
> 875  | VBAT          | Voltage           | 3.25       | V     | 'OK'
> 942  | +3.3VSB       | Voltage           | 3.30       | V     | 'OK'
> 1009 | AVCC          | Voltage           | 3.31       | V     | 'OK'
> 1076 | Chassis Intru | Physical Security | N/A        | N/A   | 'OK'
> 1143 | PS Status     | Power Supply      | N/A        | N/A   |
> 'Unrecognized Event = 0100h' 'Unrecognized Event = 0200h'
> 'Unrecognized Event = 0400h' 'Unrecognized Event = 0800h'
> 'Unrecognized Event = 1000h' 'Unrecognized Event = 2000h'
> 'Unrecognized Event = 4000h'
> 
> 
> 
> 
> Kind regards
> Ingard
> 
> 
> 
> 
> 2014-02-15 19:22 GMT+01:00 Al Chu <address@hidden>:
>         Hi Ingard,
>         
>         I realized another possibility.  The sensor could be "shared".
>         So you could try the --shared-sensors option.
>         
>         Al
>         
>         On Fri, 2014-02-14 at 10:26 -0800, Albert Chu wrote:
>         > Hi Ingard,
>         >
>         > Searching through the NEWS file it was released in FreeIPMI 1.0.2.
>         >
>         > Is it possible you're looking at FRU info?  (b/c the option is called
>         > -psfruinfo). You can try ipmi-fru with the --bridge-fru option and see
>         > if that works.
>         >
>         > Also try running ipmi-sensors with --bridge-sensors.  If that doesn't
>         > work, then this might be an OEM extension from Supermicro.  The Slave
>         > Addresses they list (0x70 and 0x72) are not the defaults.  If
>         > ipmi-sensors cannot find this sensor in the SDR (sensor data
>         > repository), then Supermicro is getting this through some other means
>         > that isn't standard.
>         >
>         > If you ask Supermicro for the OEM extension information, then it's
>         > possible it could be added into FreeIPMI.
>         >
>         > Al
>         >
>         > It appears in the ipmicfg example below that they are going
>         > through the FRU to get information.
>         >
>         > On Fri, 2014-02-14 at 08:14 +0100, ingard Mevåg wrote:
>         > > Hi Al
>         > >
>         > > Thanks for the information. I had already tried to reset the
>         > > device, but running with --ignore-unrecognized-events did the
>         > > trick :)
>         > > At least my monitoring is happy now for this one node running
>         > > the latest beta. I was wondering if you knew when this feature
>         > > got introduced and/or if there are deb packages for the Ubuntu
>         > > LTS releases somewhere? I’ve got quite a few Lucid and Precise
>         > > servers running version 0.7.15 and 0.8.12.
>         > >
>         > > Also, is it possible to get information per PSU somehow?
>         > > Supermicro’s ipmicfg gives the following for instance:
>         > >
>         > > address@hidden:~/ipmicfg_1.14.3_20130725/linux/64bit# ./ipmicfg-linux.x86_64 -psfruinfo
>         > >  [SlaveAddress = 70h] [Module 1]
>         > >  Item                           |                Value
>         > >  ----                           |                -----
>         > >  Status                         |                   On
>         > >  Temperature                    |              29C/84F
>         > >  Fan 1                          |             7213 RPM
>         > >  Fan 2                          |            10076 RPM
>         > >
>         > >  [SlaveAddress = 72h] [Module 2]
>         > >  Item                           |                Value
>         > >  ----                           |                -----
>         > >  Status                         |                   On
>         > >  Temperature                    |              28C/82F
>         > >  Fan 1                          |             7213 RPM
>         > >  Fan 2                          |             9732 RPM
>         > >
>         > > Regards
>         > > Ingard
>         > >
>         > > On 13 Feb 2014, at 17:12, Al Chu <address@hidden> wrote:
>         > >
>         > > > Hi Ingard,
>         > > >
>         > > > This sounds familiar, although I cannot recall how to fix it through the
>         > > > firmware. It's possible a cold reset of the BMC could do it.  You can
>         > > > do a cold reset via
>         > > >
>         > > >> bmc-device --cold-reset
>         > > >
>         > > > If that doesn't work, you can tell ipmimonitoring to ignore unrecognized
>         > > > events via the --ignore-unrecognized-events option.  After that
>         > > > everything should work.
>         > > >
>         > > > Al
>         > > >
>         > > > On Thu, 2014-02-13 at 16:17 +0100, Ingard Mevåg wrote:
>         > > >> Hi guys
>         > > >>
>         > > >> I've been upgrading the firmware on the ipmi module on some
>         > > >> supermicro X8SIU recently and I'm now experiencing problems with the PS
>         > > >> Status sensor. Is there anything I can do to make the sensor detect the
>         > > >> PSUs properly?
>         > > >>
>         > > >> Link to board:
>         > > >> http://www.supermicro.com/products/motherboard/Xeon3000/3400/X8SIU.cfm?IPMI=Y
>         > > >> The latest firmware as of now is version 3.12.
>         > > >> Output from ipmimonitoring:
>         > > >> address@hidden:/usr/local# ./sbin/ipmimonitoring -V
>         > > >> ipmi-sensors - 1.4.0.beta0
>         > > >>
>         > > >> address@hidden:/usr/local# ./sbin/ipmimonitoring
>         > > >> ID   | Name          | Type              | State    | Reading    | Units | Event
>         > > >> 4    | System Temp   | Temperature       | Nominal  | 32.00      | C     | 'OK'
>         > > >> 71   | CPU Temp      | OEM Reserved      | N/A      | N/A        | N/A   | 'OEM Event = 0000h'
>         > > >> 205  | FAN 2         | Fan               | Nominal  | 1695.00    | RPM   | 'OK'
>         > > >> 272  | FAN 3         | Fan               | Nominal  | 4655.00    | RPM   | 'OK'
>         > > >> 473  | CPU Vcore     | Voltage           | Nominal  | 0.85       | V     | 'OK'
>         > > >> 540  | +3.3VCC       | Voltage           | Nominal  | 3.31       | V     | 'OK'
>         > > >> 607  | +12 V         | Voltage           | Nominal  | 12.19      | V     | 'OK'
>         > > >> 674  | CPU DIMM      | Voltage           | Nominal  | 1.54       | V     | 'OK'
>         > > >> 741  | +5 V          | Voltage           | Nominal  | 5.15       | V     | 'OK'
>         > > >> 808  | -12 V         | Voltage           | Nominal  | -12.48     | V     | 'OK'
>         > > >> 875  | VBAT          | Voltage           | Nominal  | 3.25       | V     | 'OK'
>         > > >> 942  | +3.3VSB       | Voltage           | Nominal  | 3.30       | V     | 'OK'
>         > > >> 1009 | AVCC          | Voltage           | Nominal  | 3.31       | V     | 'OK'
>         > > >> 1076 | Chassis Intru | Physical Security | Nominal  | N/A        | N/A   | 'OK'
>         > > >> 1143 | PS Status     | Power Supply      | N/A      | N/A        | N/A   |
>         > > >> 'Unrecognized Event = 0100h' 'Unrecognized Event = 0200h'
>         > > >> 'Unrecognized Event = 0400h' 'Unrecognized Event = 0800h'
>         > > >> 'Unrecognized Event = 1000h' 'Unrecognized Event = 2000h'
>         > > >> 'Unrecognized Event = 4000h'
>         > > >>
>         > > >> Kind Regards
>         > > >> Ingard
>         > > >> _______________________________________________
>         > > >> Freeipmi-users mailing list
>         > > >> address@hidden
>         > > >> https://lists.gnu.org/mailman/listinfo/freeipmi-users
>         > > > --
>         > > > Albert Chu
>         > > > address@hidden
>         > > > Computer Scientist
>         > > > High Performance Systems Division
>         > > > Lawrence Livermore National Laboratory
>         > > >
>         > >
>         --
>         Albert Chu
>         address@hidden
>         Computer Scientist
>         High Performance Systems Division
>         Lawrence Livermore National Laboratory
>         
>         
> 
> 
> 
> 
> -- 
> Ingard Mevåg
> 
> Driftssjef
> JottaCloud    
>  
> Mobil: +47 450 22 834
> E-post: address@hidden
> Webside: www.jottacloud.com
-- 
Albert Chu
address@hidden
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory



