[Freeipmi-users] request: status info for discrete sensors for monitoring purposes


From: Werner Fischer
Subject: [Freeipmi-users] request: status info for discrete sensors for monitoring purposes
Date: Tue, 22 Jun 2010 13:16:43 +0200

Hi Al,

ipmimonitoring seems to be very useful for my needs. I gave it a try
with an Intel SR2500 server. I unplugged one power cord from Power
Supply 1 (PS1) and removed the cover of the chassis:

ipmimonitoring reports "Critical" in the fourth column, which is great:
        address@hidden:~$ ipmimonitoring -h 192.168.1.211 -u monitor -p relation -l user | grep "| Critical |"
        33 | Power Redundancy | Power Unit | Critical | N/A | 'Redundancy Lost' 'Non-redundant:Sufficient Resources from Redundant'
        36 | Physical Scrty | Physical Security | Critical | N/A | 'General Chassis Intrusion'
        49 | PS1 Status | Power Supply | Critical | N/A | 'Presence detected' 'Power Supply input lost (AC/DC)'
        address@hidden:~$ 
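
The grep filter above can also be done after parsing the rows into fields, which makes it easy to match on the status column only. The following is a minimal sketch, assuming the six-column, pipe-separated layout shown in this mail and rows that are not wrapped across lines; the function names `parse_rows` and `with_status` are made up for this illustration and are not part of FreeIPMI.

```python
def parse_rows(text):
    """Parse pipe-separated ipmimonitoring rows into dictionaries."""
    rows = []
    for line in text.splitlines():
        # Split into at most six fields so the reading column stays intact.
        fields = [f.strip() for f in line.split("|", 5)]
        # Data rows have six fields and a numeric record ID;
        # this skips the header line, shell prompts and "[...]" markers.
        if len(fields) != 6 or not fields[0].isdigit():
            continue
        record_id, name, group, status, units, reading = fields
        rows.append({
            "record_id": int(record_id),
            "name": name,
            "group": group,
            "status": status,
            "units": units,
            "reading": reading,
        })
    return rows


def with_status(rows, status):
    """Return only the rows whose monitoring status equals `status`."""
    return [row for row in rows if row["status"] == status]


# Rows copied from the output in this mail (wrapped lines rejoined).
SAMPLE = """\
Record_ID | Sensor Name | Sensor Group | Monitoring Status| Sensor Units | Sensor Reading
32 | Pwr Unit Stat | Power Unit | Nominal | N/A | 'OK'
33 | Power Redundancy | Power Unit | Critical | N/A | 'Redundancy Lost' 'Non-redundant:Sufficient Resources from Redundant'
36 | Physical Scrty | Physical Security | Critical | N/A | 'General Chassis Intrusion'
49 | PS1 Status | Power Supply | Critical | N/A | 'Presence detected' 'Power Supply input lost (AC/DC)'
50 | PS2 Status | Power Supply | Nominal | N/A | 'Presence detected'
"""

if __name__ == "__main__":
    for row in with_status(parse_rows(SAMPLE), "Critical"):
        print(row["record_id"], row["name"], row["reading"])
```

Matching on the parsed status field avoids false hits when the word "Critical" appears elsewhere in a row, for example in the "Critical Interrupt" sensor group.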

With ipmitool I got an "ok" for these sensors:
        address@hidden:~$ ipmitool -I lan -H 192.168.1.211 -U monitor -P relation -L user sdr elist
        [...]
        PS1 AC Current   | 78h | ok  | 10.1 | 0.12 Amps
        PS2 AC Current   | 79h | ok  | 10.2 | 0.93 Amps
        PS1 +12V Current | 7Ah | ok  | 10.1 | 0 Amps
        PS2 +12V Current | 7Bh | ok  | 10.2 | 16 Amps
        PS1 +12V Power   | 7Ch | ok  | 10.1 | 0 Watts
        PS2 +12V Power   | 7Dh | ok  | 10.2 | 192 Watts
        P1 Therm Margin  | 99h | ok  |  3.1 | -49 degrees C
        P2 Therm Margin  | 9Bh | ok  |  3.2 | -54 degrees C
        P1 Therm Ctrl %  | C0h | ok  |  3.1 | 0 unspecified
        P2 Therm Ctrl %  | C1h | ok  |  3.2 | 0 unspecified
        Proc 1 Vccp      | D0h | ok  |  3.1 | 1.23 Volts
        Proc 2 Vccp      | D1h | ok  |  3.2 | 1.23 Volts
        Mem Therm Margin | 48h | ns  |  3.2 | No Reading
        Pwr Unit Stat    | 01h | ok  | 21.1 | 
        Power Redundancy | 02h | ok  | 21.1 | Redundancy Lost, Non-Redundant: Sufficient from Redundant
        BMC Watchdog     | 03h | ok  |  7.1 | 
        Platform Secu V  | 04h | ok  |  7.1 | 
        Physical Scrty   | 05h | ok  | 23.1 | General Chassis intrusion
        [...]

Another test with ipmimonitoring, when PS1 is completely removed:
        address@hidden:~$ ipmimonitoring -h 192.168.1.211 -u monitor -p relation -l user | grep "| Critical |"
        32 | Pwr Unit Stat | Power Unit | Nominal | N/A | 'OK'
        33 | Power Redundancy | Power Unit | Critical | N/A | 'Redundancy Lost' 'Non-redundant:Sufficient Resources from Redundant'
        [...]
        49 | PS1 Status | Power Supply | Nominal | N/A | 'OK'
        50 | PS2 Status | Power Supply | Nominal | N/A | 'Presence detected'
        
        (Here ipmimonitoring says 'OK' in the last column, VMware says
        "Unknown" when a power supply is not installed - see
        http://www.wefi.net/shared/sr2500-example-1.png)


My question: how do you distinguish in ipmimonitoring which of the
assertion states are ok ("Nominal") and which are not ("Critical")?
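
To make the question concrete: one conceivable approach is a lookup table from (sensor group, asserted state) to a status, with the worst status winning. The sketch below is only an illustration of that idea and is not a description of how ipmimonitoring is implemented. The table entries are guesses chosen so that the results match the statuses shown in this mail, and treating unmapped states as "Warning" is likewise an assumption.

```python
# Ordering used to pick the worst status among several asserted states.
SEVERITY = {"Nominal": 0, "Warning": 1, "Critical": 2}

# Hypothetical interpretation table: (sensor group, asserted state) -> status.
# The values are illustrative guesses, not FreeIPMI's actual rules.
STATE_STATUS = {
    ("Power Supply", "Presence detected"): "Nominal",
    ("Power Supply", "Power Supply input lost (AC/DC)"): "Critical",
    ("Physical Security", "General Chassis Intrusion"): "Critical",
    ("Power Unit", "Redundancy Lost"): "Critical",
    ("Power Unit", "Non-redundant:Sufficient Resources from Redundant"): "Warning",
    ("Fan", "Fully Redundant"): "Nominal",
}


def interpret(sensor_group, asserted_states, unknown="Warning"):
    """Return the worst status over all asserted states of one sensor.

    An empty list (no state asserted) counts as "Nominal"; states missing
    from the table get the `unknown` status (an assumption of this sketch).
    """
    worst = "Nominal"
    for state in asserted_states:
        status = STATE_STATUS.get((sensor_group, state), unknown)
        if SEVERITY[status] > SEVERITY[worst]:
            worst = status
    return worst


if __name__ == "__main__":
    print(interpret("Power Supply",
                    ["Presence detected", "Power Supply input lost (AC/DC)"]))
    print(interpret("Power Supply", ["Presence detected"]))
```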

Thanks a lot for your great help,
best regards,
Werner

PS: here is the full output of ipmimonitoring from my first test:
address@hidden:~$ ipmimonitoring -h 192.168.1.211 -u monitor -p relation -l user
Record_ID | Sensor Name | Sensor Group | Monitoring Status| Sensor Units | Sensor Reading
1 | BB +1.2V Vtt | Voltage | Nominal | V | 1.197000 
2 | BB +1.5V AUX | Voltage | Nominal | V | 1.466400 
3 | BB +1.5V | Voltage | Nominal | V | 1.482000 
4 | BB +1.8V | Voltage | Nominal | V | 1.785000 
5 | BB +3.3V | Voltage | Nominal | V | 3.354000 
6 | BB +3.3V STB | Voltage | Nominal | V | 3.354000 
7 | BB +1.5V ESB | Voltage | Nominal | V | 1.505400 
8 | BB +5V | Voltage | Nominal | V | 5.070000 
9 | BB +12V AUX | Voltage | Nominal | V | 11.904000 
10 | BB +0.9V | Voltage | Nominal | V | 0.897600 
11 | Serverboard Temp | Temperature | Nominal | C | 29.000000 
12 | Ctrl Panel Temp | Temperature | Nominal | C | 25.000000 
13 | Fan 1 | Fan | Nominal | RPM | 5891.000000 
14 | Fan 2 | Fan | Nominal | RPM | 6278.000000 
15 | Fan 3 | Fan | Nominal | RPM | 5805.000000 
16 | Fan 4 | Fan | Nominal | RPM | 6321.000000 
17 | Fan 5 | Fan | Nominal | RPM | 9052.000000 
18 | Fan 6 | Fan | Nominal | RPM | 8060.000000 
19 | PS1 AC Current | Current | Nominal | A | 0.124000 
20 | PS2 AC Current | Current | Nominal | A | 0.992000 
21 | PS1 +12V Current | Current | Nominal | A | 0.000000 
22 | PS2 +12V Current | Current | Nominal | A | 15.000000 
23 | PS1 +12V Power | N/A | Nominal | W | 0.000000 
24 | PS2 +12V Power | N/A | Nominal | W | 192.000000 
25 | P1 Therm Margin | Temperature | Nominal | C | -49.000000 
26 | P2 Therm Margin | Temperature | Nominal | C | -53.000000 
27 | P1 Therm Ctrl % | Temperature | Nominal | N/A | 0.000000 
28 | P2 Therm Ctrl % | Temperature | Nominal | N/A | 0.000000 
29 | Proc 1 Vccp | Voltage | Nominal | V | 1.227600 
30 | Proc 2 Vccp | Voltage | Nominal | V | 1.233800 
32 | Pwr Unit Stat | Power Unit | Nominal | N/A | 'OK'
33 | Power Redundancy | Power Unit | Critical | N/A | 'Redundancy Lost' 'Non-redundant:Sufficient Resources from Redundant'
34 | BMC Watchdog | Watchdog 2 | Nominal | N/A | 'OK'
35 | Platform Secu V | Platform Security Violation Attempt | Nominal | N/A | 'OK'
36 | Physical Scrty | Physical Security | Critical | N/A | 'General Chassis Intrusion'
37 | FP Interrupt | Critical Interrupt | Nominal | N/A | 'OK'
38 | Event Log Disabl | Event Logging Disabled | Nominal | N/A | 'OK'
40 | System Event | System Event | Nominal | N/A | 'OK'
41 | BB Vbat | Battery | Nominal | N/A | 'OK'
42 | Fan 1 Present | Fan | Nominal | N/A | 'Device Inserted/Device Present'
43 | Fan 2 Present | Fan | Nominal | N/A | 'Device Inserted/Device Present'
44 | Fan 3 Present | Fan | Nominal | N/A | 'Device Inserted/Device Present'
45 | Fan 4 Present | Fan | Nominal | N/A | 'Device Inserted/Device Present'
46 | Fan 5 Present | Fan | Nominal | N/A | 'Device Inserted/Device Present'
47 | Fan 6 Present | Fan | Nominal | N/A | 'Device Inserted/Device Present'
48 | Fan Redundancy | Fan | Nominal | N/A | 'Fully Redundant'
49 | PS1 Status | Power Supply | Critical | N/A | 'Presence detected' 'Power Supply input lost (AC/DC)'
50 | PS2 Status | Power Supply | Nominal | N/A | 'Presence detected'
51 | ACPI State | System ACPI Power State | Nominal | N/A | 'S0/G0'
52 | Button | Button/Switch | Nominal | N/A | 'OK'
56 | Processor 1 Stat | Processor | Nominal | N/A | 'Processor Presence detected'
57 | Processor 2 Stat | Processor | Nominal | N/A | 'Processor Presence detected'
58 | PCIe Link0 | Critical Interrupt | Nominal | N/A | 'OK'
59 | PCIe Link1 | Critical Interrupt | Nominal | N/A | 'OK'
60 | PCIe Link2 | Critical Interrupt | Nominal | N/A | 'OK'
61 | PCIe Link3 | Critical Interrupt | Nominal | N/A | 'OK'
62 | PCIe Link4 | Critical Interrupt | Nominal | N/A | 'OK'
63 | PCIe Link5 | Critical Interrupt | Nominal | N/A | 'OK'
64 | PCIe Link6 | Critical Interrupt | Nominal | N/A | 'OK'
65 | PCIe Link7 | Critical Interrupt | Nominal | N/A | 'OK'
66 | PCIe Link8 | Critical Interrupt | Nominal | N/A | 'OK'
67 | PCIe Link9 | Critical Interrupt | Nominal | N/A | 'OK'
68 | PCIe Link10 | Critical Interrupt | Nominal | N/A | 'OK'
69 | PCIe Link11 | Critical Interrupt | Nominal | N/A | 'OK'
70 | PCIe Link12 | Critical Interrupt | Nominal | N/A | 'OK'
71 | PCIe Link13 | Critical Interrupt | Nominal | N/A | 'OK'
76 | CPU Popul Error | Processor | Nominal | N/A | 'OK'
77 | DIMM 1A | Slot/Connector | Nominal | N/A | 'Slot/Connector Device installed/attached'
79 | DIMM 1B | Slot/Connector | Nominal | N/A | 'Slot/Connector Device installed/attached'
81 | DIMM 1C | Slot/Connector | Nominal | N/A | 'Slot/Connector Device installed/attached'
83 | DIMM 1D | Slot/Connector | Nominal | N/A | 'Slot/Connector Device installed/attached'
address@hidden:~$ 


On Mon, 2010-06-21 at 09:32 -0700, Al Chu wrote:
> Hi Werner,
> 
> > Does anybody know whether one of the other tools like freeipmi or
> > ipmiutil has some functionality like this?
> 
> In FreeIPMI, there is a tool called ipmimonitoring that I believe does
> what you're asking for (output condensed for readability below).
> 
> 18 | Fan1            | Nominal  | 14500.00   | RPM   | 'OK'
> 19 | Fan2            | Nominal  | 14300.00   | RPM   | 'OK'
> 20 | Fan3/CPU2       | Nominal  | 14300.00   | RPM   | 'OK'
> 21 | Fan4/CPU1       | Nominal  | 13900.00   | RPM   | 'OK'
> 22 | Fan5            | Nominal  | 14000.00   | RPM   | 'OK'
> 23 | Fan6            | Nominal  | 14000.00   | RPM   | 'OK'
> 24 | Fan7/CPU3       | Critical | 0.00       | RPM   | 'At or Below (<=) Lower Non-Recoverable Threshold'
> 25 | Fan8/CPU4       | Critical | 0.00       | RPM   | 'At or Below (<=) Lower Non-Recoverable Threshold'
> 26 | Fan9            | Critical | 0.00       | RPM   | 'At or Below (<=) Lower Non-Recoverable Threshold'
> 27 | Power Supply 1  | Nominal  | N/A        | N/A   | 'Presence detected'
> 28 | Power Supply 2  | N/A      | N/A        | N/A   | N/A
> 
> So for this example, fans with normal RPM are "Nominal", out of range is
> "Critical", and the power supply that doesn't exist is "N/A".  There is
> also a "Warning" output when the situation is appropriate.
> 
> I can say more about it, but this mailing list is probably not the best place.
> Feel free to ping me on the FreeIPMI mailing list.
> 
> Al
> 
> On Mon, 2010-06-21 at 06:08 -0700, Werner Fischer wrote:
> > Hi ipmitool developers,
> > 
> > I thought about the problem regarding monitoring discrete IPMI sensors,
> > that Brian reported back in April:
> > http://www.mail-archive.com/address@hidden/msg01472.html
> > 
> > I did some in-depth testing and looked at how the current VMware ESXi 4.0
> > reports different states of discrete IPMI sensors.
> > 
> > I tested two example scenarios with an Intel SR2500 server:
> > 
> > Test case 1:
> >   * Power Supply 2 removed
> >   * Chassis cover removed
> >   * VMware reports: http://www.wefi.net/shared/sr2500-example-1.png
> > 
> > Test case 2:
> >   * Power Supply 2 present, but power cable removed
> >   * VMware reports: http://www.wefi.net/shared/sr2500-example-2.png
> > 
> > (Below you find some example ipmitool outputs for these two cases).
> > 
> > The current IPMI specification lists possible sensor-specific-offsets
> > for each sensor type in table 42-3, Sensor Type Codes.
> > 
> > To me it seems that VMware uses some mapping, which defines which
> > offsets (assertions/deassertions) cause a warning or an alarm,
> > e.g. an offset for the event "General Chassis Intrusion" for a Physical
> > Security sensor (sensor type code 05h) leads to status "Warning".
> > 
> > So my request:
> >       * introduce a new option for ipmitool (something like "ipmitool
> >         get-server-status") where ipmitool uses this kind of mapping,
> >         too. We could define which offsets/assertions should cause a
> >         warning. In this way an end-user would have an easy way to
> >         quickly find out whether or not everything is ok with his
> >         hardware...
> > 
> > Currently using e.g. "ipmitool sdr elist all" returns "ok" for sensor
> > states like "General Chassis Intrusion" (see below)
> > 
> > What do you think?
> > Any other ideas how we could accomplish that?
> > Does anybody know whether one of the other tools like freeipmi or
> > ipmiutil has some functionality like this?
> > 
> > best regards,
> > Werner
> > 
> > PS: Here are the outputs of ipmitool for this:
> > 
> > Test case 1:
> >         address@hidden:~$ ipmitool -I lan -H 192.168.1.211 -U monitor -L user sdr elist all | grep -i "PS"
> >         Password: 
> >         PS1 AC Current   | 78h | ok  | 10.1 | 0.93 Amps
> >         PS2 AC Current   | 79h | ns  | 10.2 | No Reading
> >         PS1 +12V Current | 7Ah | ok  | 10.1 | 16 Amps
> >         PS2 +12V Current | 7Bh | ns  | 10.2 | No Reading
> >         PS1 +12V Power   | 7Ch | ok  | 10.1 | 192 Watts
> >         PS2 +12V Power   | 7Dh | ns  | 10.2 | No Reading
> >         PS1 Status       | 70h | ok  | 10.1 | Presence detected
> >         PS2 Status       | 71h | ok  | 10.2 | 
> >         address@hidden:~$ ipmitool -I lan -H 192.168.1.211 -U monitor -L user sdr elist all | grep -i "Physical Scrty"
> >         Password: 
> >         Physical Scrty   | 05h | ok  | 23.1 | General Chassis intrusion
> >         address@hidden:~$ ipmitool -I lan -H 192.168.1.211 -U admin raw 0x04 0x2d 0x70
> >         Password: 
> >         Data length = 1
> >          00 c0 01 00
> >         address@hidden:~$ ipmitool -I lan -H 192.168.1.211 -U admin raw 0x04 0x2d 0x71
> >         Password: 
> >         Data length = 1
> >          00 c0 00 00
> >         address@hidden:~$ ipmitool -I lan -H 192.168.1.211 -U admin -P relation sdr get "Physical Scrty"
> >         Sensor ID              : Physical Scrty (0x5)
> >          Entity ID             : 23.1 (System Chassis)
> >          Sensor Type (Discrete): Physical Security
> >          States Asserted       : Physical Security
> >                                  [General Chassis intrusion]
> >          Assertion Events      : Physical Security
> >                                  [General Chassis intrusion]
> >          Assertions Enabled    : Physical Security
> >                                  [General Chassis intrusion]
> >                                  [System unplugged from LAN]
> >          Deassertions Enabled  : Physical Security
> >                                  [General Chassis intrusion]
> >                                  [System unplugged from LAN]
> > 
> > Test case 2:
> >         address@hidden:~$ ipmitool -I lan -H 192.168.1.211 -U monitor -L user sdr get "PS2 Status"
> >         Password: 
> >         Sensor ID              : PS2 Status (0x71)
> >          Entity ID             : 10.2 (Power Supply)
> >          Sensor Type (Discrete): Power Supply
> >          States Asserted       : Power Supply
> >                                  [Presence detected]
> >                                  [Power Supply AC lost]
> >          Assertion Events      : Power Supply
> >                                  [Presence detected]
> >                                  [Power Supply AC lost]
> >          Assertions Enabled    : Power Supply
> >                                  [Presence detected]
> >                                  [Failure detected]
> >                                  [Predictive failure]
> >                                  [Power Supply AC lost]
> >                                  [Config Error: Vendor Mismatch]
> >                                  [Config Error: Revision Mismatch]
> >                                  [Config Error: Processor Missing]
> >                                  [Config Error]
> >          Deassertions Enabled  : Power Supply
> >                                  [Presence detected]
> >                                  [Failure detected]
> >                                  [Predictive failure]
> >                                  [Power Supply AC lost]
> >                                  [Config Error: Vendor Mismatch]
> >                                  [Config Error: Revision Mismatch]
> >                                  [Config Error: Processor Missing]
> >                                  [Config Error]
> >         
> >         address@hidden:~$ ipmitool -I lan -H 192.168.1.211 -U monitor -L user sdr elist all | grep -i "PS"
> >         Password: 
> >         PS1 AC Current   | 78h | ok  | 10.1 | 0.93 Amps
> >         PS2 AC Current   | 79h | ok  | 10.2 | 0.12 Amps
> >         PS1 +12V Current | 7Ah | ok  | 10.1 | 16 Amps
> >         PS2 +12V Current | 7Bh | ok  | 10.2 | 0 Amps
> >         PS1 +12V Power   | 7Ch | ok  | 10.1 | 192 Watts
> >         PS2 +12V Power   | 7Dh | ok  | 10.2 | 0 Watts
> >         PS1 Status       | 70h | ok  | 10.1 | Presence detected
> >         PS2 Status       | 71h | ok  | 10.2 | Presence detected, Power Supply AC lost
> >         address@hidden:~$ ipmitool -I lan -H 192.168.1.211 -U admin raw 0x04 0x2d 0x71
> >         Password: 
> >         Data length = 1
> >          00 c0 09 00
> >         address@hidden:~$
> > 
> > 
> -- 
> Albert Chu
> address@hidden
> Computer Scientist
> High Performance Systems Division
> Lawrence Livermore National Laboratory
> 
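
The raw `0x04 0x2d` responses quoted above (`00 c0 01 00`, `00 c0 00 00`, `00 c0 09 00`) can be decoded by hand. The sketch below assumes the Get Sensor Reading response layout from the IPMI specification for discrete sensors: reading byte, flags byte, then a bitmask of asserted state offsets 0 to 7, followed by an optional byte for offsets 8 to 14. The function name `decode_discrete_reading` is made up for this illustration, and the state names are limited to the first four Power Supply offsets as ipmitool prints them in the quoted output.

```python
# Power Supply sensor-specific offsets 0 to 3, named as in the quoted
# ipmitool output. Higher offsets are left out of this sketch.
POWER_SUPPLY_STATES = {
    0: "Presence detected",
    1: "Failure detected",
    2: "Predictive failure",
    3: "Power Supply AC lost",
}


def decode_discrete_reading(raw, names):
    """Decode the data bytes printed by `ipmitool raw 0x04 0x2d <sensor>`.

    `raw` is the hex string without the completion code, e.g. "00 c0 09 00".
    """
    data = [int(byte, 16) for byte in raw.split()]
    flags, states_low = data[1], data[2]
    # The top bit of the second state byte is reserved, so mask it off.
    states_high = data[3] & 0x7F if len(data) > 3 else 0
    mask = states_low | (states_high << 8)
    asserted = [
        names.get(bit, f"offset {bit}")
        for bit in range(15)
        if mask & (1 << bit)
    ]
    return {
        "scanning_enabled": bool(flags & 0x40),     # bit 6 of the flags byte
        "reading_unavailable": bool(flags & 0x20),  # bit 5 of the flags byte
        "asserted": asserted,
    }


if __name__ == "__main__":
    for raw in ("00 c0 01 00", "00 c0 00 00", "00 c0 09 00"):
        print(raw, "->", decode_discrete_reading(raw, POWER_SUPPLY_STATES)["asserted"])
```

Under these assumptions `0x09` has bits 0 and 3 set, which corresponds to "Presence detected" plus "Power Supply AC lost", the same two states that `sdr get "PS2 Status"` lists as asserted in test case 2.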




