freeipmi-users

Re: [Freeipmi-users] Decoding ram errors on supermicro


From: Tom Hetmer
Subject: Re: [Freeipmi-users] Decoding ram errors on supermicro
Date: Wed, 05 Dec 2018 03:38:24 +0100

Alright, I've added it to GitHub.

Here's the output from bmc-info for that particular board, along with the DMI line from the kernel log:
Product ID            : 2201
[Mon Dec  3 12:08:13 2018] DMI: Supermicro X10DRH LN4/X10DRH-CLN4, BIOS 2.0 01/30/2016


I guess you'll support it based on the product ID?
So if there are any other (X10) boards with a different product ID but the same
SEL output, I'll have to send them again, correct?


I have all kinds of product IDs on other machines, e.g.:
X10DRW-E => 2148
X11SPi-TF => 2369
X10SLL-F => 2049
X10DRL-i => 2097
X11DDW-NT => 2407
X10SLH-F/X10SLM+-F => 2051


And so on. I think we have at least a quarter of the boards they manufacture.
X9 boards are under 2000, X11 seems to be 23xx and up. But that's maybe too much
reverse engineering for you ;)
I can try to ping them and ask about the details, but I have no official contact
at Supermicro.
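Since support would be keyed on the product ID, one way to keep track of the IDs collected above is a plain lookup table. This is a minimal Python sketch; the names `KNOWN_BOARDS` and `board_name` are hypothetical, and the table holds only the IDs listed in this message, not an official Supermicro list.

```python
from typing import Optional

# Product IDs reported by bmc-info for the boards listed in this message.
# Hypothetical table: it covers only these boards, not Supermicro's full range.
KNOWN_BOARDS = {
    2049: "X10SLL-F",
    2051: "X10SLH-F/X10SLM+-F",
    2097: "X10DRL-i",
    2148: "X10DRW-E",
    2201: "X10DRH LN4/X10DRH-CLN4",
    2369: "X11SPi-TF",
    2407: "X11DDW-NT",
}


def board_name(product_id: int) -> Optional[str]:
    """Return the board name for a bmc-info product ID, or None if unknown."""
    return KNOWN_BOARDS.get(product_id)


if __name__ == "__main__":
    print(board_name(2201))  # X10DRH LN4/X10DRH-CLN4
    print(board_name(9999))  # None
```

A board that shares the same SEL encoding under a different product ID would simply need another entry here.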


Best,
Tom Hetmer


CDN77 Operations
address@hidden / +44 (0) 20 3514 2399 / www.cdn77.com

----- Original message -----
> From: "Albert Chu" <address@hidden>
> To: "Tom Hetmer" <address@hidden>, address@hidden
> Date: 12/04/18 19:40
> Subject: Re: [Freeipmi-users] Decoding ram errors on supermicro
> 
> On Tue, 2018-12-04 at 11:39 +0100, Tom Hetmer wrote:
> > Sure. It seems there's a similar ticket
> > already: https://github.com/chu11/freeipmi-mirror/issues/19
> 
> Ahh, if you could, update it with info from ipmitool / ipmiutil.  I was
> reluctant to add support based on reverse engineering.  But if other
> tools have "official" interpretations from Supermicro, I'm more
> confident in the addition.
> 
> > Yep, that's the code. ipmitool and a few others decode it too.
> > 
> > 
> > We have a *lot* of Supermicros, so I can help with testing if needed,
> > but we don't get that many CRC errors though :)
> 
> The one thing I'll need is product ID numbers (you can get them from
> bmc-info) and the name of the product.  This goes into the documentation
> and some of the code.
> 
> Thanks,
> 
> Al
> 
> > So I guess we'd have to wait till one pops up. But I hope the 'ver 2'
> > method from ipmiutil works fine.
> > We used ipmitool in our monitoring before and it was accurate but
> > slow, that's why I rewrote it all to use freeipmi.
> > 
> > 
> > Thanks!
> > 
> > 
> > Best,
> > Tom Hetmer
> > 
> > ----- Original message -----
> > > From: "Albert Chu" <address@hidden>
> > > To: "Tom Hetmer" <address@hidden>, address@hidden
> > > Date: 12/03/18 21:06
> > > Subject: Re: [Freeipmi-users] Decoding ram errors on supermicro
> > > 
> > > Hi Tom,
> > > 
> > > Thanks for the pointer to ipmiutil's code.  I assume you found this
> > > comment:
> > > 
> > > ---
> > >       /* ver 2 method: 2A 80 = P1_DIMMB1 */
> > >           /* SuperMicro says:
> > >            *  pair: %c (data2 >> 4) + 0x40 + (data3 & 0x3) * 3, (='B')
> > >            *  dimm: %c (data2 & 0xf) + 0x27,
> > >            *  cpu:  %x (data3 & 0x03) + 1);
> > >            */
> > > ---
> > > 
> > > I can definitely add it to my todo list.
> > > 
> > > Would you mind writing up an issue on github here?
> > > 
> > > https://github.com/chu11/freeipmi-mirror
> > > 
> > > Al
> > > 
> > > On Mon, 2018-12-03 at 17:55 +0100, Tom Hetmer wrote:
> > > > Hi,
> > > > 
> > > > It'd be good if FreeIPMI supported decoding Supermicro ECC
> > > > errors.
> > > > 
> > > > 
> > > > Manufacturer: Supermicro
> > > > Product Name: X10DRH LN4
> > > > e.g.
> > > > 
> > > > freeipmi:
> > > > 1,Dec-01-2018,06:37:53,Sensor #0,Memory,Critical,Uncorrectable memory error ; OEM Event Data2 code = 3Ah ; OEM Event Data3 code = 81h
> > > > 
> > > > 
> > > > Web interface:
> > > > 1 | 12/01/2018 | 06:37:53 | Memory | Uncorrectable ECC (@DIMMG1(CPU2)) | Asserted
> > > > 
> > > > 
> > > > Something like this worked for me (stolen from ipmiutil):
> > > > 
> > > > 
> > > > // $data2 and $data3 are hex strings, e.g. "3A" and "81"
> > > > $cpu = (hexdec($data3) & 0x03) + 1;
> > > > 
> > > > $NPAIRS = 26;
> > > > $rgpairs = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
> > > > 
> > > > $bdata = hexdec($data2 . $data3);
> > > > $pair = (($bdata & 0xF0) >> 4) - 1;
> > > > 
> > > > // keep the index inside $rgpairs
> > > > if ($pair < 0) $pair = 0;
> > > > if ($pair >= $NPAIRS) $pair = $NPAIRS - 1;
> > > > 
> > > > $pair = $rgpairs[max($pair - 1, 0)];
> > > > 
> > > > $dimm = $bdata & 0x0F;
> > > > 
> > > > 
> > > > $dimm may be incorrect: the original code subtracts 9, but on that
> > > > board that gave the wrong result, so I changed it to get the right one.
> > > > We'll see if it keeps producing the right values.
> > > > 
> > > > Best,
> > > > Tom Hetmer
> > > > 
> > > > 
> > > > CDN77 Operations
> > > > address@hidden / +44 (0) 20 3514 2399 / www.cdn77.com
> > > > 
> > > > _______________________________________________
> > > > Freeipmi-users mailing list
> > > > address@hidden
> > > > https://lists.gnu.org/mailman/listinfo/freeipmi-users
> > > 
> > > -- 
> > > Albert Chu
> > > address@hidden
> > > Computer Scientist
> > > High Performance Systems Division
> > > Lawrence Livermore National Laboratory
> > 
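Whichever formula ends up being used, a monitoring script first has to pull the two OEM bytes out of the SEL line. This is a minimal Python sketch, assuming the line looks exactly like the freeipmi sample in Tom's original report ("... ; OEM Event Data2 code = 3Ah ; OEM Event Data3 code = 81h"); `oem_event_data` is a hypothetical helper, not part of FreeIPMI.

```python
import re
from typing import Optional, Tuple

# Matches the OEM event data bytes in the SEL line from the original report,
# e.g. "OEM Event Data2 code = 3Ah". The wording comes from that one sample;
# other freeipmi versions or output modes may format it differently.
_DATA_RE = re.compile(r"OEM Event Data([23]) code = ([0-9A-Fa-f]{2})h")


def oem_event_data(sel_line: str) -> Optional[Tuple[int, int]]:
    """Return (data2, data3) as integers, or None if either byte is missing."""
    found = {m.group(1): int(m.group(2), 16) for m in _DATA_RE.finditer(sel_line)}
    if "2" not in found or "3" not in found:
        return None
    return found["2"], found["3"]


if __name__ == "__main__":
    line = ("1,Dec-01-2018,06:37:53,Sensor #0,Memory,Critical,"
            "Uncorrectable memory error ; OEM Event Data2 code = 3Ah ; "
            "OEM Event Data3 code = 81h")
    print(oem_event_data(line))  # (58, 129), i.e. (0x3A, 0x81)
```

Returning integers rather than hex strings avoids the string-to-number conversions the PHP version has to do.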

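The "ver 2" formula in the ipmiutil comment quoted in this thread translates directly to code. This is a minimal Python sketch; `decode_ver2` and its `pairs_per_cpu` parameter are hypothetical (the comment hard-codes the multiplier 3). It reproduces the comment's own example, 2A 80 => P1_DIMMB1. Plugging in the X10DRH LN4 bytes from the original report (Data2 = 3Ah, Data3 = 81h) gives P2_DIMMF1, while the web interface showed DIMMG1(CPU2); a multiplier of 4 would match, but nothing in the thread confirms that this is the right correction for that board.

```python
def decode_ver2(data2: int, data3: int, pairs_per_cpu: int = 3) -> str:
    """Build a DIMM label from the "ver 2" formula in the ipmiutil comment.

    pairs_per_cpu is a hypothetical knob: the comment hard-codes 3.
    """
    cpu_index = data3 & 0x03  # cpu: (data3 & 0x03) + 1
    # pair: (data2 >> 4) + 0x40 + (data3 & 0x3) * 3, as an ASCII letter
    pair = chr((data2 >> 4) + 0x40 + cpu_index * pairs_per_cpu)
    # dimm: (data2 & 0xf) + 0x27 only yields a digit ('0'..'6') when the
    # low nibble of data2 is 0x9..0xF.
    dimm = chr((data2 & 0x0F) + 0x27)
    return f"P{cpu_index + 1}_DIMM{pair}{dimm}"


if __name__ == "__main__":
    print(decode_ver2(0x2A, 0x80))                   # P1_DIMMB1 (the comment's example)
    print(decode_ver2(0x3A, 0x81))                   # P2_DIMMF1
    print(decode_ver2(0x3A, 0x81, pairs_per_cpu=4))  # P2_DIMMG1
```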

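For comparison with the ver 2 formula, here is the PHP snippet from Tom's original report ported to Python. It is a sketch only: `decode_php_port` is a hypothetical name, the bytes are passed as integers rather than hex strings, and the letter index is clamped at 0. Working through it shows that every mask in this version applies to the low byte of `(data2 << 8) | data3`, so Data2 never affects the result. For the sample bytes 3Ah/81h it gives pair G, DIMM 1, CPU 2, which matches the web interface's DIMMG1(CPU2).

```python
from typing import Tuple

RGPAIRS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
NPAIRS = len(RGPAIRS)


def decode_php_port(data2: int, data3: int) -> Tuple[str, int, int]:
    """Port of the PHP decoder from the original report.

    Returns (pair_letter, dimm, cpu). data2 and data3 are byte values,
    not hex strings as in the PHP.
    """
    bdata = (data2 << 8) | data3
    cpu = (data3 & 0x03) + 1
    # Every mask below only touches the low byte, i.e. data3.
    pair = ((bdata & 0xF0) >> 4) - 1
    pair = max(0, min(pair, NPAIRS - 1))
    # The PHP subtracts 1 again when indexing; clamp so the index stays >= 0.
    letter = RGPAIRS[max(pair - 1, 0)]
    dimm = bdata & 0x0F
    return letter, dimm, cpu


if __name__ == "__main__":
    print(decode_php_port(0x3A, 0x81))  # ('G', 1, 2): DIMMG1, CPU2
```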
