freeipmi-users
Re: [Freeipmi-users] ipmiconsole BMC Implementation with x38ml


From: Al Chu
Subject: Re: [Freeipmi-users] ipmiconsole BMC Implementation with x38ml
Date: Fri, 26 Sep 2008 09:48:05 -0700

Hey Kimitoshi,

On Sat, 2008-09-27 at 01:08 +0900, ktaka wrote:
> Hi Al,
> 
> > Great to hear.  Could you give me details on your motherboard, or point
> > me to a webpage with it?  That way I can update the workarounds
> > documentation to include this motherboard.
> 
> Sure.
> The system I'm using is Intel's sr1520ml (twin motherboard server):
> http://www.intel.com/products/server/systems/sr1520ml/sr1520ml-overview.htm
> 
> The motherboard in it is the x38ml:
> http://www.intel.com/Products/Server/Motherboards/X38ML/X38ML-specifications.htm
> 
> This board has an onboard BMC with IPMI 2.0 support and a NIC shared
> with the host for outbound IPMI connections.

Thanks for the information.  I've updated the docs.  This will be in the
next release (I will probably release today sometime).

> >> The SOL connection seems to be silently disconnected from the target
> >> host during the boot process.
> > 
> > I don't think this is that uncommon.  
> 
> Yep. I often saw disconnected connections when I was using
> Supermicro servers a few years ago.
> But nowadays that doesn't happen so often with them. It's stable. Your
> tool works like a charm and almost never disconnects, unless I
> have bad connections or misconfigured bonding where the IPMI MAC address
> hops from eth0 to eth1.
> 
> Maybe I should have thanked you for developing such a great tool before
> I asked the question.

Thanks for your comments.  You're welcome for the tools/project.

> I just wish freeipmi worked with this new M/B the way it did with my
> many Supermicro servers.
> 
> > Do you happen to be diskless
> > booting?  One manufacturer told me that there are lines of ethernet
> > cards that run out of memory while diskless booting.  So all IPMI
> > traffic stops during that time.
> 
> Yes, I'm pxe booting. That could be one reason.
> 
> > I can imagine other scenarios where the ethernet is gone during boot.
> > Tools that were connected via SOL will either hang or eventually
> > timeout.
> >
> >> However do you have any idea on what's causing this, and how to
> >> correct this behavior?
> >
> > I don't know of a way to get around it.  From the 'ipmiconsole' side,
> > all I can do is read/send SOL packets.  When too many packets get lost,
> > or I get errors from the motherboard, etc. eventually I need to give up.
> > On one motherboard I tested with, during boot (it seemed) a large number
> > of SOL packets were dropped by the motherboard, and when SOL was alive
> > again, the sequence numbers of the newly sent packets were incremented
> > by a large number.  Eventually I need to give up b/c all the sequence
> > numbers are out of whack.
> 
> Well I think I'm getting closer now. Thank you.
> After a bunch of experiments, I came to this conclusion:
> This M/B (or BMC) seems to stop sending SOL data when the NIC link goes
> down while it's sending SOL data, and never sends data again even after
> the NIC link is back.
> 
> Quick tests I did while writing this email:
> 
> 1. Boot Linux without the "console=ttyS0,19200n8" option, so that no
> kernel messages are displayed on the SOL connection.
> 
> (1) Disconnect/reconnect the LAN cable while watching "vmstat 1" through
> SOL. -> The SOL stopped.
> (2) Disconnect/reconnect the LAN cable while SOL is connected but
> nothing is being displayed continuously. -> The SOL continued to work
> after reconnecting the cable.
> 
> 2. Boot Linux with the "console=ttyS0,19200n8" option, so that I can
> see the kernel messages through SOL.
> (3) Disconnect/reconnect the LAN cable. -> The SOL stopped.
> (4) "modprobe -r igb", "modprobe igb", i.e. unloading/loading the NIC
> driver (from tty1). The NIC link goes down for a couple of seconds. ->
> The SOL stopped.
> 
> Here are the rules of thumb for this motherboard:
> 1. Try not to PXE boot from the first NIC, which is shared with the IPMI
> connection. (While the PXE ROM message is displayed, the link goes
> down for a second.)
> 2. Try not to use the serial console to see kernel messages, i.e.
> avoid "console=ttyS0,19200n8" on the kernel command line. (When the
> NIC is initialized during boot, the link goes down for a
> moment.)
> 
> This way I think I can avoid the "seemingly left disconnected" situation
> on this motherboard most of the time.
> 
> Do you think I'm doing this right?
>
> > Are you seeing 'ipmiconsole' hang forever?  Or does it eventually get an
> > error or timeout?  It at least shouldn't hang forever.  If it hangs
> > forever, do you think you could give me the --debug output of that
> > particular situation?  (Send it as an attachment, since I'm sure the
> > --debug output will be very long.)
> 
> The "hang" seems to last forever. It seems to me that ipmiconsole
> "thinks" it's not disconnected and continues to send whatever is typed
> at the terminal, while the target host never sends back any data again.
> 
> I can send you the --debug output from when I did the following:
> 1. Make a SOL connection to an already logged-in console.
> 2. Issue the date command to see if it's alive.
> 3. In tty0, issue "modprobe igb" to make SOL "hang".
> 4. Issue "sleep 1000" followed by Ctrl+C.
> 5. Disconnect SOL by typing "&.".
> 
> Here is the stdout output:
> 
> x60:~# /usr/ccmp/sbin/ipmiconsole -W authcap,solpayloadsize -u rt -p rt
> -h 192.168.20.116 --debug 2>/tmp/debug
> [SOL established]
> date
> Sat Sep 27 00:52:24 JST 2008
> usb:~# Intel(R) Gigabit Ethernet Network Driver - version 1.0.8-k2
> Copyright (c) 2007 Intel Corporatio
> [closing the connection]
> 
> Attached is the gzipped --debug output.
> 
> I hope this is what you want.

Yup, it's exactly what I want.  Unfortunately, I don't know how to deal
with the problem (at least right now).

As you stated in your experiments above, it seems that the remote
motherboard/BMC has kept the SOL session alive, however the session is
somewhat "disconnected" from the actual console.  So the data you type
on the console is just thrown away and meaningless.  However, the
BMC/SOL session continues to respond saying "I received your data".
(You'll notice in the dump that the "SOL BMC to Remote Console" packets
always respond happily, but never send character data back along with
it.)

From the 'ipmiconsole' side, it just reads/writes SOL data.  As long as
it sends SOL data and gets response that the data was received, it
continues to believe everything is fine.  It doesn't (and shouldn't)
care that the other side doesn't want to send it any more data.
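
To illustrate the state described above, here is a minimal Python sketch.
It is not ipmiconsole's code and not the real SOL packet layout:
SolResponse, its fields, and the threshold of 20 are all hypothetical
names and values chosen for illustration.  It only shows how a run of
"ACK but no character data" responses could be counted.

```python
from dataclasses import dataclass


@dataclass
class SolResponse:
    """Hypothetical, simplified view of one 'SOL BMC to Remote Console' packet."""
    acked: bool       # the BMC acknowledged the data we sent
    char_data: bytes  # console output carried in this packet (may be empty)


def count_trailing_ack_only(responses):
    """Count packets at the end of the list that ACK input but carry no output."""
    count = 0
    for response in reversed(responses):
        if response.acked and not response.char_data:
            count += 1
        else:
            break
    return count


def looks_stalled(responses, threshold=20):
    """Heuristic only: many ACK-only replies in a row *may* mean a stuck console.

    The threshold is an arbitrary illustrative value.
    """
    return count_trailing_ack_only(responses) >= threshold
```

Such a check can only be a hint: input that is legitimately not echoed
(a password prompt, a sleeping program) produces the same pattern.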

I could perhaps add some code for some type of epic "if you don't
receive any data from the other side in 5 minutes, disconnect", but that
seems to be a bad idea in general.  Consoles can sit idle for hours.
I'll have to think about whether there's anything I can do from the
ipmiconsole side.
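
For reference, the "no data in 5 minutes" idea could look roughly like
the Python sketch below.  It is an illustration under assumptions, not
anything ipmiconsole implements: the IdleWatchdog class, its method
names, and the injectable clock are all hypothetical.

```python
import time


class IdleWatchdog:
    """Hypothetical receive-idle timer; not part of ipmiconsole."""

    def __init__(self, timeout_s=300.0, clock=time.monotonic):
        self._timeout_s = timeout_s
        self._clock = clock          # injectable so the timer can be tested
        self._last_rx = clock()

    def data_received(self):
        """Call whenever character data arrives from the remote console."""
        self._last_rx = self._clock()

    def expired(self):
        """True once no character data has arrived for timeout_s seconds."""
        return self._clock() - self._last_rx >= self._timeout_s
```

The weakness is visible in the sketch itself: expired() cannot tell a
stuck session from a console that is simply idle, so any fixed timeout
would also disconnect healthy sessions that sit quiet for hours.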

Obviously, you should report this issue to your vendor too, to put
pressure on Intel to have something better in the future.  That is
the core of the problem.

Thanks,
Al

> Otherwise, please let me know.
> Thank you.
> 
-- 
Albert Chu
address@hidden
925-422-5311
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory
