freeipmi-devel

Re: [Freeipmi-devel] clean exit of external process when conmand shuts down


From: Brian Lambert
Subject: Re: [Freeipmi-devel] clean exit of external process when conmand shuts down
Date: Tue, 10 Jan 2012 23:14:15 -0500 (EST)


I did another test, and have attached debug output.

First, I rebooted the BMC (Dell iDRAC6) to make sure there were no sessions active.

I then established an initial SOL session, using the following command:
  ./ipmiconsole -h n003-bmc -u root -p calvin -W solpayloadsize --serial-keepalive

So far, so good.

Instead of killing the first session, I left it active and tried to start a second session using the same command. That failed as expected, with a "BMC Error" message. Debug output from that first reconnect attempt is attached in ipmiconsole-reconnect1.txt.

I then tried to deactivate the existing session using the command:
  ./ipmiconsole -h n003-bmc -u root -p calvin -W solpayloadsize --serial-keepalive --deactivate

That command completed without error, but the original session was still active and responding to keystrokes. Debug output from that attempt is attached in ipmiconsole-deactivate1.txt.

I then tried to activate a new session a second time. It failed with the same error message as the first reconnect attempt. Debug output from the second attempt is in ipmiconsole-reconnect2.txt.

Thanks for your help. Let me know if you need further details or want me to try anything else.

thanks,
Brian


On Sun, 8 Jan 2012, Al Chu wrote:

Hi Brian,

I've moved the IPMI portion of this thread to freeipmi-devel, since it's
a bit more appropriate for this mailing list.

To start a session, I can use the following FreeIPMI command:
./ipmiconsole -h n003-bmc -u root -p calvin -W solpayloadsize --serial-keepalive

I can quit out of that session using the &. escape sequence, and
reconnect right away.  But if I 'kill -9' that process, I get a
"[error received]: BMC Error" message when I try to connect with
another ipmiconsole command.

This indicates an unexpected error code along the way.  ipmiconsole
probably noticed that the previous SOL session was activated and tried
to deactivate it, with some error occurring at some point.  Could you
send the --debug output of ipmiconsole when you try to reconnect?

This is the same error message I get
when trying to connect when another session is already active.  If I
then issue the command:
./ipmiconsole -h n003-bmc -u root -p calvin -W solpayloadsize --serial-keepalive --deactivate
This completes without error, but I still can't reconnect to the
serial console.

Can you give me the --debug output of the later connect attempt?  I'd
like to see why it can't connect again.

I get similar results when using ipmitool.  In that case, when I try
to reconnect, I get:
#ipmitool -U root -P calvin -H n003-bmc -I lanplus sol activate
Info: SOL payload already active on another session

If I try to deactivate the existing session, I get:
# ipmitool -U root -P calvin -H n003-bmc -I lanplus sol deactivate
Info: SOL payload already de-activated

I don't know the exact test situation you're trying, but you could be
racing a bit in some of these scenarios.  When you kill the previous
session with "kill -9", the server/BMC does not immediately end the
IPMI/SOL session.  It lasts for a while longer until the server/BMC
eventually times out.  So that can explain why your first activate
attempt indicates the session is already activated, but it's deactivated
by the time you try to deactivate.
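
If the stale session really does expire on its own, a caller can poll
until the BMC accepts a new connection instead of racing it.  A minimal
sketch in Python, used here only for illustration: try_connect is a
hypothetical callable that returns True once a session activates, and
the 300 second and 10 second defaults are assumed values, not measured
iDRAC timeouts.

```python
import time


def connect_with_retry(try_connect, max_wait=300.0, interval=10.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Call try_connect() until it succeeds or max_wait seconds elapse.

    try_connect is a hypothetical callable returning True on success.
    Returns the number of attempts that were made.
    """
    deadline = clock() + max_wait
    attempts = 0
    while True:
        attempts += 1
        if try_connect():
            return attempts
        # Give up if another full interval would overshoot the deadline.
        if clock() + interval > deadline:
            raise TimeoutError("no session after %d attempts" % attempts)
        sleep(interval)
```

The sleep and clock parameters exist only so the loop can be exercised
without real delays.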

Once it's in this state, the only thing I've been able to do to regain
access to the serial console is reboot the BMC or wait for the session
to time out.

I have the same experience when connecting to Dell iDRAC5 and iDRAC6,
both running the latest firmware.  Al, if you'd like more information
or debug output from the freeipmi tools I'd be happy to provide it.

Would like to get to the bottom of this.

Al


On Sun, 2012-01-08 at 20:13 -0800, lambert wrote:
After some additional experimentation, it looks like a direct ssh to
the Dell blade iDRAC (BMC) followed by a command to activate the
serial connection may be the way to go with these.  I found that a
SIGKILL to the ssh session was sufficient to close the serial console
session, such that I could start another session without needing to
wait several minutes for the old session to time out.

I still need to do some more testing, but Chris you may want to wait
before you spend too much time implementing the external process
cleanup coding.  If I get this approach working robustly, a clean
shutdown of the external process will be less important.


As for the IPMI SOL issues:

To start a session, I can use the following FreeIPMI command:
./ipmiconsole -h n003-bmc -u root -p calvin -W solpayloadsize --serial-keepalive

I can quit out of that session using the &. escape sequence, and
reconnect right away.  But if I 'kill -9' that process, I get a
"[error received]: BMC Error" message when I try to connect with
another ipmiconsole command.  This is the same error message I get
when trying to connect when another session is already active.  If I
then issue the command:
./ipmiconsole -h n003-bmc -u root -p calvin -W solpayloadsize --serial-keepalive --deactivate
This completes without error, but I still can't reconnect to the
serial console.

I get similar results when using ipmitool.  In that case, when I try
to reconnect, I get:
#ipmitool -U root -P calvin -H n003-bmc -I lanplus sol activate
Info: SOL payload already active on another session

If I try to deactivate the existing session, I get:
# ipmitool -U root -P calvin -H n003-bmc -I lanplus sol deactivate
Info: SOL payload already de-activated

Once it's in this state, the only thing I've been able to do to regain
access to the serial console is reboot the BMC or wait for the session
to time out.

I have the same experience when connecting to Dell iDRAC5 and iDRAC6,
both running the latest firmware.  Al, if you'd like more information
or debug output from the freeipmi tools I'd be happy to provide it.

thanks,
Brian

On Jan 7, 6:06 pm, Al Chu <address@hidden> wrote:
Thanks also for the FreeIPMI link.  That list confirms the issue
I've been seeing with the Dell iDRACs not responding to the sol
deactivate.  I've made Dell aware of the issue, but don't know if they
have any plans to fix it.

When you do a "sol deactivate" does the original ipmitool session just
hang forever?  I imagine you're hitting a scenario where the original
IPMI/SOL session cannot do SOL anymore, but can send/recv IPMI packets.
The IPMI session can send IPMI keepalive packets and stay happy all day
long, but no SOL traffic will ever be received.  The only way to get a
timeout is to send SOL data (i.e. type at prompt), so that the SOL data
transfer eventually times out.

I added a "serial keepalive" into ipmiconsole/libipmiconsole to try and
deal w/ this situation.  As the name suggests, you "keepalive" a session
using SOL data instead of IPMI data so that the original sessions will
eventually time out (and exit, which is the end goal).  In FreeIPMI's
ipmiconsole this is enabled w/ the "--serial-keepalive" option.

I do believe ipmitool has a similar option "usesolkeepalive" (or
something to that effect).  It may be worth trying too.

Al



On Fri, 2012-01-06 at 20:43 -0800, lambert wrote:
I stand corrected, my second example does appear to work in regards to
trapping the signal while in interact mode.  Not sure what I was doing
wrong the other day.

So I fleshed-out the code in the trap to have it log out of the cmc
and exit out of the expect script upon receiving a SIGHUP, and that
appears to work well.  It can't trap a SIGKILL so it will take a
modification to conman, as you suggested, to have an option for
sending different signal types.  Another approach would be to send a
SIGHUP to all external processes by default, followed by a short wait,
and then a SIGKILL to clean up any stragglers.  I can try playing with
that some, if you want to point me toward the relevant routine.
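
The SIGHUP, short wait, then SIGKILL sequence can be sketched as
follows.  This is Python for illustration only, not conmand's C code;
stop_external_process and the 2 second grace period are assumed names
and values.

```python
import signal
import subprocess


def stop_external_process(proc, grace_seconds=2.0):
    """Send SIGHUP, wait up to grace_seconds, then SIGKILL any straggler.

    proc is a subprocess.Popen object.  Returns the name of the signal
    that ended the wait, or "none" if the process had already exited.
    """
    if proc.poll() is not None:
        return "none"
    proc.send_signal(signal.SIGHUP)  # trappable: lets a script log out first
    try:
        proc.wait(timeout=grace_seconds)
        return "SIGHUP"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL cannot be trapped or ignored
        proc.wait()
        return "SIGKILL"
```

A script that traps SIGHUP gets the grace period to clean up, while one
that ignores the signal is still removed after the wait.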

Thanks also for the FreeIPMI link.  That list confirms the issue
I've been seeing with the Dell iDRACs not responding to the sol
deactivate.  I've made Dell aware of the issue, but don't know if they
have any plans to fix it.

Thanks.

On Jan 6, 3:13 am, Chris Dunlap <address@hidden> wrote:
As for IPMI SOL connections, ConMan uses FreeIPMI.  I know Al Chu
(FreeIPMI maintainer) has encountered bugs in several vendor
implementations, and has implemented various workarounds when possible:

http://www.gnu.org/software/freeipmi/freeipmi-bugs-issues-and-workaro...

You could try the internal IPMI support to see if FreeIPMI is better
able to cope with the Dell blades.

conmand connects to an external process via a fork/exec, duping the
ends of the child's socketpair onto stdin/stdout.  It disconnects
from the process by closing its side of the socketpair and sending
a SIGKILL to the associated pid.
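
For readers unfamiliar with the pattern, here is a rough Python
rendering of that connect/disconnect sequence.  It is an illustration
under assumptions, not conmand's actual code: spawn_on_socketpair and
disconnect are made-up names, and subprocess.Popen stands in for the
raw fork/exec.

```python
import signal
import socket
import subprocess


def spawn_on_socketpair(argv):
    """Start argv with one end of a socketpair as its stdin and stdout."""
    parent_end, child_end = socket.socketpair()
    proc = subprocess.Popen(argv,
                            stdin=child_end.fileno(),
                            stdout=child_end.fileno())
    child_end.close()  # the child keeps its own duplicated descriptors
    return proc, parent_end


def disconnect(proc, parent_end):
    """Close our side of the socketpair, then SIGKILL the child."""
    parent_end.close()
    proc.send_signal(signal.SIGKILL)  # no-op if the child already exited
    return proc.wait()
```

Because SIGKILL cannot be caught, the child gets no chance to log out
of the remote console in this scheme.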

The signal handler approach seems cleaner, but only if we're able
to handle signals within the interact block.  Just playing around at
the shell, this seems to work:

  #!/usr/bin/expect --
  spawn $env(SHELL)
  trap {send_user " SIG[trap -name] "} {USR1 USR2}
  interact

I'm not sure why your 2nd example doesn't work.  I'll try to look at
this some more in the next few days.

-Chris

On Thu, 2012-01-05 at 07:56am PST, lambert wrote:

What I'm trying to do in this case is issue the following commands to
connect to a virtual serial console, on a Dell blade, through the
chassis management controller.

ssh <cmc host>
connect -m server-<n>

At this point I would issue an interact command in the expect script.

Then, to close the connection requires sending a ^\ to close the
serial connection, followed by an 'exit' to exit out of the cmc ssh
connection.

Note that the Dell blades do support IPMI SOL.  I'm currently using an
external script to drive ipmitool (hadn't realized conman now supports
ipmi sol connections internally).  It's working for the most part, but
I'm hitting the same problem in that 1) I can't issue an 'sol
deactivate' to close the connection when conmand shuts down and 2) The
Dell BMCs don't appear to honor the 'sol deactivate' command anyway.

I'm having some general reliability issues with using IPMI SOL on the
Dell blades, so thought I'd try going through the above approach of
establishing a connection by way of the cmc.

I was thinking along the lines of a signal handler.  How does conman
currently execute the external process, is it just a 'system' call?
Just wondering if the external process is already receiving a SIGKILL
when conmand shuts down.

Just now I experimented with creating a 'trap' inside my expect
script.  It works, up until the interact block.  Once the interact
command is executed, the signal handler is no longer being run:

This works ( I see 'Ouch!' printed with each SIGUSR1 signal):

set timeout -1
spawn /bin/sh
match_max 100000
send -- "ssh cmc1\r"
expect -exact "ssh cmc1\r
address@hidden's password: "
send -- "#####\r"
expect -gl "\$ "
trap {send_user "Ouch!"} SIGUSR1

But once I add the 'interact' command, the signal handler stops
working, and a SIGUSR1 just causes the expect script to exit:

set timeout -1
spawn /bin/sh
match_max 100000
send -- "ssh cmc1\r"
expect -exact "ssh cmc1\r
address@hidden's password: "
send -- "#####\r"
expect -gl "\$ "
trap {send_user "Ouch!"} SIGUSR1
interact

Thanks.

On Jan 5, 3:01 am, Chris Dunlap <address@hidden> wrote:
No, ConMan currently has no mechanism to trigger an external process
for cleanup before exiting.

One possibility would be to have config keywords to specify, say,
an ExecExitStr and ExecExitDelay.  On exit, conmand would write
the ExecExitStr string into the associated console byte stream,
after which it would wait ExecExitDelay seconds before terminating.
The expect script could specify this ExecExitStr pattern in its
interact block, and upon matching it, perform the necessary sends &
expects to prepare the remote console.  The ExecExitDelay would give
it time to run.  One downside to this approach is that there is no
way to prevent a connected user from typing the ExecExitStr pattern,
thereby triggering the interact block in the expect script.
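
To make the matching side concrete, here is a small Python sketch of
watching a byte stream for such an exit string.  ExecExitStr is only a
proposed keyword, and the ExitStringWatcher class and the "<<EXIT>>"
marker used with it are invented for this example.

```python
class ExitStringWatcher:
    """Detect an exit marker in a byte stream that arrives in chunks."""

    def __init__(self, exit_str):
        if not exit_str:
            raise ValueError("exit_str must not be empty")
        self.exit_str = exit_str
        self._tail = b""  # bytes kept in case the marker spans two chunks

    def feed(self, chunk):
        """Return True once the marker has been seen in the stream."""
        data = self._tail + chunk
        if self.exit_str in data:
            self._tail = b""
            return True
        keep = len(self.exit_str) - 1
        self._tail = data[-keep:] if keep > 0 else b""
        return False
```

Feeding b"abc<<EX" and then b"IT>>def" to a watcher built with
b"<<EXIT>>" reports a match on the second call.  The watcher cannot
tell who wrote the bytes, which is exactly the downside noted above.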

Another possibility would be to specify a signal handler within
the expect script, and conmand could signal the associated pid
with an ExecExitSigNum signal before waiting ExecExitDelay seconds
to terminate.  But I'd have to do some experimentation to see if I
could craft an appropriate signal handler for an expect script.
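
The receiving side of that idea, sketched in Python rather than expect
purely for illustration.  install_exit_handler is a hypothetical
helper, and SIGHUP is an assumed choice for the ExecExitSigNum value.

```python
import signal
import sys


def install_exit_handler(cleanup, signum=signal.SIGHUP):
    """Run cleanup() and exit when signum arrives.

    SIGKILL cannot be trapped, so signum must be a catchable signal.
    """

    def handler(received_signum, frame):
        cleanup()  # e.g. close the serial connection and log out
        sys.exit(0)

    signal.signal(signum, handler)
```

The cleanup has to finish within the ExecExitDelay window, since the
process is terminated once that delay expires.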

Can you elaborate on what you would like to do in order to cleanly
close such a connection?

-Chris

On Wed, 2012-01-04 at 02:41pm PST, lambert wrote:

Is there a way to trigger a clean exit of an external console process,
when the conman daemon is shut down?  Say I'm using the ssh.exp
script, when the conman daemon is shut down (/etc/init.d/conman stop),
I'd like to have the ssh.exp script issue commands to cleanly close
the connection.

I'm trying to work around a problem with some Dell blades where if the
virtual serial console connection is not terminated cleanly, I have to
wait several minutes or reboot the BMC in order to regain access.

thanks.

--
Albert Chu
address@hidden
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory

Attachment: ipmiconsole-reconnect2.txt
Description: Text document

Attachment: ipmiconsole-deactivate1.txt
Description: Text document

Attachment: ipmiconsole-reconnect1.txt
Description: Text document

