lynx-dev Problem with ^Z suspending - so far...


From: Klaus Weide
Subject: lynx-dev Problem with ^Z suspending - so far...
Date: Wed, 27 Jan 1999 10:38:56 -0600 (CST)

I have done some tracing of ncurses-lynx, and some other
ncurses-compiled programs, together with three different Bournish
shells.  I am trying to summarize my conclusions here.  Those of you
who follow this thread, let me know whether this makes sense, or if you
think some of it is wrong.

In the following, bash, ksh, ash, and ncurses are used to refer to the
following Debian packages:

||/ Name            Version        Description
+++-===============-==============-============================================
ii  bash            2.01.1-4.1     The GNU Bourne Again SHell
ii  pdksh           5.2.13-4       A public domain version of the Korn shell
ii  ash             0.3.4-6        NetBSD /bin/sh
ii  libncurses4     4.2-3          Shared libraries for terminal handling

running on

   $ uname -a
   Linux kweide 2.0.33 #1 Wed May 27 02:37:40 CDT 1998 i586 unknown

With "tty state" or "tty mode" below I refer to the various termios (or
whatever) flags which can be examined with stty.

Problems
========
I classify my observations into three distinct problems; two have
already been talked about, while the third one is new (I hinted at it
when I mentioned very strange behavior under ksh).

Preconditions for all three:
- (various "normal case" assumptions, like stdin/stdout is the
  "controlling tty", ^Z generates SIGTSTP, children don't change
  process group etc.)
- An application process (lynx) invoked from an interactive shell,
  with job control enabled, through an intervening process [typically
  a shell script without "exec", or a shell construct that involves a
  subshell like, in some shells, "( 1st-cmd; 2nd-cmd )" ].  That is,
  we have something like the following: (output from "ps jf")

   PPID   PID  PGID   SID TTY TPGID  STAT  UID   TIME COMMAND
      1  6258  6258  6258   4 15123  S     656   0:05 -bash
   6258 15123 15123  6258   4 15123  S     656   0:00  \_ sh ./runlynx.sh
  15123 15124 15123  6258   4 15123  S     656   0:00      \_ lynx

- User presses ^Z, the intermediate and the application process both
  get a SIGTSTP.  The intermediate process gets stopped, and the
  interactive shell notices it and does whatever it does when a job
  stops, before the application process's signal handler gets a chance
  to run.  Whether this actually occurs is not predictable.
  That is, for a brief time, we have

   PPID   PID  PGID   SID TTY TPGID  STAT  UID   TIME COMMAND
      1  6258  6258  6258   4  6258  S     656   0:05 -bash
   6258 15123 15123  6258   4  6258  T     656   0:00  \_ sh ./runlynx.sh
  15123 15124 15123  6258   4  6258  ?     656   0:00      \_ lynx

  the interactive shell has already written to the screen (a job status
  line and a prompt) and has "taken away" the tty from the job, and then
  the application (lynx) starts to handle its SIGTSTP.

The intermediate shell doesn't have any further significance; its only
role is to produce this situation by stopping first and misleading the
interactive shell.

Problem I - getting stopped twice
---------------------------------

Job gets suspended, fg results in a hung process (input isn't read).
Another ^Z, fg cycle usually recovers.

This happens when the application's SIGTSTP handler wants to write to the
screen (with "stty tostop" set) or update the tty state, _and_ SIGTTOU is
not ignored or blocked (or handled), _and_ there is not some other way by
which the SIGTSTP handler notices this situation (could be: SIGCONT
handler).  The write (if tostop) or ioctl produces SIGTTOU which stops the
process prematurely; when it is continued, it finishes up what it was
doing, that is, it stops itself.

Problem II - wrong screen content
---------------------------------

Job gets suspended, but then screen content (including cursor position) is
somehow wrong.  For example prompt may appear in middle of screen (for
lynx -show_cursor).

This happens (under the Preconditions) whenever the SIGTSTP handler would
generate some output - it either arrives in the wrong order, or, in
connection with Problem I, not at all.

Let's denote with L the character output that is (or would be) generated by
the application's handler, and S the interactive shell's output (status and
prompt), and + means order in time or concatenation (Think L=lynx or less,
S=shell).  With ncurses, L is typically some escape sequences for
positioning the cursor on the last screen line, possibly in connection
with a rmcup string.

Wanted: L + S

Actual: S + L       (SIGTTOU ignored or blocked, or -tostop)
        S           (SIGTTOU stops, and tostop)

This is rather cosmetic, and probably unavoidable unless interactive shells
are changed to keep track of grandchildren, or a SIGTSTP handler is added
to the script that somehow makes it delay stopping until its child has
stopped.

Problem II' - wrong tty state in shell
 - - - - - - - - - - - - - - - - - - -

There is another effect: when the application has reached the stopped
state, the tty flags may not be what they should.  We can understand this
the same way as Problem II by viewing function calls modifying the tty
state (tcsetattr etc. which translate into ioctl calls) as just another
kind of output.  Say l (ell) is what the application's SIGTSTP handler wants
to set, s is what the shell wants.  For simplicity, and as actually found
by tracing ncurses programs, assume that character output precedes ioctls.
Then, together with the character output:

Wanted: L + l + S + s

Actual: S + s + L + l      (SIGTTOU ignored or blocked)
        S + s + L          (SIGTTOU stops, and -tostop)
        S + s              (SIGTTOU stops, and tostop)

If I have to choose the lesser of three evils, I pick the middle one as
least undesirable: cursor-to-last-line and rmcup is done, but tty state
is set the right way for the shell prompt.  As it turns out, this isn't
a problem with bash or ksh in line-editing mode since they take care of
tty state on their own (see below), but S + s + L + l could be a
problem for ash.  But l is usually meant just to set the tty back into
the right mode (canonical) for the shell, so ash should be ok, too.
There does seem to be a remaining problem with this with ksh without
line-editing, but that may well be ksh's mistake.
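
The three cases can be restated as a small C function, mainly as a way to check the reasoning. This is only a sketch of the table above, not code from any shell or from ncurses. The names `handler_effects`, `EFFECT_CHARS` and `EFFECT_MODE` are made up for illustration. It assumes the Preconditions hold (S + s has already happened) and that character output precedes the mode change.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
enum {
    EFFECT_CHARS = 1,   /* L: the handler's character output reaches the tty */
    EFFECT_MODE  = 2    /* l: the handler's tcsetattr() takes effect */
};

/*
 * Which parts of the handler's work get through before the process stops,
 * given that the shell has already taken the tty back (S + s came first).
 *   sigttou_stops: SIGTTOU has its default action (not ignored or blocked)
 *   tostop:        "stty tostop" is in effect
 */
int handler_effects(bool sigttou_stops, bool tostop)
{
    if (!sigttou_stops)
        return EFFECT_CHARS | EFFECT_MODE;  /* S + s + L + l */
    if (!tostop)
        return EFFECT_CHARS;                /* S + s + L: only the ioctl raises SIGTTOU */
    return 0;                               /* S + s: the write() already raises SIGTTOU */
}
```

With SIGTTOU at its default action and -tostop, `handler_effects(true, false)` yields `EFFECT_CHARS` only, which is the middle case preferred above.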

Two concrete examples, from strace.  First let's look at less, a well-
behaved application.  It has its own SIGTSTP handler:

read(3, 0xbffff9cf, 1)                  = ? ERESTARTSYS (To be restarted)
--- SIGTSTP (Stopped) ---                     # handler gets entered now
sigaction(SIGTSTP, {0x8055230, [], SA_RESTART}, {0x8055230, [], SA_RESTART}) = 0
sigprocmask(SIG_SETMASK, [], [TSTP])    = 0
sigaction(SIGTTOU, {SIG_IGN}, {SIG_DFL}) = 0      # ignore SIGTTOU
write(1, "\33[32;1H\33[K", 10)          = 10      # this is L
ioctl(1, TCSETSW, {B38400 opost isig icanon echo ...}) = 0 # this is l
sigaction(SIGTTOU, {SIG_DFL}, {SIG_IGN}) = 0      # restore SIGTTOU
sigaction(SIGTSTP, {SIG_DFL}, {0x8055230, [], SA_RESTART}) = 0
getpid()                                = 15284
kill(15284, SIGTSTP)                    = 0       # stop this process!
--- SIGTSTP (Stopped) ---                         # now really stops
# would happen after resuming:
sigaction(SIGTSTP, {0x8055230, [], SA_RESTART}, {SIG_DFL}) = 0
ioctl(1, TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(1, TCSETSW, {B38400 opost isig -icanon -echo ...}) = 0

(I have edited the output, TCSETSW actually shows up as SNDCTL_TMR_STOP,
they're the same number which confuses strace a bit.  Also the "really stops"
is not what I can observe while tracing since strace doesn't let the process
really stop; it is what would happen if untraced.)

SIGTTOU is ignored during L + l output, so that Problem I can never occur.
This isn't using the blocking via sigaction's sa_mask as suggested by Kari
Hurtta and Bela, but has the same effect.  So S + s + L + l is the
combination of output.
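
For comparison, here is a minimal sketch of a handler written in that style, loosely following the order of calls in the less trace. It is not taken from the less sources. The names `shell_mode`, `prog_mode`, `sigttou_ignore_begin`, `sigttou_ignore_end` and `on_sigtstp` are assumptions, and the two termios structures are assumed to have been filled in with tcgetattr() at startup. The string written as L is a stand-in.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <termios.h>
#include <unistd.h>

/* Hypothetical globals: tty modes the application saved at startup. */
static struct termios shell_mode;   /* line-oriented mode to hand back (l) */
static struct termios prog_mode;    /* the application's own mode */

/* Start ignoring SIGTTOU; the previous disposition is stored in *saved. */
int sigttou_ignore_begin(struct sigaction *saved)
{
    struct sigaction ign;

    ign.sa_handler = SIG_IGN;
    sigemptyset(&ign.sa_mask);
    ign.sa_flags = 0;
    return sigaction(SIGTTOU, &ign, saved);
}

/* Put back whatever disposition sigttou_ignore_begin() replaced. */
int sigttou_ignore_end(const struct sigaction *saved)
{
    return sigaction(SIGTTOU, saved, NULL);
}

/* SIGTSTP handler: do L + l with SIGTTOU ignored, then stop for real. */
void on_sigtstp(int signo)
{
    static const char to_last_line[] = "\r\n";  /* stand-in for L */
    struct sigaction saved_ttou, act;
    sigset_t tstp;

    (void)signo;
    sigemptyset(&tstp);
    sigaddset(&tstp, SIGTSTP);

    sigttou_ignore_begin(&saved_ttou);
    if (write(STDOUT_FILENO, to_last_line, sizeof to_last_line - 1) < 0) {
        /* nothing sensible to do about a failed write here */
    }
    tcsetattr(STDOUT_FILENO, TCSADRAIN, &shell_mode);   /* l */
    sigttou_ignore_end(&saved_ttou);

    /* Default action, unblock SIGTSTP, then stop this process. */
    act.sa_handler = SIG_DFL;
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;
    sigaction(SIGTSTP, &act, NULL);
    sigprocmask(SIG_UNBLOCK, &tstp, NULL);
    kill(getpid(), SIGTSTP);

    /* Execution continues here after fg (SIGCONT). */
    act.sa_handler = on_sigtstp;
    act.sa_flags = SA_RESTART;
    sigaction(SIGTSTP, &act, NULL);
    tcsetattr(STDOUT_FILENO, TCSADRAIN, &prog_mode);
}
```

Because SIGTTOU is ignored only around the write() and tcsetattr(), neither call can stop the process before the kill() is reached. The calls used inside the handler are all on the POSIX list of async-signal-safe functions.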

Compare lynx: 

read(0, 0xbfffea73, 1)                  = ? ERESTARTSYS (To be restarted)
--- SIGTSTP (Stopped) ---                       # entering handler
ioctl(1, TCGETS, {B38400 opost isig -icanon -echo ...}) = 0
sigprocmask(SIG_BLOCK, [ALRM], [TSTP])  = 0
write(1, "\33[32;1H\r", 8)              = 8                # this is L
ioctl(1, TCSETSW, {B38400 opost isig icanon echo ...}) = 0 # this is l
sigprocmask(SIG_UNBLOCK, [TSTP], NULL)  = 0
sigaction(SIGTSTP, {SIG_DFL}, {0x4003f0c0, [], SA_RESTART}) = 0
getpid()                                = 15124
kill(15124, SIGTSTP)                    = 0
--- SIGTSTP (Stopped) ---         # now really stops; later:
sigaction(SIGTSTP, {0x4003f0c0, [], SA_RESTART}, NULL) = 0
ioctl(1, TCFLSH, TCIFLUSH)              = 0
ioctl(1, TCGETS, {B38400 opost isig icanon echo ...}) = 0
sigaction(SIGTSTP, {SIG_IGN}, {0x4003f0c0, [], SA_RESTART}) = 0
ioctl(1, TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(1, TIOCGWINSZ, {ws_row=32, ws_col=100, ws_xpixel=0, ws_ypixel=0}) = 0
ioctl(1, TCSETSW, {B38400 opost isig -icanon -echo ...}) = 0
write(1, "\33[1;32r\33[39;49m\33[0;10m\33["..., 2304) = 2304
sigaction(SIGTSTP, {0x4003f0c0, [], SA_RESTART}, NULL) = 0
sigprocmask(SIG_SETMASK, [TSTP], NULL)  = 0
sigreturn()                             = ? (mask now [])

In this trace, no accident occurs. But under the right conditions,
either L or l would stop the process with SIGTTOU; then (after the
first fg) the kill() would be reached and stop the process a second
time.  

It turns out that this isn't much of a problem for bash and ksh, but
it is for ash (see below).

Problem III - wrong tty state after resuming
--------------------------------------------

Job gets suspended; after resuming with fg the tty state is wrong
for the application.

With bash and lynx, I have found that onlcr became set in lynx when it
shouldn't be.  With ksh and lynx, the wrong tty settings are more
obvious: either icanon, echo, onlcr all become erroneously set (if
neither of ksh's editing modes was in effect; lynx becomes quite
unusable); or isig becomes erroneously unset (ksh with -o emacs;
further suspension with ^Z impossible).  Does not occur when shell is
ash, or when application is less.  Happens with a lynx modified to
always ignore SIGTTOU, as well as the unmodified one (after
^Z-fg-^Z-fg).

To understand this, first note that some shells (always talking
interactive mode here) constantly juggle the tty state, each time they
go from waiting-for-input to running-a-command and vice versa.
Typically (but not always) when a command is run, flags are set for
line-oriented input/output (icanon, echo, opost, onlcr, ...)  because
most simple commands expect that, but while at the prompt the shell sets
different flags (e.g. -icanon, -echo, ... for line editing).  [This has
the effect that typing "stty -a" at the shell prompt will not show the
actual flags used by the shell prompt - I use something like "stty -a
</dev/tty1" to see them.]
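
The same check can be made from a small C helper instead of stty. This is a sketch under assumptions: the type `tty_summary` and the function names are invented here, and only the handful of flags discussed in this message are reported. The tty is opened with O_NOCTTY and O_NONBLOCK so that merely looking at it neither acquires it as controlling terminal nor blocks.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper type: the flags discussed in this message. */
struct tty_summary {
    bool opost, onlcr, isig, icanon, echo;
};

/* Pick the interesting flags out of a termios structure. */
struct tty_summary summarize_termios(const struct termios *t)
{
    struct tty_summary s;

    s.opost  = (t->c_oflag & OPOST)  != 0;
    s.onlcr  = (t->c_oflag & ONLCR)  != 0;
    s.isig   = (t->c_lflag & ISIG)   != 0;
    s.icanon = (t->c_lflag & ICANON) != 0;
    s.echo   = (t->c_lflag & ECHO)   != 0;
    return s;
}

/*
 * Read the flags of the tty at `path` (for example "/dev/tty1").
 * Returns 0 on success, -1 if it cannot be opened or is not a tty.
 */
int summarize_tty(const char *path, struct tty_summary *out)
{
    struct termios t;
    int fd, rc;

    fd = open(path, O_RDONLY | O_NOCTTY | O_NONBLOCK);
    if (fd == -1)
        return -1;
    rc = tcgetattr(fd, &t);
    close(fd);
    if (rc == -1)
        return -1;
    *out = summarize_termios(&t);
    return 0;
}

/* Print the summary stty-style: a leading '-' marks a cleared flag. */
void print_tty_summary(const struct tty_summary *s)
{
    printf("%sopost %sonlcr %sisig %sicanon %secho\n",
           s->opost ? "" : "-", s->onlcr ? "" : "-", s->isig ? "" : "-",
           s->icanon ? "" : "-", s->echo ? "" : "-");
}
```

Calling `summarize_tty()` from another terminal while the shell sits at its prompt, and again while a command is running, should show the two sets of flags the shell switches between.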

Let's look at the trace from ncurses-lynx again, this time concentrating
only on the tc{g,s}etattr calls, and assume lynx has been modified
to ignore/block SIGTTOU to avoid Problem I:

--- SIGTSTP (Stopped) ---                       # entering handler
ioctl(1, TCGETS, {B38400 opost isig -icanon -echo ...}) = 0
write(1, "\33[32;1H\r", 8)              = 8                # this is L
ioctl(1, TCSETSW, {B38400 opost isig icanon echo ...}) = 0 # this is l
--- SIGTSTP (Stopped) ---     # now really stops; later when continuing:
ioctl(1, TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(1, TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(1, TCSETSW, {B38400 opost isig -icanon -echo ...}) = 0
write(1, "\33[1;32r\33[39;49m\33[0;10m\33["..., 2304) = 2304
sigreturn()                             = ? (mask now [])

The first TCSETSW (l) is there for being nice to the shell that is
expected to take control - the tty is set to a line-oriented mode; not
really needed for a tty-juggling shell.  The second TCSETSW is for
re-establishing the appropriate raw mode for the application, before
writing output to repaint the screen.

But the ncurses handler gets this wrong if the shell is a juggling one.
Apparently the first TCGETS is used to obtain the last "program" (in
curses) tty mode, this is stored away and later used by the second
TCSETSW.  But (if "Preconditions" applies) the shell may have already
changed the mode to what _it_ wants at the point of first TCGETS, so
ncurses will pick up the wrong mode and use that when continuing.
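
One conceivable guard, sketched here with plain termios calls rather than anything from ncurses, is to trust a TCGETS result only while the process still owns the tty. In the second "ps jf" listing under Preconditions, TPGID has already changed to the shell's process group (6258) while lynx is still in group 15123, so comparing tcgetpgrp() with getpgrp() would detect that case. The function names `in_foreground` and `refresh_prog_mode` are hypothetical, and a small window between the check and the tcgetattr() remains, so this reduces the problem rather than removing it.

```c
#include <sys/types.h>
#include <termios.h>
#include <unistd.h>

/*
 * Returns 1 if the caller's process group is the foreground group of
 * fd's tty, 0 if another group is in the foreground, -1 on error
 * (for example when fd is not a tty).
 */
int in_foreground(int fd)
{
    pid_t fg = tcgetpgrp(fd);

    if (fg == (pid_t)-1)
        return -1;
    return fg == getpgrp();
}

/*
 * Update *saved from the tty only while we still own it.
 * Returns 1 if *saved was refreshed, 0 if it was left alone because
 * the tty is not (or no longer) ours, -1 if tcgetattr() failed.
 */
int refresh_prog_mode(int fd, struct termios *saved)
{
    struct termios now;

    if (in_foreground(fd) != 1)
        return 0;           /* the shell may already have changed the mode */
    if (tcgetattr(fd, &now) == -1)
        return -1;
    *saved = now;
    return 1;
}
```

A handler using this would fall back to the mode saved earlier (at startup, or the last time the application changed modes itself) whenever `refresh_prog_mode()` returns 0.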

The ncurses kernel(3) man page describes:

       The  def_prog_mode  and  def_shell_mode  routines save the
       current terminal modes as the  "program"  (in  curses)  or
       "shell"   (not   in   curses)   state   for   use  by  the
       reset_prog_mode and reset_shell_mode  routines.   This  is
       done  automatically  by  initscr.   There is one such save
       area for each screen context allocated by newterm().

       The reset_prog_mode and reset_shell_mode routines  restore
       the  terminal  to "program" (in curses) or "shell" (out of
       curses) state.  These are  done  automatically  by  endwin
       and,  after  an  endwin, by doupdate, so they normally are
       not called.

       The resetty and savetty  routines  save  and  restore  the
       state  of  the  terminal modes.  savetty saves the current
       state in a buffer and resetty restores the state  to  what
       it was at the last call to savetty.

This seems to be the same kind of mechanism used in the signal
handler, but it isn't clear how these functions interact with what the
handler automatically does, especially whether def_prog_mode or savetty
can be called by the application to preempt the handler's attempt to
save the current mode.

Well I guess this is more than enough for now...
I'll have a look at some ncurses sources and see if there is a
simple fix.

   Klaus

