Re: wait -n misses signaled subprocess
From: Robert Elz
Subject: Re: wait -n misses signaled subprocess
Date: Thu, 01 Feb 2024 02:35:26 +0700
Date: Wed, 31 Jan 2024 11:35:57 -0500
From: Chet Ramey <chet.ramey@case.edu>
Message-ID: <1e50aa99-8d53-4cdf-ba5e-6aaf3ccc6767@case.edu>
| Not quite. `new' in this sense is the opposite of `anything in the past'
| as Dale described it -- already notified and removed from the jobs list.
I guess the part about bash that I am not understanding here is how the
"already notified" works. To me there are just two ways for that. Either
the user has done a "wait" which has already collected that pid (without
-n and with no pid args, or with pid args one of which is the pid in
question, or with -n where the pid in question was the one whose status
was returned), or the user/script ran the jobs command (or jobs -l) and
the job in question was shown as completed.
Is there some other way?
| Half the problem here is that bash aggressively marks dead jobs as being
| notified in non-interactive shells without job control enabled, and moves
| them out of the jobs table.
That might be more than half the problem, it might be the entire problem.
| If you use wait -n without arguments, you probably don't care,
No, you do; it just means any of the children ... the script could make
a list of all of them and supply that list, but if the list is just going
to contain all the existing children, why bother? (With -n, and not
exactly one pid arg, -p is generally going to be required, but that option
has no bearing on which process is, or might be, selected, which is the
issue here.)
| but if you
| do, or if you use wait -n with pid/job arguments (which you've presumably
| saved yourself) you're going to need slightly different semantics than we
| have now to answer that reliably. And that will probably need a new option.
That's a pity, particularly since the current semantics don't seem to
be useful in general. Since the sole issue provoking that seems to be
the "wait over and over" policy, rather than "wait once, and remove
completely", perhaps a better idea than a new -n-like option with
different semantics would be an "only once" option, say -r (remove),
-c (cleanup) or -o (once only). If that option is set, then when a wait
with that option returns a status, or waits until termination without
returning a status (in the non -n case, with no pid args or with several
pid args), the processes concerned are deleted completely from everywhere
in the shell. Using that option would make a changed -n safe to use in
loops. If you do that, also add an option (maybe the upper case version
of whatever letter is selected for that one, or just some other letter)
to mean "don't wait" (kind of like wait(2) WNOWAIT), which in default
bash would just be a no-op (though apparently not in posix mode, where it
is instead the -[cor] option that would be the no-op).
If you were to do that, other shells could add the same options (except
that in probably all of them -[cor] would always be the default, and the
other option would be the one that changes behaviour).
| And that's why I used `more': there are several differences, so which
| of those differences should we attempt to change?
Just the one.
| > The one change that should be made is
| > to allow wait -n to collect processes/jobs that have already terminated.
|
| Yes, that's one of the things we're talking about. I don't have any problem
| with it, but should it take a new option to change those semantics?
Good, though I think some more thought should go into that. In another
thread you said, correctly (I paraphrase), that scripts should not rely
upon bugs, and the current wait -n behaviour is a bug; that it might
have been intentionally coded that way doesn't make it any less of one.
It isn't as if it was ever documented to work the way it does, or
everyone would have known about it already.
| > Changing it to wait for all the listed pids
| It's never done that.
| We're not going to change the return value from wait.
Good. I only mentioned those possibilities because your earlier
message was unclear about what "more like wait without -n" meant.
| Yeah, but we're talking about bash here. It doesn't really matter what
| the Bourne shell did; there are likely plenty of scripts that assume
| the historical bash behavior.
Really? Why? What's the point of collecting the status twice?
It can't change in the meantime, can it? Once a process has done exit(N),
its exit status should always be N, regardless of how often it is waited
upon.
[Aside: this should be obvious, but when one is collecting status changes
rather than just "terminated" status, the pid isn't removed when a
"stopped" or "continued" status is returned.]
| > I meant the distinction between processes
| > that the shell has already collected status for, and those for which it
| You're not the first to propose something like that, but I'm not going to
| be writing that code any time soon.
Nor am I. If you go back to the message where I first mentioned it
(which I can't locate at the minute), I am fairly sure I said that while
it might help in this case, I doubted it was worth the effort, or
something like that.
Actually, I found it eventually (this is quoting myself, from earlier):
>> But as long as it is just a matter of cleaning up, and jobs works for
>> that, I don't currently see the need.
| It is, in fact, true in the current implementation, as long as the pid
| is in the jobs list.
That caveat is the problem.
| It's always been true. If there is a job marked
| (internally, if you must) as dead for which the user has not yet received
| notification, wait -n returns it and marks it as notified (and deletes
| it from the jobs list).
That part is good.
| Yes, that's one of the things we're talking about: whether wait -n should
| consider pids/jobs *not* in the jobs list, the way wait without -n does.
| That's about the only thing we're talking about changing here so far.
Maybe a better discussion, and potential change, would be about whatever,
other than the use of the wait or jobs commands, can result in a job
moving out of the jobs list. If there were nothing other than those
(and jobs list overflow or similar), then we'd be fine, and, it seems to
me now, no change to the -n operation would be needed.
| That hasn't actually been true with bash running in default mode for a
| very long time now. Bash has allowed multiple waits for the same pid for
| many years, whether or not you or I think it's a good idea or the correct
| semantics. Even if it was an accident of the implementation, and maybe you
| could say it was, we are stuck with it.
Which is why I suggested an option (just above) to turn that misfeature off.
Even better, perhaps, might be a bash shopt.
| It's ok, we got one.
A kind of unlikely one.
kre