I got this address from http://sources.redhat.com/glibc.
Please ignore this mail if this is not the correct address (mailing
list) for discussing this problem.
I am facing a problem in a multithreaded program where one thread
sends multiple pthread_cancel requests to the same target thread.
The target thread catches each cancellation request (referred to
below as the cancel signal) and returns to normal operation as if
the request had never arrived.
This application works fine on AIX and Solaris, but it does not
work on Linux (Red Hat 9, kernel version 2.4.20-8).
I would be grateful for your help and input in this regard.
I could reproduce this problem using only native pthread calls and
setjmp/longjmp. The rest of this mail gives complete information
about the problem.
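To make the control flow concrete, here is a minimal sketch of the
setjmp/longjmp recovery pattern on its own. It is only an illustration
and not the testcase: it involves no threads and no pthread_cancel, the
names recover_point, simulate_interrupt and run_setjmp_demo are
hypothetical, and the longjmp is triggered by hand. It shows the
if/else shape, where the else branch is the "else block" mentioned
later in this mail.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>

/* Hypothetical recovery point; stands in for the jmp_buf that the
 * listener thread would set up before entering its loop. */
static jmp_buf recover_point;

/* Stands in for whatever runs when the interruption is noticed.
 * Here it is called by hand; nothing is cancelled. */
static void simulate_interrupt(void)
{
    longjmp(recover_point, 1);
}

/* Returns how many times control came back through the else block. */
int run_setjmp_demo(int max_interrupts)
{
    /* volatile: modified between setjmp and longjmp, so its value
     * must be preserved across the jump. */
    volatile int caught = 0;

    assert(max_interrupts >= 0);

    if (setjmp(recover_point) == 0) {
        /* Direct return from setjmp: normal start-up path. */
        printf("lthread_loop: starting\n");
    } else {
        /* Reached via longjmp: this is the "else block". */
        caught++;
        printf("lthread: recovered %d time(s)\n", caught);
    }

    /* run_setjmp_demo is still active here, so jumping back to
     * recover_point is well defined. */
    if (caught < max_interrupts)
        simulate_interrupt();

    return caught;
}
```

Calling run_setjmp_demo(3) passes through the else block three times
before returning. In the real program the jump would come from the
cancellation path, and that is the part that behaves differently on
Linux.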
Here is the testcase to recreate
the problem:
Here is the output on AIX and Solaris:
------------------------------------------------------------------------
T1: main: creating listener_thread & cancel_threads
T2: lthread_loop: Still getting
T3: cancel_lthread: Sleep 10 secs
T3: cancel_lthread: Cancelling lthread 1084972224
T3: cancel_lthread: Sleep 10 secs
T2: lthread: Caught cancel signal
T2: lthread_loop: Still getting
T3: cancel_lthread: Cancelling lthread 1084972224
T3: cancel_lthread: Sleep 10 secs
T2: lthread: Caught cancel signal
T2: lthread_loop: Still getting
T3: cancel_lthread: Cancelling lthread 1084972224
T3: cancel_lthread: Sleep 10 secs
T2: lthread: Caught cancel signal
T2: lthread_loop: Still getting
------------------------------------------------------------------------
Please note that the "T2: lthread: Caught cancel signal" message
appears after every "T3: cancel_lthread: Cancelling lthread
1084972224" message. This means the cancel signal is caught every
time.
But on Linux, we get the following output:
------------------------------------------------------------------------
T1: main: creating listener_thread & cancel_threads
T2: lthread_loop: Still getting
T3: cancel_lthread: Sleep 10 secs
T3: cancel_lthread: Cancelling lthread 1084972224
T3: cancel_lthread: Sleep 10 secs
T2: lthread: Caught cancel signal
T2: lthread_loop: Still getting
T3: cancel_lthread: Cancelling lthread 1084972224
T3: cancel_lthread: Sleep 10 secs
T3: cancel_lthread: Cancelling lthread 1084972224
T3: cancel_lthread: Sleep 10 secs
------------------------------------------------------------------------
In the above output, the cancel signal is caught only the first
time and never after that (unlike AIX and Solaris, where it is
caught every time).
Initially the testcase did not work on AIX either: it failed to
catch the second and all subsequent cancel signals (similar to
Linux). So I added a pthread_clear_exit_np() call in the else
block, immediately after catching the first cancel signal, so that
all the flags changed during cancellation are reset to their
default values and the thread can catch the next cancel signal.
Here is the man page information about the pthread_clear_exit_np()
API:
--------------------------------------------------------------------------------
The pthread_clear_exit_np() function clears the exit status of the
thread. If the thread is currently exiting due to a call to
pthread_exit(), or the target of a pthread_cancel(),
pthread_clear_exit_np() can be used in conjunction with setjmp(),
longjmp(), and pthread_setcancelstate() to prevent a thread from
terminating, and `handle' the exit condition.
This function is not portable.
--------------------------------------------------------------------------------
On Solaris, too, the program did not work initially, so I had to
add the following calls explicitly in the else block:
On Linux I tried adding the above API calls, but with no success.
Is there any alternative (or any API available) that would make the
above testcase run on Linux with the expected output, as on AIX and
Solaris? Could you please let me know where I am going wrong?
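One portable fallback, if a cancelled thread cannot be resumed on
Linux, would be to let the listener terminate on each cancel and
create it again. Below is a minimal sketch of that idea, assuming
default (deferred) cancellation. It is not the testcase above; the
names on_cancel, lthread and run_restart_demo are hypothetical, and
return codes are checked with assert only to keep the sketch short.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>

/* Written only by the listener's cleanup handler and read by the
 * creating thread after pthread_join(), which orders the accesses. */
static int caught_count = 0;

/* Cleanup handler: runs when the listener acts on a cancel request. */
static void on_cancel(void *arg)
{
    (void)arg;
    caught_count++;
    printf("lthread: Caught cancel signal\n");
}

/* Hypothetical listener: loops until it is cancelled. */
static void *lthread(void *arg)
{
    struct timespec ts = { 0, 10 * 1000 * 1000 };  /* 10 ms */

    (void)arg;
    pthread_cleanup_push(on_cancel, NULL);
    for (;;)
        nanosleep(&ts, NULL);  /* nanosleep() is a cancellation point */
    pthread_cleanup_pop(0);    /* not reached; pairs with the push */
    return NULL;
}

/* Creates, cancels and joins the listener `rounds` times and returns
 * how many times the cleanup handler ran. */
int run_restart_demo(int rounds)
{
    caught_count = 0;
    for (int i = 0; i < rounds; i++) {
        pthread_t tid;
        void *res = NULL;
        int rc;

        rc = pthread_create(&tid, NULL, lthread, NULL);
        assert(rc == 0);
        rc = pthread_cancel(tid);
        assert(rc == 0);
        rc = pthread_join(tid, &res);
        assert(rc == 0);
        assert(res == PTHREAD_CANCELED);
    }
    return caught_count;
}
```

With this structure every cancel is observed, because each one is
delivered to a fresh thread. The cost is that the listener's state has
to be rebuilt after every restart, which is not the behaviour I have
on AIX and Solaris.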