bug-glibc
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug in linuxthreads: sem_wait/pthread_cond_wait interaction


From: Gene Cooperman
Subject: bug in linuxthreads: sem_wait/pthread_cond_wait interaction
Date: Thu, 17 May 2001 18:01:56 -0400 (EDT)

I believe I have found a simple bug in the semaphore implementation
of linuxthreads-2.0.6.  Specifically, it is in semaphore.c:sem_wait()_

I also checked linuxthreads-2.2.2, and the same bug appears to be present
in oldsemaphore.c:__old_sem_wait().  The bug does not affect
the new sem_wait of linuxthreads-2.2.2.  Even in the old code, the
bug appears _only_ in certain cases when the application mixes
calls to semaphores with calls to mutexes and condition variables.
If semaphore are used in isolation or if semaphores are not used,
then the bug does not appear.

I will describe the bug with reference to linuxthreads-2.0.6.
If you just want the bug fix, you'll find my proposed bug
fix at the end of this message.

I've confirmed the bug by enabling the ASSERT in linuxthreads:queue.h,
and re-compiling.  The bug in my application is then always shadowed
by a triggering of assert.  I first discovered this bug using my TOP-C/C++
system for writing parallel applications for both distributed and shared memory
(http://www.ccs.neu.edu/home/gene/topc.html).  TOP-C/C++ seems to have
good test coverage for finding some of these pthreads bugs.  :-)
                                                - Gene Cooperman
                                                  address@hidden
=====================================================================

The logic of semaphore.c indicates that sem->sem_status is used
to EITHER hold the head of a queue of threads waiting on sem,
or ELSE hold a semaphore count, stored as value ((long)count << 1) + 1.

In the event that sem->sem_status holds the head of a queue of
waiting threads, the forward links from a thread th
is given as th->p_nextwaiting.  The last thread in the waiting queue
has:   th->p_nextwaiting == (pthread_descr)0x1
It is the job of sem_wait() to set:  th->p_nextwaiting = 0x1
before suspending the thread.  It is the job of
sem_restart_list() to set: th->p_nextwaiting = NULL
before restarting the thread.

The bug occurs in the case that the semaphore count is 0,
which is indicated by:  sem->sem_status == 1
and sem_post(sem) and sem_wait(sem) are called at approximately
the same time.  The following race condition can occur.

The invocation of sem_wait(sem) causes the local variable
oldstatus to receive the value 1 (corresponding to sem->sem_status).
Assume that sem_post(sem) then executes in its entirety, causing
sem->sem_status to be incremented to the value 3.
Then sem_wait(sem) continues executing.

The result is that sem_wait(sem) will follow the "else" branch and set:
self->p_nextwaiting = (pthread_descr) oldstatus; where oldstatus is 1.
The following call to sem_compare_and_swap() will then return 0, and
one iterates again through the "do" loop.  On the second iteration,
oldstatus is set to 3 (the new value of sem->sem_status),
the "if" branch is followed, newstatus is set to oldstatus - 2 (or to 1),
and sem_compare_and_swap() returns 1.  So, sem_wait(sem) exits with a
return value of 1, as it should.  But now self->p_nextwaiting has
the value of 1 instead of NULL, while the thread is not suspended.

If that thread later calls pthread_cond_wait(cond, mutex) and is suspended,
the condition variable, cond, will have this thread, th, on its wait
queue, and the value of th->p_nextwaiting will be 1.  The code
for enqueue(), dequeue(), etc. assumes that the value of
th->p_nextwaiting is always either NULL or a valid thread descriptor,
and so the code breaks upon encountering the value 1.
=====================================================================

For a bug fix, I propose to restore the correct value of NULL
for self->p_nextwaiting before sem_wait(sem) returns.
I propose changing a code fragment in sem_wait()
FROM:
    if (newstatus & 1)
      /* We got the semaphore. */
      return 0;
TO:
    if (newstatus & 1) {
      /* We got the semaphore. */
      self->p_nextwaiting = NULL;
      return 0;
    }



reply via email to

[Prev in Thread] Current Thread [Next in Thread]