[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Help-smalltalk] [PATCH] Semaphore>>#critical: race condition
From: |
Paolo Bonzini |
Subject: |
[Help-smalltalk] [PATCH] Semaphore>>#critical: race condition |
Date: |
Mon, 01 Oct 2007 11:41:01 +0200 |
User-agent: |
Thunderbird 2.0.0.6 (Macintosh/20070728) |
From a message on the Squeak mailing list, by Andreas Raab:
Some of you (mostly those who run heavy servers) may have noticed that
at times Squeak locks up in mysterious and unforeseen ways. One of those
lockups involves Delay's AccessProtect in an unsignaled state and
consequently the entire image locking up since Delay access is required
in many, many places.
Today, David presented me an image that was locked up in such a state
but by sheer luck he managed to save it right before it happened which
allowed me to investigate the situation. The result can best be
explained by the little test case shown here:
"Create mutex unsignaled so we can manually signal it"
mutex := Semaphore new.
"Create a process which will wait inside the mutex"
p := [mutex critical:[]] forkAt: Processor userBackgroundPriority.
"Wait until process has entered mutex"
[p suspendingList == mutex]
whileFalse:[(Delay forMilliseconds: 10) wait].
"Signal mutex"
mutex signal.
"Kill process"
p terminate.
"and check to see if the mutex is signaled"
mutex isSignaled ifFalse:[self error: 'Mutex not signaled'].
Note that despite the somewhat complex setup the basic idea is that a
low priority process waiting in a critical section receives a signal on
the semaphore it is waiting on but gets terminated by a higher priority
process inbetween receiving the signal and execution of the process itself.
This situation (manually executed in the above to make it more easily
repeatable) can happen in many situations where processes get terminated
"from the outside" and it would cause particular grief in the timing
semaphore because it gets served by the highest priority process which
makes the unfortunate cause of events much more likely.
All Squeak versions that I have access to expose this behavior. Looking at
Semaphore>>critical: which says [this is the code in GNU Smalltalk -pb]
Semaphore>>critical: aBlock
self wait.
^aBlock ensure: [self signal].
makes it seem as if moving the wait into the ensured block is the
correct answer, but that ain't necessarily so. When we move the wait
into the block we risk that the entering process is terminated after
entering the block but before entering the wait which would leave the
semaphore signaled twice, which is just as bad as not signaled at all.
This patch solves, or rather tries to solve :-( this problem for GNU
Smalltalk by wrapping the "self wait" in an exception handler. There
may be a small problematic window between the end of the #wait and the
beginning of ensure
There are other problems with Delay, basically involving the Queue left
in an half-updated state when the process manipulating it is terminated.
The only way to fix this is to move all the processing of the Queue
inside the high-priority process that handles timer events. I'll apply
a patch to this end after converting the source code to the new syntax
(I have the code, but it uses the new syntax and I don't want to have
half-converted directories).
Paolo
2007-10-01 Paolo Bonzini <address@hidden>
* kernel/RecursionLock.st: Signal waiting semaphore if a process is
terminated inside its critical section.
* kernel/Semaphore.st: Likewise.
--- orig/kernel/RecursionLock.st
+++ mod/kernel/RecursionLock.st
@@ -99,7 +99,7 @@ critical: aBlock
self isOwnerProcess ifTrue: [ ^aBlock value ].
"Look out for race conditions!"
- self enter.
+ [ self enter ] on: ProcessBeingTerminated do: [ :ex | self exit. ex pass ].
^aBlock ensure: [ self exit ]
! !
--- orig/kernel/Semaphore.st
+++ mod/kernel/Semaphore.st
@@ -67,7 +67,9 @@ forMutualExclusion
critical: aBlock
"Wait for the receiver to be free, execute aBlock and signal the receiver
again. Return the result of evaluating aBlock."
- self wait.
+
+ "Look out for race conditions!"
+ [ self wait ] on: ProcessBeingTerminated do: [ :ex | self signal. ex pass
].
^aBlock ensure: [ self signal ]
! !
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- [Help-smalltalk] [PATCH] Semaphore>>#critical: race condition,
Paolo Bonzini <=