Re: NSObject.m threading bug(s).

gnustep-dev

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: NSObject.m threading bug(s).

From:	David Chisnall
Subject:	Re: NSObject.m threading bug(s).
Date:	Sat, 11 Sep 2010 00:39:46 +0100

On 11 Sep 2010, at 00:21, Chris Ball wrote:

> Whilst I understand what you are saying, my practical experience with our code
> suggests there are subtleties perhaps within GNUstep that make this assumption
> unsafe.  

If there are, then the bugs are in other code.  Running the clang static 
analyser found a few potential user-after-release bugs in GNUstep, so it's 
possible.

> Thread 1: 
>       result = GSAtomicDecrement((gsatomic_t)&(((obj)anObject)[-1].retained));
> where result = -1

The object's last reference has just gone away.  There should now be no 
pointers to the object.  If there are, then it's an application bug.

> now switch to Thread 2:
>        result = 
> GSAtomicDecrement((gsatomic_t)&(((obj)anObject)[-1].retained));
>        if (result < 0)
>          {
>            if (result != -1)
>              {
>                [NSException raise: NSInternalInconsistencyException
>                  format: @"NSDecrementExtraRefCount() decremented too far"];
>              }
> 
> Death due to exception. 

Correct.  You have done something like this:

thread 1:
a = [objc new];
...
[a release];

thread 2:
[a release];

If the retain count of an object ever drops below 0, this means that it has 
been sent one too many release messages.  As I said, the purpose of the 
exception is to try to make debugging this easier.  

In a single-threaded application, or the same application used a single CPU or 
where the threads didn't overlap in this way, then the sequence would be:

thread 1 sends -release.
thread 1 sends handles -dealloc
thread 1 object is destroyed.
thread 2 -release message is sent to invalid address.  Program crashes.

This kind of bug is masked by using the GNUstep DESTROY() macro on a global (or 
an ivar on an object that is shared between two threads).  In this case, you 
have something roughly equivalent to this:

[a release];
a = nil;

If two threads do this with a very small gap between them, nothing will go 
wrong.  The second thread will send -release to nil, where it will be ignored.  
If, however, the preemption happens between these two lines, the second thread 
will send -release to an object with a retain count of 0.

It is possible to write a lockless, thread-safe, version of DESTROY(), but you 
can not use the GNUstep one (or something equivalent) in this way.

> Other variations of course exist, ie thread one can get through the first two
> if's and switch to the other thread just after resetting the count to zero, 
> now
> thread one will return YES as will thread 2.

Again, only possible if there is a bug outside this code.  This code is not 
intended to catch all possible bugs in user code, just the small subset that it 
is possible to catch.

> Further exercises are left to the reader.
> 
> If your contention is that only one thread is managing the retain count and
> releasing an object, then you don't need an atomic decrement at all but that
> clearly isn't true given that any thread can do retain/release an otherwise
> autoreleased object.

That is not my contention at all.

> I should point out that this isn't academic, we have a home rolled database
> that has used GNUstep since around 2002 and processes between 500 million and 
> a
> billion messages per day and before I disabled this GSATOMIC stuff we would
> process tops around 30 million messages before the system fell over in one
> weird way or another  (this on a 32 core machine).  In our application the 
> place
> I most consistently found the code to fail is where we are transferring 
> messages
> via a queue from one thread to another so we have an autorelease message
> created in thread 1, which is then retained for transfer to the queue (it is
> retained again inside the queue because the queue itself is threadsafe) then
> the other thread pulls the message from the queue (queue releases the message)
> we do stuff with it and then call release (at which point we don't care about
> the message anymore) and presumably thread one reaps the autoreleased memory.

Sounds like you have an issue where the object is already over-autoreleased.  
Try running the clang static analyser on your code.

> Now I'm not proud, this system all told, is around 300k lines of code, of 
> course
> it has bugs, having said this the program doesn't leak the queued messages (we
> don't have terrabytes of ram so we would notice) and I would expect wild 
> double
> releases in our code to have at least caused a few otherwise unexplained
> application crashes over the years given that the place that fails is called
> once per message, actually I wouldn't expect the system to last more than a
> minute or two tops given the memory churn that the system goes through.
> 
> We played with a couple simple gnustep programs using the Intel thread checker
> tool (http://software.intel.com/en-us/intel-thread-checker/) and disabling
> GSATOMIC stopped quite a few strange errors in the GNUStep internals from
> occurring for what it's worth.

Disabling GSATOMIC will put you on a very slow path, where every retain or 
release involves acquiring and releasing a lock.  It will make a lot of bugs 
disappear simply because it will enforce a lot of synchronisation on your code. 
 Every retain / release message will become a sync point.

>>> Thank goodness for the NSInteralInconsistencyException because this would 
>>> have
>>> been a bear to find otherwise.  Although very random/crashy behaviour seemed
>>> about as likely as getting the exception.
>> 
>> The exception is there for debugging.  We don't guarantee catching 
>> double-free bugs using this mechanism, because if it happened in a 
>> non-concurrent system we'd already have free'd the memory that the retain 
>> count is stored in, giving a segfault.
> 
> There is no double free bugs that I'm aware of and since the actual memory 
> release happens outside of the locks in the non-GSATOMIC version of the code
> double frees could certainly still cause problems if they are present.

There are no double-free bugs in any code that people are aware of.  They are 
trivial to fix, once you know where they are...

David

-- Sent from my brain

[Prev in Thread]

Current Thread

[Next in Thread]

NSObject.m threading bug(s)., Chris Ball, 2010/09/10
- Re: NSObject.m threading bug(s)., David Chisnall, 2010/09/10
  - Re: NSObject.m threading bug(s)., Chris Ball, 2010/09/10
    - Re: NSObject.m threading bug(s)., David Chisnall <=
- NSObject.m threading bug(s)., Chris Ball, 2010/09/11

Prev by Date: Re: NSObject.m threading bug(s).
Next by Date: Re: Static analyser results for -base (277 potential bugs)
Previous by thread: Re: NSObject.m threading bug(s).
Next by thread: NSObject.m threading bug(s).
Index(es):
- Date
- Thread