Hi -
I'm making good progress on the multi-client libpager. I've been running it on my root filesystem for about a month now, with few problems recently.
However, there are still some bugs. One seems to be in libports. It manifests like this:
/hurd/ext2fs.static: ../../libports/../libshouldbeinlibc/refcount.h:171: refcounts_ref: Assertion '! (r.hard == 1 && r.weak == 0) || !"refcount detected use-after-free!"' failed.
/hurd/ext2fs.static: ../../libports/complete-deallocate.c:41: _ports_complete_deallocate: Assertion '! "reacquired reference w/o send rights"' failed.
gdb indicates that the port in question was generated by libfshelp/get-identity.c. That file's a short read; basically, we're storing ports in an inode-to-port hash, looking them up when io_identity() gets called, and removing them from the hash when the class's clean routine gets called.
I think what's happening is that we have a port that loses its last send right, and after its refcount is decremented but before its clean routine gets called, another call to io_identity() pulls it out of the hash. Then you've got ports_get_right complaining (that's the first line) that it's incrementing a zero refcount, and ports_port_deref complaining (that's the second line) that it's deallocating a port that now has send rights.
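To make the suspected interleaving concrete, here is a toy, single-threaded replay of it. This is only a sketch of my reading of the race, not libports code: struct toy_port and the toy_* functions are made-up names, and a single integer stands in for the real hard/weak refcounts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a port object; not the real struct port_info. */
struct toy_port {
  int refs;      /* single reference count (the real code has hard + weak) */
  bool in_hash;  /* still reachable through the inode-to-port hash? */
};

/* No-senders path, step 1: drop the reference. Returns true if it hit zero. */
static bool toy_deref(struct toy_port *p)
{
  return --p->refs == 0;
}

/* No-senders path, step 2: the class's clean routine unhashes the port. */
static void toy_clean(struct toy_port *p)
{
  p->in_hash = false;
}

/* io_identity-style lookup: take a reference if the port is still hashed.
   Sets *revived when the reference was taken on a zero count. */
static struct toy_port *toy_lookup(struct toy_port *p, bool *revived)
{
  if (!p->in_hash)
    return NULL;
  *revived = (p->refs == 0);
  p->refs++;
  return p;
}

/* Replay the interleaving: deref, then lookup, then clean.
   Returns true if the lookup revived a port whose count was already zero. */
static bool toy_replay_race(void)
{
  struct toy_port p = { .refs = 1, .in_hash = true };
  bool revived = false;

  bool dying = toy_deref(&p);                      /* thread A: last ref gone */
  struct toy_port *hit = toy_lookup(&p, &revived); /* thread B: io_identity() */
  if (dying)
    toy_clean(&p);                                 /* thread A: clean, too late */

  return hit != NULL && revived;
}
```

In this model toy_replay_race() returns true: the lookup finds the port in the window between the decrement and the clean routine, which is the situation I believe the two assertions are reporting.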
Looking at the tail end of libports/no-senders.c, you'll see that ports_port_deref gets called after we've dropped the mutex on _ports_lock. I'm thinking that we need to hold that mutex all the way until the class's clean routine has returned, in order to ensure that the refcount gets decremented and the port gets removed from the hash atomically.
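Roughly, the shape I have in mind looks like the sketch below. Again this is a toy model under my own assumptions, not a patch: locked_port, toy_lock and the two functions are hypothetical, toy_lock merely plays the role of _ports_lock, and the clean routine is reduced to clearing a flag.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a port object; not the real struct port_info. */
struct locked_port {
  int refs;      /* single reference count */
  bool in_hash;  /* still reachable through the inode-to-port hash? */
};

/* Plays the role of the global _ports_lock in this model. */
static pthread_mutex_t toy_lock = PTHREAD_MUTEX_INITIALIZER;

/* Drop a reference; if it was the last one, do the clean routine's hash
   removal before the lock is released, so both happen as one step. */
static void toy_deref_locked(struct locked_port *p)
{
  pthread_mutex_lock(&toy_lock);
  if (--p->refs == 0)
    p->in_hash = false;
  pthread_mutex_unlock(&toy_lock);
}

/* Lookup under the same lock: it sees either a live, hashed port
   (and takes a reference) or nothing at all. */
static struct locked_port *toy_lookup_locked(struct locked_port *p)
{
  struct locked_port *hit = NULL;

  pthread_mutex_lock(&toy_lock);
  if (p->in_hash)
    {
      p->refs++;
      hit = p;
    }
  pthread_mutex_unlock(&toy_lock);
  return hit;
}
```

With the decrement and the unhashing inside one critical section, a lookup can no longer land between them. The sketch only works if every lookup takes the same lock, which I'm assuming here and haven't checked against get-identity.c.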
Of course, that requires holding a global lock while the clean routine runs. It seems to me that only the port in question needs to be locked, but the individual ports don't seem to have mutexes associated with them.
Any ideas what to do?
agape
brent