Hi -
I've learned some interesting things about libpager that I'd like to share with the list.
I've found two bugs in the existing code that were "fixed" by the new demuxer that sequentially services requests on each pager object.
For example, the data return code sets a PAGINGOUT flag in the pagemap before it starts calling pager_write_page (which could be slow; writing to a hard drive, say). Subsequent data returns check the PAGINGOUT flag and wait on a condition variable if it's set. The problem is that if multiple threads are waiting on it, pthreads doesn't guarantee what order they will run in when the condition variable is signaled, so the data writes can get reordered. If three data returns come in as 1, 2, 3 (maybe because pager_sync is called three times), number 1 starts writing, but if it doesn't finish quickly enough, 2 and 3 both end up waiting and can then run in either order.
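To make the pattern concrete, here is a minimal sketch of it. This is not the actual libpager code: struct page_state, struct write_request, data_return, and run_three_returns are hypothetical names I made up for illustration, and a short sleep stands in for a slow pager_write_page. It shows that the flag does keep the writes from overlapping, but nothing in it pins down which waiter goes next.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-in for one pagemap entry; not libpager's real layout. */
struct page_state {
  pthread_mutex_t lock;
  pthread_cond_t wakeup;
  int pagingout;      /* models the PAGINGOUT flag */
  int in_flight;      /* writers currently inside the slow write */
  int max_in_flight;  /* high-water mark of in_flight */
  int order[8];       /* sequence numbers in the order they were written */
  int n_written;
};

struct write_request { struct page_state *page; int seq; };

static void sleep_ms(long ms)
{
  struct timespec ts = { 0, ms * 1000000L };
  nanosleep(&ts, NULL);
}

/* Models one data return: wait for the flag, set it, write, clear it. */
static void *data_return(void *arg)
{
  struct write_request *req = arg;
  struct page_state *p = req->page;

  pthread_mutex_lock(&p->lock);
  while (p->pagingout)
    pthread_cond_wait(&p->wakeup, &p->lock);  /* wake order is unspecified */
  p->pagingout = 1;
  if (++p->in_flight > p->max_in_flight)
    p->max_in_flight = p->in_flight;
  pthread_mutex_unlock(&p->lock);

  sleep_ms(10);  /* stands in for a slow pager_write_page */

  pthread_mutex_lock(&p->lock);
  p->order[p->n_written++] = req->seq;
  p->in_flight--;
  p->pagingout = 0;
  pthread_cond_broadcast(&p->wakeup);  /* every waiter races for the flag */
  pthread_mutex_unlock(&p->lock);
  return NULL;
}

/* Start data returns 1, 2, 3 for the same page and wait for all of them. */
static void run_three_returns(struct page_state *p)
{
  pthread_t tid[3];
  struct write_request req[3];
  int i;

  pthread_mutex_init(&p->lock, NULL);
  pthread_cond_init(&p->wakeup, NULL);
  p->pagingout = p->in_flight = p->max_in_flight = p->n_written = 0;

  for (i = 0; i < 3; i++) {
    req[i].page = p;
    req[i].seq = i + 1;
    pthread_create(&tid[i], NULL, data_return, &req[i]);
    sleep_ms(1);  /* makes in-order arrival likely, not guaranteed */
  }
  for (i = 0; i < 3; i++)
    pthread_join(tid[i], NULL);
}
```

After run_three_returns(&p), p.n_written is 3 and p.max_in_flight is 1, so the writes never overlap. p.order[] always starts with 1 in practice, but the last two entries may come out as 2, 3 or 3, 2 depending on how the scheduler wakes the waiters.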
Except that they can't. The new demux code queues the second and third writes, and they aren't processed until the first one is done. The pager object is essentially locked until pager_write_page() completes.
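A rough sketch of that kind of serialization, assuming one FIFO queue and one worker per pager object. Again, this is not the real demuxer: struct pager_queue, struct queued_request, enqueue, queue_close, and run_queued_returns are hypothetical names. Because a single worker drains the queue in arrival order, request 2 cannot start before request 1 has finished.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-object request queue; not the real demuxer's types. */
struct queued_request { int seq; struct queued_request *next; };

struct pager_queue {
  pthread_mutex_t lock;
  pthread_cond_t nonempty;
  struct queued_request *head, *tail;
  int closed;     /* no more requests will arrive */
  int order[8];   /* sequence numbers in the order they were serviced */
  int n_done;
};

static void queue_init(struct pager_queue *q)
{
  pthread_mutex_init(&q->lock, NULL);
  pthread_cond_init(&q->nonempty, NULL);
  q->head = q->tail = NULL;
  q->closed = 0;
  q->n_done = 0;
}

/* Append a request at the tail, preserving arrival order. */
static void enqueue(struct pager_queue *q, int seq)
{
  struct queued_request *r = malloc(sizeof *r);
  if (!r)
    abort();
  r->seq = seq;
  r->next = NULL;
  pthread_mutex_lock(&q->lock);
  if (q->tail)
    q->tail->next = r;
  else
    q->head = r;
  q->tail = r;
  pthread_cond_signal(&q->nonempty);
  pthread_mutex_unlock(&q->lock);
}

static void queue_close(struct pager_queue *q)
{
  pthread_mutex_lock(&q->lock);
  q->closed = 1;
  pthread_cond_signal(&q->nonempty);
  pthread_mutex_unlock(&q->lock);
}

/* The only thread that services this object: one request at a time. */
static void *worker(void *arg)
{
  struct pager_queue *q = arg;
  for (;;) {
    struct queued_request *r;
    pthread_mutex_lock(&q->lock);
    while (!q->head && !q->closed)
      pthread_cond_wait(&q->nonempty, &q->lock);
    r = q->head;
    if (!r) {  /* queue is empty and closed */
      pthread_mutex_unlock(&q->lock);
      return NULL;
    }
    q->head = r->next;
    if (!q->head)
      q->tail = NULL;
    pthread_mutex_unlock(&q->lock);

    /* A slow write would run here; later requests stay queued meanwhile. */
    if (q->n_done < 8)
      q->order[q->n_done++] = r->seq;
    free(r);
  }
}

/* Queue data returns 1, 2, 3 and let the single worker drain them. */
static void run_queued_returns(struct pager_queue *q)
{
  pthread_t tid;
  queue_init(q);
  pthread_create(&tid, NULL, worker, q);
  enqueue(q, 1);
  enqueue(q, 2);
  enqueue(q, 3);
  queue_close(q);
  pthread_join(tid, NULL);
}
```

After run_queued_returns(&q), q.order[] holds 1, 2, 3 in that order every time. The cost is the one raised further down: every request for the object waits behind the one being serviced, whichever page it touches.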
I went so far as to write a test case to exercise the bug! Just good coding practice - develop tests for your known bugs first. Then I ran it, and it couldn't reproduce the bug! Only after thinking about the code more did I understand why.
I know the demuxer code was rewritten to avoid thread storms, but serializing every request on a pager object has a cost and could become a performance bottleneck at some point. There's no good reason to block all access to page 100 while a disk operation completes on page 1. I'm not looking to rewrite it right now, but I'm curious. Does anybody remember what characterized the thread storms? What conditions triggered them? What kind of pager operations were being done?
agape
brent