Hi -
My work on a multi-client libpager has progressed to the point where I have pseudo code for the logic and am considering how to implement the pagemap.
I'm trying to achieve three goals with the pagemap:
1. keep pagemap entries as small as possible
2. support arbitrary numbers of clients, and
3. support a single client with minimal overhead
(1) seems important since ext2fs disk pagers cover the entire partition and can grow quite large. (2) is pretty much the whole point of my project; though I have considering relaxing that requirement to set an upper limit on the number of clients. (3) seems important since it's the most common case, but maybe the speed of libpager isn't much of an issue.
I don't see any reasonable way of achieving (2) without making the pagemap entry at least as big as a pointer (4 bytes). Currently it's a short (2 bytes), so we'll have to double the size of our pagemaps.
I'm considering several schemes to achieve (3). Here's one, that handles up to four clients fairly efficiently.
First, steal a bit from the pointer by aligning the structure on a word boundary. Then, if the least significant bit is 1, interpret the pagemap entry as a bitfield rather than as a pointer to a more complicated structure.
A 32 bit field (err... make that a 31 bit field) can handle up to four clients. The new pagemap entries have an ACCESSLIST, which is an unsorted set of clients (those who currently have access), and a WAITLIST, which is a sorted list of clients that have requested access, and a few boolean flags.
We number our clients 0 through 3, and I'm thinking above moving their ports into high port number space to easily identify them. So, the first client on the first memory object would be on port 0x8000000, the second client would be on 0x80000001, etc. The second memory object would have its ports on 0x80000004 through 0x80000007. So we can just strip off the lowest 2 bits to figure out our "client number". I don't think protected payloads would help, since we're not receiving messages on these ports; they're send ports used to communicate with the kernel(s).
ACCESSLIST entries requires two bits - one to indicate if access is granted, and one to indicate if it's read-only or read-write access. So, we use 2*4 = 8 bits for the ACCESSLIST. The WAITLIST is sorted (client requests are processed FIFO) and requires four bits per slot - one to indicate that the slot is in use, two to identify the client, one to indicate if the request is for read or write access. We need four slots for four clients, so 4*4 = 16 bits for the WAITLIST. Then we need a few more flag bits. It all fits.
Once we hit our fifth client, or some oddball condition (like an error return other than the expected EIO, ENOSPC, EDQUOT), we shift to a pointer that points to a more complicated, dynamically allocated structure.
The dynamically allocated structures are themselves stored in an rb-tree, so that if multiple pages are in the same condition (same set of clients with access, for example), they all point to the same structure. Every time we want to change a pagemap structure, we construct the new pagemap structure in a temporary object, then search the rb-tree for a matching entry, and either use a pointer to the existing entry (if found), or insert the temporary object, use a pointer to it, and construct a new temporary the next time we need to do this.
That can be a bit slow, so I'm thinking about including some extra pointers in that dynamic structure to cache the most common cases (like moving the first client on WAITLIST to ACCESSLIST). Of course, these structures would only be used if we've got more than four clients, so we're already into a corner case anyway. Right now, I'm leaning towards not including any extra pointers right now. I have no experience with large numbers of clients, so I'm not really sure what should be optimized, anyway.
It's a pretty complex scheme, and thus prone to bugs, which is why I'm posting to the list for comments. Does anybody think it would be better to ditch the bit field trick completely and handle even single clients using pointers to malloc'ed structures? The code would be simpler but slower.
agape
brent