On Fri, May 6, 2011 at 7:11 PM, Jeff Darcy
<address@hidden> wrote:
On 05/05/2011 04:23 PM, Edward Shishkin wrote:
> The straightforward solution is to serialize read-modify-writes.
> I wonder if GlusterFS has any per-file serialization means,
> that would allow to resolve this problem. Or maybe there are
> possibilities to create such means. Any hints would be highly
> appreciated.
At a first approximation, you could just wrap the read-modify-write in
POSIX locks. That would conflict with other uses of POSIX locks, though,
and might not address the issue of "self-conflict" induced e.g. by some
of the performance translators issuing parallel writes to the same fd.
There is an "oplock" translator in CloudFS which was co-developed with
the encryption translator you're working on and which attempts to
provide the necessary conflict detection without scalability-destroying
serialization. The code does need some improvement, though, as has been
discussed on the cloudfs-devel thread you started at
https://fedorahosted.org/pipermail/cloudfs-devel/2011-May/000038.html.
In particular, we need to address not just race conditions but also e.g.
forward-progress guarantees, and (as I said in that thread) I think
judicious use of server-side request queuing is the way to do that.
You could also look at the inodelk() FOP, which is serviced by the locks
translator. inodelk() uses a domain-based lock space (the domain name
could be your subvolume name) and is isolated from the user
application's POSIX fcntl-based locks.
Avati
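
For reference, the "wrap the read-modify-write in POSIX locks" idea from
the quoted reply can be sketched roughly as below. This is only an
illustration at the application level using Python's fcntl module, not
GlusterFS translator code; the function name locked_read_modify_write
and its parameters are hypothetical. Note that POSIX locks are advisory
and owned per process, so this would not serialize two writers inside
the same process, which is one form of the "self-conflict" caveat
mentioned above.

```python
import fcntl
import os


def locked_read_modify_write(path, offset, length, modify):
    """Apply modify(bytes) -> bytes to a byte range under a POSIX lock.

    Hypothetical helper for illustration only; not part of GlusterFS.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        # Exclusive advisory POSIX (fcntl) lock on just this byte range.
        # The call blocks until the lock is granted.
        fcntl.lockf(fd, fcntl.LOCK_EX, length, offset, os.SEEK_SET)
        try:
            old = os.pread(fd, length, offset)   # read
            new = modify(old)                    # modify
            os.pwrite(fd, new, offset)           # write
        finally:
            # Release the range lock even if modify() or the write fails.
            fcntl.lockf(fd, fcntl.LOCK_UN, length, offset, os.SEEK_SET)
    finally:
        os.close(fd)
    return new
```

Locking only the affected byte range rather than the whole file keeps
writers to disjoint ranges from blocking each other, but it still
shares the lock space with any POSIX locks the application itself
takes, which is the conflict the inodelk() domains avoid.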