
Re: [Gluster-devel] Proposal to change locking in data-self-heal


From: Anand Avati
Subject: Re: [Gluster-devel] Proposal to change locking in data-self-heal
Date: Tue, 21 May 2013 12:49:19 -0700

On Tue, May 21, 2013 at 7:05 AM, Jeff Darcy <address@hidden> wrote:
> On 05/21/2013 09:10 AM, Pranith Kumar Karampuri wrote:
>> Scenario-1 won't happen because truncate gets a chance to acquire its full-file lock after any 128KB range sync completes.
>>
>> Scenario-2 won't happen because extra self-heals launched on the same file will be blocked in the self-heal domain, so the data path's locks are not affected by this.

> These are really two changes for two scenarios:
>
> * Change the granular-lock order to avoid scenario 1.
>
> * Add a new lock to avoid scenario 2.
>
> I'm totally fine with the second part, but the first part makes me a bit uneasy.  As I recall, the "chained" locking (locking the second range before unlocking the first) was a deliberate choice to avoid windows where somebody could jump in with a truncate during self-heal.  It certainly wasn't the obvious thing to do, which suggests there was a specific reason.

Chained locking really was there to avoid starting a second self-heal while one is already in progress. By giving up the big lock (after acquiring the small lock), regional operations can progress, but a second self-heal waiting to hold the big lock is still held off. Truncate isn't really an issue as long as the truncate transaction locks from the new EOF till infinity (which it does); self-heal will not race in those regions. Of course, with chained locking, full-file truncates can starve.
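
To make the chaining concrete, here is a toy single-process sketch in plain C (all names are hypothetical; a hand-rolled in-process range lock stands in for inodelk, and same-thread locks are allowed to coexist, roughly like inodelk's lk-owner semantics; this is of course not the actual AFR code):

  /* Toy in-process byte-range lock: a mutex, a condvar, and a list of
   * held ranges.  Stands in for inodelk in this sketch. */
  #include <pthread.h>
  #include <stdlib.h>
  #include <sys/types.h>

  #define CHUNK   (128 * 1024)
  #define INF_END ((off_t)1 << 62)      /* stand-in for "till infinity" */

  typedef struct range {
      pthread_t owner;
      off_t start, end;                 /* [start, end) */
      struct range *next;
  } range_t;

  static pthread_mutex_t mtx  = PTHREAD_MUTEX_INITIALIZER;
  static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
  static range_t *held;                 /* ranges currently locked */

  static int conflicts(off_t s, off_t e)
  {
      for (range_t *r = held; r; r = r->next)
          if (!pthread_equal(r->owner, pthread_self()) &&
              s < r->end && r->start < e)
              return 1;                 /* overlap held by someone else */
      return 0;
  }

  static void range_lock(off_t s, off_t e)
  {
      pthread_mutex_lock(&mtx);
      while (conflicts(s, e))
          pthread_cond_wait(&cond, &mtx);
      range_t *r = malloc(sizeof(*r));
      r->owner = pthread_self();
      r->start = s; r->end = e;
      r->next = held; held = r;
      pthread_mutex_unlock(&mtx);
  }

  static void range_unlock(off_t s, off_t e)
  {
      pthread_mutex_lock(&mtx);
      for (range_t **p = &held; *p; p = &(*p)->next)
          if ((*p)->start == s && (*p)->end == e) {
              range_t *dead = *p;
              *p = dead->next;
              free(dead);
              break;
          }
      pthread_cond_broadcast(&cond);    /* wake blocked waiters to retry */
      pthread_mutex_unlock(&mtx);
  }

  /* The chained self-heal loop: this thread never holds zero locks,
   * so a second heal waiting for the big lock can never slip in. */
  static void chained_heal(off_t file_size)
  {
      off_t cur = 0;

      range_lock(0, INF_END);           /* big lock: inspect, pick sources */
      range_lock(cur, cur + CHUNK);     /* small lock on the first chunk */
      range_unlock(0, INF_END);         /* give up the big lock; regional
                                         * operations can now proceed */
      while (cur < file_size) {
          /* ... sync CHUNK bytes at offset 'cur' from source to sink ... */
          range_lock(cur + CHUNK, cur + 2 * CHUNK);   /* take next chunk */
          range_unlock(cur, cur + CHUNK);             /* then drop current */
          cur += CHUNK;
      }
      range_unlock(cur, cur + CHUNK);   /* drop the final (past-EOF) lock;
                                         * real code would clamp at EOF */
  }

Because the next chunk is locked before the current one is released, there is never an instant with zero locks held, so a waiter on the full file never gets in; that is also exactly why a full-file truncate can starve under chaining.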

It's not obvious what use case Pranith is facing where self-heal and ftruncate() compete. Is it an artificial, hand-crafted test case, or a real-life access pattern?

> Until we recall what that reason was, I don't think we can determine whether this is a bug or a feature.  If it's a feature, or if we don't know, this change is likely to break something instead of fixing it.

The sole reason was to prevent the start of a second self-heal. Having self-heals serialize in a separate lock domain solves that problem (except if we are looking at runtime compatibility across multiple versions in this scenario... aargh!).
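
To sketch why the separate domain works (again with hypothetical names and a toy in-process table; the locks translator scopes inodelk conflicts by a domain string in a similar spirit), locks here conflict only when both the domain matches and the byte ranges overlap:

  /* Toy per-domain range lock: a full-file lock in the "self-heal"
   * domain serializes healers without touching data-path locks. */
  #include <pthread.h>
  #include <stdlib.h>
  #include <string.h>
  #include <sys/types.h>

  #define INF_END ((off_t)1 << 62)

  typedef struct dlock {
      const char *dom;                  /* e.g. "afr.self-heal" */
      off_t start, end;
      struct dlock *next;
  } dlock_t;

  static pthread_mutex_t mtx  = PTHREAD_MUTEX_INITIALIZER;
  static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
  static dlock_t *locks;

  static int conflicts(const char *dom, off_t s, off_t e)
  {
      for (dlock_t *l = locks; l; l = l->next)
          if (strcmp(l->dom, dom) == 0 && s < l->end && l->start < e)
              return 1;                 /* same domain, overlapping range */
      return 0;
  }

  static void domain_lock(const char *dom, off_t s, off_t e)
  {
      pthread_mutex_lock(&mtx);
      while (conflicts(dom, s, e))
          pthread_cond_wait(&cond, &mtx);
      dlock_t *l = malloc(sizeof(*l));
      l->dom = dom; l->start = s; l->end = e;
      l->next = locks; locks = l;
      pthread_mutex_unlock(&mtx);
  }

  static void domain_unlock(const char *dom, off_t s, off_t e)
  {
      pthread_mutex_lock(&mtx);
      for (dlock_t **p = &locks; *p; p = &(*p)->next)
          if (strcmp((*p)->dom, dom) == 0 &&
              (*p)->start == s && (*p)->end == e) {
              dlock_t *dead = *p;
              *p = dead->next;
              free(dead);
              break;
          }
      pthread_cond_broadcast(&cond);
      pthread_mutex_unlock(&mtx);
  }

  static void serialized_heal(off_t file_size)
  {
      /* A second healer blocks right here, in its own domain... */
      domain_lock("afr.self-heal", 0, INF_END);

      /* ... granular "afr.data" locking and copying as sketched above;
       * client writes keep taking "afr.data" locks and never see the
       * full-file lock held in the self-heal domain. */
      (void)file_size;

      domain_unlock("afr.self-heal", 0, INF_END);
  }

The data path never takes locks in the self-heal domain, so the serialization lock cannot affect client I/O; the compatibility caveat above is presumably that peers running older code would not take the new domain's lock at all, so mixed-version clusters lose the serialization.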

Avati
