

From: Avra Sengupta
Subject: Re: [Gluster-devel] Replace cluster wide gluster locks with volume wide locks
Date: Fri, 13 Sep 2013 00:30:08 +0530
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130805 Thunderbird/17.0.8

Hi,

After further discussions, we revisited the requirements, and it looks
possible to improve both the requirements and the design.

1. We classify all gluster operations into three classes: volume create,
   volume delete, and volume-specific operations.
2. At any given point in time, two simultaneous operations (create, delete,
   or volume-specific) should be allowed, as long as both operations are
   not acting on the same volume.
3. If two simultaneous operations are performed on the same volume, the
   operation which manages to acquire the volume lock will succeed, while
   the other will fail.

In order to achieve this, we propose a locking engine which will receive
lock requests from these three types of operations. Each request for a
particular volume will contend for the same volume lock, keyed on the
volume name and recording the uuid of the node that wins it. For example,
a volume delete command for volume1 and a volume status command for
volume1 will contend for the same lock, in which case one of the commands
will succeed and the other will fail, having been unable to acquire the
lock.

On the other hand, if two operations are performed simultaneously on
different volumes, they should proceed smoothly, as the two operations
would request two different locks from the locking engine and would
acquire them in parallel. A rough sketch of such an engine is given below.
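
To make the intended behaviour concrete, here is a minimal, purely
illustrative C sketch of such an engine. This is not glusterd code;
vol_lock_t, volume_trylock() and the other names are hypothetical.

/*
 * Illustrative sketch of a per-volume locking engine keyed by volume
 * name.  The lock records the uuid of the node that won it: two
 * requests for the same volume contend for the same entry, while
 * requests for different volumes never conflict.
 */
#include <stdbool.h>
#include <string.h>
#include <uuid/uuid.h>          /* libuuid: uuid_t, uuid_copy, uuid_compare */

#define MAX_VOLS 256

typedef struct {
    char   volname[256];        /* key: the volume name              */
    uuid_t owner;               /* uuid of the node holding the lock */
    bool   held;
} vol_lock_t;

static vol_lock_t lock_table[MAX_VOLS];

/* Try to acquire the lock on 'volname' for node 'owner'.
 * Returns true on success, false if the volume is already locked. */
bool
volume_trylock (const char *volname, uuid_t owner)
{
    int free_slot = -1;

    for (int i = 0; i < MAX_VOLS; i++) {
        if (lock_table[i].held &&
            strcmp (lock_table[i].volname, volname) == 0)
            return false;       /* same volume: the second operation fails */
        if (!lock_table[i].held && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0)
        return false;           /* table full (a limitation of this sketch) */

    strncpy (lock_table[free_slot].volname, volname,
             sizeof (lock_table[free_slot].volname) - 1);
    uuid_copy (lock_table[free_slot].owner, owner);
    lock_table[free_slot].held = true;
    return true;
}

/* Release the lock on 'volname', but only if 'owner' actually holds it. */
bool
volume_unlock (const char *volname, uuid_t owner)
{
    for (int i = 0; i < MAX_VOLS; i++) {
        if (lock_table[i].held &&
            strcmp (lock_table[i].volname, volname) == 0 &&
            uuid_compare (lock_table[i].owner, owner) == 0) {
            lock_table[i].held = false;
            return true;
        }
    }
    return false;
}

With something along these lines, a delete of volume1 and a status of
volume1 would both call volume_trylock("volume1", ...) and only one would
return true, while a status of volume2 would acquire a different entry and
proceed in parallel.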

Regards,
Avra

On 09/12/2013 01:18 PM, Kaushal M wrote:
On Thu, Sep 12, 2013 at 12:10 PM, Varun Shastry <address@hidden> wrote:
On Thursday 12 September 2013 08:41 AM, Krishnan Parthasarathi wrote:

----- Original Message -----
Hi,

As of today most gluster commands take a cluster-wide lock before
performing their respective operations. As a result, any two gluster
commands that have no interdependency with each other still can't be
executed simultaneously. To remove this interdependency we propose to
replace this cluster-wide lock with a volume-specific lock, so that two
operations on two different volumes can be performed simultaneously.

By implementing this volume-wide lock, our agenda is to achieve the
following:
1. Allowing simultaneous volume operations on different volumes.
   Performing two operations simultaneously on the same volume should not
   be allowed.
2. While a volume is being created or deleted, operations (like rebalance
   or geo-rep) should be permitted on other volumes.
You can't meaningfully interleave a volume-delete and a volume-rebalance
on the same volume.
The locking domains must be arranged such that create/delete operations
take a lock which conflicts with all locks held on any volume. This would
handle the scenario mentioned above.

The volume operations performed in the cluster can be classified into the
following categories:
- Create/Delete - Need a cluster-wide lock. This lock 'trumps' all other
  locks.

- Add-brick, remove-brick, replace-brick, rebalance, volume-set/reset and
  a whole bunch of features associated with a volume, like quota and
  geo-rep - Need a volume-level lock.
Since these operations (add-brick, remove-brick, set/reset) modify the nfs
volfile, which is a single volfile for all the volumes, don't we need to
take the cluster-wide lock for the above set of operations as well?

We did discuss this when we were discussing the design before it was
posted to the list, and we don't think it'll be a problem. As the
generation of these volfiles takes all the volumes into consideration,
the last operation, among concurrent volume operations, to generate these
volfiles will generate the correct volfile. If another operation which
requires volfile regeneration happens while these are being generated, it
won't be a problem either, as that operation will regenerate the volfiles.

- Varun Shastry

- Volume-info, volume-status - Need no locking.
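
Purely as a hypothetical illustration of these lock domains (not glusterd
code; the enum and function names are made up), the conflict rule could be
expressed as:

#include <string.h>

/*
 * Create/delete take a cluster-wide lock that conflicts with every held
 * lock; per-volume operations conflict only with a lock on the same
 * volume (or with the cluster-wide lock); info/status take no lock at
 * all.  'held_vol'/'req_vol' are volume names, or NULL for the
 * cluster-wide lock.
 */
typedef enum {
    OP_CLASS_CREATE_DELETE,     /* cluster-wide lock, trumps everything     */
    OP_CLASS_VOLUME,            /* add-brick, rebalance, quota, geo-rep ... */
    OP_CLASS_READONLY           /* volume-info, volume-status: no locking   */
} op_class_t;

static int
locks_conflict (op_class_t held, const char *held_vol,
                op_class_t req,  const char *req_vol)
{
    if (held == OP_CLASS_READONLY || req == OP_CLASS_READONLY)
        return 0;               /* lock-less operations never conflict      */
    if (held == OP_CLASS_CREATE_DELETE || req == OP_CLASS_CREATE_DELETE)
        return 1;               /* the cluster-wide lock conflicts with all */
    return strcmp (held_vol, req_vol) == 0;  /* same-volume locks conflict  */
}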


3. Two simultaneous volume create or volume delete operations should not
   be permitted.

We propose to do this in two steps:

1. Implementing the volume-wide lock: In order to implement this, we will
   add a lock consisting of the uuid of the originator to the in-memory
   volinfo (of that particular volume) on all the nodes of the cluster.
   Once this lock is taken, any other command for the same volume will not
   be able to acquire this lock from that particular volume's volinfo.
   Meanwhile, other operations on other volumes can still be executed.
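
As a rough illustration of step 1 as described (hypothetical names, not
the actual in-memory volinfo layout):

#include <uuid/uuid.h>

/*
 * The volume lock stored inside the in-memory volinfo: 'lock_owner'
 * holds the uuid of the originator node, and an all-zero uuid means
 * the volume is unlocked.
 */
struct volinfo_sketch {
    char   volname[256];
    uuid_t lock_owner;
    /* ... the rest of the volume's in-memory state ... */
};

/* Acquire the volume lock for 'originator'; fails if another command
 * already holds the lock in this volinfo. */
static int
volinfo_trylock (struct volinfo_sketch *vol, uuid_t originator)
{
    if (!uuid_is_null (vol->lock_owner))
        return -1;                  /* some other command holds the lock */
    uuid_copy (vol->lock_owner, originator);
    return 0;
}

static void
volinfo_unlock (struct volinfo_sketch *vol)
{
    uuid_clear (vol->lock_owner);   /* back to the all-zero, unlocked state */
}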
There is one caveat with storing the volume-level lock in the volinfo
object. Not all nodes are guaranteed to have an up-to-date version of the
volinfo object, and we don't have the necessary mechanism to select peers
based on the recency of their volume information. Worse still, the
volinfo object could be modified by incoming updates from other peers in
the cluster, if and when they rejoin (the available partition of) the
cluster in the middle of a transaction. I agree this lack of guarantee is
part of a different problem, but this is a runtime reality :(

This prompts me to think that we should keep all the lock-related
book-keeping independent of things that are not guaranteed to stick
around for the lifetime of a command. This way we can keep the policy of
locking (i.e. volume-wide vs cluster-wide) separate from the mechanism.

cheers,
krish

2. Stop using the cluster lock for existing commands: Port existing
   commands to use this framework. We will use op-version to take care of
   backward compatibility for the existing commands (see the sketch
   below). While implementing this, we need to take care of commands like
   volume create, volume delete, rebalance callbacks, implicit volume
   syncs (when a node comes up), and the volume sync command, which
   modify priv->volumes, as well as other non-volume operations which
   today work within the ambit of the cluster lock.

   Please feel free to provide feedback.
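
Purely as an illustration of the op-version gating mentioned in step 2
(the constant name and the helper functions below are hypothetical, and
the real code paths may differ):

#include <uuid/uuid.h>

/* Hypothetical op-version at which all peers understand volume locks. */
#define GD_OP_VERSION_VOL_LOCKS 4

/* Stand-ins for the existing cluster-wide lock and a new per-volume
 * lock (such as the volume_trylock() sketched earlier). */
extern int cluster_lock_acquire (uuid_t originator);
extern int volume_lock_acquire (const char *volname, uuid_t originator);

/* Pick the locking scheme based on the cluster's effective op-version,
 * so that clusters still containing older peers keep the old behaviour. */
static int
acquire_lock_for_op (int cluster_op_version, const char *volname,
                     uuid_t originator)
{
    if (cluster_op_version < GD_OP_VERSION_VOL_LOCKS)
        return cluster_lock_acquire (originator);       /* old peers present */

    return volume_lock_acquire (volname, originator);   /* all peers upgraded */
}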

Regards,
Avra

_______________________________________________
Gluster-devel mailing list
address@hidden
https://lists.nongnu.org/mailman/listinfo/gluster-devel



