From: Xavier Hernandez
Subject: [Gluster-devel] [RFC] A new caching/synchronization mechanism to speed up gluster
Date: Tue, 04 Feb 2014 10:07:22 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0

Hi,

Currently, inodelk() and entrylk() are being used to make sure that changes happen synchronously on all bricks, avoiding data/metadata corruption when multiple clients modify the same inode concurrently. So far so good; however, I think this introduces a significant overhead to avoid a situation that will happen very rarely. It also limits the advantage of client-side caches.

I propose to implement a new translator that uses a MESI-like protocol (the protocol used to maintain memory coherency between the local caches of CPU cores). This translator would add virtually zero overhead when only one client is accessing a given inode, and an overhead comparable to the current implementation when there is contention.

Another advantage of this protocol is that it would make it possible to implement much more aggressive caching mechanisms on the client side, improving overall performance without losing any current features.

At a high level this is how it could work:

Each client tracks the state of each inode it uses (M - Modified, E - Exclusive, S - Shared, I - Invalid). All inodes start in the invalid state. When the client needs to write the inode, it asks all bricks for exclusive access. Once granted, the inode is in the exclusive state and any read/write operation can be performed locally on the client side, because it knows that nobody else will be modifying the inode. If the inode is successfully written (in the local cache), the state changes to modified. Eventually the changes are sent to the bricks in the background and the state goes back to exclusive, or to invalid if the inode is not needed anymore.
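To make the per-client transitions concrete, here is a minimal sketch of that state machine in Python. All names (ClientInode, acquire_exclusive, flush) are illustrative assumptions, not GlusterFS code, and the brick round-trip is assumed to always succeed:

```python
from enum import Enum

class State(Enum):
    INVALID = "I"    # no valid local copy
    SHARED = "S"     # read-only copy, possibly shared with other clients
    EXCLUSIVE = "E"  # sole clean copy; local reads/writes are safe
    MODIFIED = "M"   # sole copy with pending changes not yet on the bricks

class ClientInode:
    """Per-inode state as tracked on one client (sketch only)."""

    def __init__(self):
        self.state = State.INVALID  # all inodes are created invalid

    def acquire_exclusive(self):
        # Hypothetical request to all bricks; assumed granted here.
        self.state = State.EXCLUSIVE

    def write_local(self, data):
        # A local write is only legal once exclusive access is held.
        if self.state not in (State.EXCLUSIVE, State.MODIFIED):
            self.acquire_exclusive()
        self.state = State.MODIFIED  # dirty in the local cache

    def flush(self, still_needed=True):
        # Background write-back of pending changes to the bricks.
        if self.state == State.MODIFIED:
            self.state = State.EXCLUSIVE if still_needed else State.INVALID
```

A client would thus go I → E on the first write request, E → M after the local write, and M → E (or M → I) once the background flush completes.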

Now, if another client needs to read or write the same inode, it will send a request to all bricks. If the inode is in the exclusive or modified state on one of the clients, the bricks will notify the current owner of the inode to flush all pending changes. Once that completes, the new client will be granted exclusive (for a write request) or shared (for a read request) access to the inode. The former owner will leave the inode in the invalid state (for a write request) or the shared state (for a read request).

Multiple clients can read a shared inode simultaneously; however, if one client needs exclusive access to the inode, all other clients must set the inode's state to invalid before exclusive access is granted.
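The brick-side arbitration described in the last two paragraphs could look roughly like the following sketch. This is a simplified single-brick model under invented names (BrickArbiter, request_shared, request_exclusive); it only records which flush/invalidate notifications would be sent, rather than performing real network callbacks:

```python
class BrickArbiter:
    """Sketch of how a brick could arbitrate inode access between clients.

    All names and structure here are illustrative assumptions,
    not actual GlusterFS code.
    """

    def __init__(self):
        self.owner = None        # client holding exclusive/modified access
        self.readers = set()     # clients holding shared access
        self.notified = []       # (action, client) notifications sent

    def request_shared(self, client):
        if self.owner is not None and self.owner != client:
            # Current owner must flush pending changes, then drops to shared.
            self.notified.append(("flush", self.owner))
            self.readers.add(self.owner)
            self.owner = None
        self.readers.add(client)

    def request_exclusive(self, client):
        if self.owner is not None and self.owner != client:
            # Current owner flushes, and its copy becomes invalid.
            self.notified.append(("flush", self.owner))
            self.owner = None
        # Every other shared copy must be invalidated before granting.
        for r in self.readers:
            if r != client:
                self.notified.append(("invalidate", r))
        self.readers.clear()
        self.owner = client
```

For example, if client c1 holds exclusive access and c2 issues a read, the brick tells c1 to flush and both end up in the shared state; a later exclusive request from c3 invalidates both shared copies.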

The only synchronization point needed is to make sure that all bricks agree on the inode state and which client owns it. This can be achieved without locking using a method similar to what I implemented in the DFC translator.

Besides the lock-less architecture, the main advantage is that much more aggressive caching strategies can be implemented very near to the final user, considerably increasing the throughput of the file system. Special care has to be taken with things that can fail on background writes (basically brick space and user access rights). Those should be handled appropriately on the client side to guarantee the future success of writes.

Of course, this is only a high-level overview. A deeper analysis should be done to see what to do in each special case.

What do you think?

Xavi



