From: Xavier Hernandez
Subject: Re: [Gluster-devel] [RFC] A new caching/synchronization mechanism to speed up gluster
Date: Thu, 06 Feb 2014 09:54:03 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0


On 06/02/14 08:51, Vijay Bellur wrote:
On 02/06/2014 05:46 AM, Ira Cooper wrote:
Yep... this is an area I am very interested in, going forwards.

Especially sending messages back to clients - we'll need that for any
caching/leasing/oplock (whatever we call it) type of protocol.

+1. I am very interested in seeing this implemented.

Xavi: Would you be able to attend next week's IRC meeting so that we can discuss this further? Of course, we can have out of band conversations but the IRC meeting might be a common ground for all interested folks to get together.
Of course, I'll be there.

Xavi


-Vijay


Keep me in the loop, and I'll keep tracking the list.  (I'm already on
the list.)

I'm also ira on freenode if you want to find me.

Thanks,

-Ira / ira@(samba.org|redhat.com)


On Wed, Feb 5, 2014 at 6:24 PM, Anand Avati <address@hidden> wrote:

    Xavi,
    Getting such a caching mechanism has several aspects. First of all
    we need the framework pieces implemented (particularly server
    originated messages to the client for invalidation and revokes) in a
    well designed way. Particularly how we address a specific translator
    in a message originating from the server. Some of the recent changes
    to client_t allow server-side translators to get a handle (the
    client_t object) on which messages can be submitted back to the client.
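
A rough standalone sketch of that pattern - a server-side translator
pushing an invalidation through a per-connection handle - might look like
this; everything below is an illustrative stand-in, not gluster's actual
client_t API:

    /* Minimal standalone sketch (not actual gluster code): models how a
     * server-side translator could use a per-connection handle, in the
     * spirit of client_t, to push an invalidation back to the client.
     * Every name below is an illustrative stand-in. */
    #include <stdio.h>
    #include <stdint.h>

    typedef struct client {
        uint64_t conn_id;
        /* the transport hook a real implementation would use to submit
         * a server-originated message on this connection */
        void (*submit)(struct client *c, const char *event, uint64_t ino);
    } client_t;

    static void transport_submit(client_t *c, const char *event,
                                 uint64_t ino)
    {
        /* in gluster this would be an RPC sent back over the client link */
        printf("conn %llu <- %s(inode %llu)\n",
               (unsigned long long)c->conn_id, event,
               (unsigned long long)ino);
    }

    /* a server-side translator decides a cached inode must be invalidated */
    static void invalidate_inode(client_t *owner, uint64_t ino)
    {
        owner->submit(owner, "CACHE_INVALIDATE", ino);
    }

    int main(void)
    {
        client_t c = { .conn_id = 42, .submit = transport_submit };
        invalidate_inode(&c, 1001); /* e.g. another client wrote inode 1001 */
        return 0;
    }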

    Such a framework (of having server originated messages) is also
    necessary for implementing oplocks (and possibly leases) -
    particularly interesting for the Samba integration.

    As Jeff already mentioned, this is an area gluster has not
    focused on, given the targeted use case. However, extending this to
    internal use cases (avoiding per-operation inodelks) can benefit
    many modules - encryption/crypt, afr, etc. It seems possible to
    have a common framework for delegating locks to clients, and to
    build cache coherency protocols / oplocks / inodelk avoidance on
    top of it.
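
As an illustration of that framework, one could imagine the brick keeping
a per-inode delegation record like the following; states and names are
hypothetical, and a real design would track a set of read owners rather
than a single one:

    /* Standalone sketch of the delegation idea: a brick tracks, per
     * inode, whether a client holds a delegation that lets it skip
     * per-operation inodelks. States and names are invented for
     * illustration. */
    #include <stdio.h>

    typedef enum { DELEG_NONE, DELEG_READ, DELEG_WRITE } deleg_t;

    struct inode_state {
        deleg_t deleg;
        int     owner;  /* client id holding the delegation, -1 if none */
    };

    /* grant if compatible; on conflict the server must recall it first */
    static int try_grant(struct inode_state *s, int client, deleg_t want)
    {
        if (s->deleg == DELEG_NONE ||
            (s->deleg == DELEG_READ && want == DELEG_READ)) {
            s->deleg = want;
            s->owner = client;
            return 1;   /* granted: the client may now cache locally */
        }
        return 0;       /* conflict: recall from s->owner, then retry */
    }

    int main(void)
    {
        struct inode_state s = { DELEG_NONE, -1 };
        printf("write delegation to c1: %d\n",
               try_grant(&s, 1, DELEG_WRITE));
        printf("read delegation to c2: %d (recall needed)\n",
               try_grant(&s, 2, DELEG_READ));
        return 0;
    }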

    Feel free to share a more detailed proposal if you have one or plan
    to write one -
    I'm sure the Samba folks (Ira copied) would be interested too.

    Thanks!
    Avati


    On Wed, Feb 5, 2014 at 11:27 AM, Xavier Hernandez
    <address@hidden> wrote:

        On 04.02.2014 17:18, Jeff Darcy wrote:

                The only synchronization point needed is to make sure
                that all bricks
                agree on the inode state and which client owns it. This
                can be achieved
                without locking using a method similar to what I
                implemented in the DFC
                translator. Besides the lock-less architecture, the main
                advantage is
                that much more aggressive caching strategies can be
                implemented very
                near to the final user, increasing considerably the
                throughput of the
                file system. Special care has to be taken with things
                that can fail on
                background writes (basically brick space and user access
                rights). Those
                should be handled appropriately on the client side to
                guarantee future
                success of writes. Of course this is only a high level
                overview. A
                deeper analysis should be done to see what to do on each
                special case.
                What do you think ?
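
A toy model of the precheck idea above - validating access and reserving
space at the moment a write enters the cache, so the deferred background
write cannot fail later - might look like this; all names are invented
for illustration:

    /* Toy model of the client-side precheck: validate permissions and
     * reserve space when the write is cached, so the later background
     * flush is guaranteed to succeed. Not a gluster API. */
    #include <stdio.h>
    #include <errno.h>

    struct brick { long free_bytes; };
    struct cred  { int can_write; };

    /* returns 0 only if a later background flush cannot fail */
    static int precheck_write(struct brick *b, struct cred *c, long len)
    {
        if (!c->can_write)
            return -EACCES;   /* fail now, not during the background flush */
        if (b->free_bytes < len)
            return -ENOSPC;
        b->free_bytes -= len; /* reserve space for the deferred write */
        return 0;
    }

    int main(void)
    {
        struct brick b = { .free_bytes = 4096 };
        struct cred  c = { .can_write = 1 };
        printf("precheck 1k: %d\n", precheck_write(&b, &c, 1024));
        printf("precheck 8k: %d (-ENOSPC expected)\n",
               precheck_write(&b, &c, 8192));
        return 0;
    }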


            I think this is a great idea for where we can go - and need
            to go - in the
            long term. However, it's important to recognize that it *is*
            the long
            term. We had to solve almost exactly the same problems in
            MPFS long ago.
            Whether the synchronization uses locks or not *locally* is
            meaningless,
            because all of the difficult problems have to do with
            recovering the
            *distributed* state. What happens when a brick fails while
            holding an
            inode in any state but I? How do we recognize it, what do we
            do about it,
            how do we handle the case where it comes back and needs to
            re-acquire its
            previous state? How do we make sure that a brick can
            successfully flush
            everything it needs to before it yields a
            lock/lease/whatever? That's
            going to require some kind of flow control, which is itself
            a pretty big
            project. It's not impossible, but it took multiple people
            some years for
            MPFS, and ditto for every other project (e.g. Ceph or
            XtreemFS) which
            adopted similar approaches. GlusterFS's historical avoidance
            of this
            complexity certainly has some drawbacks, but it has also
            been key to us
            making far more progress in other areas.
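
The flush-before-yield problem might be pictured with a sketch like the
one below, where a recalled client drains its dirty cache through a
bounded window before acking; the window size and structure are
assumptions, not a proposed design:

    /* Standalone sketch of the recall/flush problem: before a client
     * yields a lease it must flush its dirty cache, and the flush
     * itself must be flow controlled so it cannot overwhelm the brick.
     * All names and the window size are illustrative. */
    #include <stdio.h>

    #define FLUSH_WINDOW 4  /* max writes in flight during a recall */

    struct cache { int dirty; int inflight; };

    /* one step of the flush loop; returns 1 while work remains */
    static int flush_step(struct cache *c)
    {
        while (c->dirty > 0 && c->inflight < FLUSH_WINDOW) {
            c->dirty--;      /* issue a background write */
            c->inflight++;
        }
        if (c->inflight > 0)
            c->inflight--;   /* one write completes (acked by brick) */
        return c->dirty > 0 || c->inflight > 0;
    }

    int main(void)
    {
        struct cache c = { .dirty = 10, .inflight = 0 };
        int steps = 0;
        while (flush_step(&c))
            steps++;
        /* only now may the client ack the recall and drop the lease */
        printf("flushed in %d steps, lease can be released\n", steps);
        return 0;
    }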

        Well, it's true that there will be a lot of tricky cases that
        will need to be handled to be sure that data integrity and
        system responsiveness are guaranteed. However, I think they are
        no more difficult than what can happen currently if a client
        dies or loses communication while it holds a lock on a file.

        Anyway I think there is great potential in this mechanism
        because it can allow the implementation of powerful caches,
        even SSD-based ones, that could improve performance a lot.

        Of course there is a lot of work in solving all potential
        failures and designing the right thing. An important
        consideration is that all these methods try to solve a problem
        that is seldom found (i.e. having more than one client
        modifying the same file at the same time). So a solution that
        has almost zero overhead for the normal case and allows the
        implementation of aggressive caching mechanisms seems a big win.


            To move forward on this, I think we need a *much* more
            detailed idea of
            how we're going to handle the nasty cases. Would some sort
            of online
            collaboration - e.g. Hangouts - make more sense than
            continuing via
            email?

        Of course, we can talk on IRC or anywhere else you prefer.

        Xavi







_______________________________________________
Gluster-devel mailing list
address@hidden
https://lists.nongnu.org/mailman/listinfo/gluster-devel





