Keep me in the loop, and I'll keep tracking the list. (I'm already on
the list.)
I'm also ira on freenode if you want to find me.
Thanks,
-Ira / ira@(samba.org|redhat.com)
On Wed, Feb 5, 2014 at 6:24 PM, Anand Avati <address@hidden> wrote:
Xavi,
Getting such a caching mechanism in place has several aspects. First of all, we need the framework pieces implemented in a well-designed way, particularly server-originated messages to the client for invalidations and revokes, and a way to address a specific translator in a message originating from the server. Some of the recent changes to client_t allow server-side translators to get a handle (the client_t object) on which messages can be submitted back to the client.
Such a framework (of server-originated messages) is also necessary for implementing oplocks (and possibly leases), which is particularly interesting for the Samba integration.
As Jeff already mentioned, this is an area GlusterFS has not focused on, given its targeted use cases. However, extending it to internal use cases could benefit many modules (encryption/crypt, AFR, etc.) by avoiding per-operation inodelks. It seems possible to have a common framework for delegating locks to clients, and to build cache-coherency protocols, oplocks, and inodelk avoidance on top of it.
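To make that a bit more concrete, here is a minimal sketch, in C, of what a brick-side delegation record and conflict check could look like. Every name and type below is hypothetical and invented for illustration; the only existing GlusterFS concept referenced is the idea of a per-client handle like client_t, and this is not a description of any current implementation.

/* Hypothetical sketch of a brick-side delegation record; all names below are
 * made up for illustration. Only the notion of a per-client handle (client_t)
 * comes from the discussion above. */

#include <stdint.h>
#include <pthread.h>

typedef enum {
        DELEG_NONE = 0,    /* nobody owns the inode                        */
        DELEG_READ,        /* one or more clients may cache reads          */
        DELEG_WRITE,       /* exactly one client may cache dirty data      */
} deleg_state_t;

typedef struct {
        uint64_t        gfid_hash;   /* which inode this record covers      */
        deleg_state_t   state;       /* current delegation state            */
        void           *owner;       /* per-client handle (e.g. a client_t) */
        pthread_mutex_t lock;        /* protects state transitions          */
} deleg_record_t;

/* Decide what the brick must do when 'client' asks for 'wanted' access on an
 * inode covered by 'rec': 0 means the fop can proceed, 1 means the current
 * owner must first be sent a recall/revoke message over its client handle. */
static int
deleg_needs_recall (deleg_record_t *rec, void *client, deleg_state_t wanted)
{
        int recall = 0;

        pthread_mutex_lock (&rec->lock);
        if (rec->state == DELEG_WRITE && rec->owner != client)
                recall = 1;   /* another client holds (possibly dirty) data */
        else if (rec->state == DELEG_READ && wanted == DELEG_WRITE)
                recall = 1;   /* readers must invalidate their caches first */
        pthread_mutex_unlock (&rec->lock);

        return recall;
}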
Feel free to share a more detailed proposal if you have one or a plan in mind - I'm sure the Samba folks (Ira copied) would be interested too.
Thanks!
Avati
On Wed, Feb 5, 2014 at 11:27 AM, Xavier Hernandez <address@hidden> wrote:
On 04.02.2014 17:18, Jeff Darcy wrote:
The only synchronization point needed is to make sure that all bricks agree on the inode state and which client owns it. This can be achieved without locking, using a method similar to what I implemented in the DFC translator. Besides the lock-less architecture, the main advantage is that much more aggressive caching strategies can be implemented very near to the final user, considerably increasing the throughput of the file system. Special care has to be taken with things that can fail on background writes (basically brick space and user access rights). Those should be handled appropriately on the client side to guarantee future success of the writes. Of course this is only a high-level overview; a deeper analysis should be done to see what to do in each special case.
What do you think?
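As a rough illustration of the point about background writes (again with purely invented names, not any existing translator's API): the client-side cache can only acknowledge a write early if the failures that could still happen on the brick, permission denial and lack of space, have already been ruled out, for instance by an up-front access check and a space reservation.

/* Hypothetical client-side check, for illustration only: a buffered write may
 * be acknowledged to the application before it reaches the brick only if a
 * later EACCES/ENOSPC on the background flush has been made impossible. */

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
        bool     write_access_checked;  /* brick already confirmed write permission */
        uint64_t reserved_bytes;        /* space pre-reserved on the brick          */
        uint64_t pending_bytes;         /* dirty data buffered but not yet flushed  */
} cached_fd_t;

/* Returns true if 'size' more bytes can be buffered and acknowledged now,
 * false if the write must instead be sent to the brick synchronously. */
static bool
can_ack_buffered_write (cached_fd_t *cfd, size_t size)
{
        if (!cfd->write_access_checked)
                return false;
        if (cfd->pending_bytes + size > cfd->reserved_bytes)
                return false;   /* would exceed the space we reserved */

        cfd->pending_bytes += size;
        return true;
}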
I think this is a great idea for where we can go - and need to go - in the long term. However, it's important to recognize that it *is* the long term. We had to solve almost exactly the same problems in MPFS long ago. Whether the synchronization uses locks or not *locally* is meaningless, because all of the difficult problems have to do with recovering the *distributed* state. What happens when a brick fails while holding an inode in any state but I? How do we recognize it, what do we do about it, and how do we handle the case where it comes back and needs to re-acquire its previous state? How do we make sure that a brick can successfully flush everything it needs to before it yields a lock/lease/whatever? That's going to require some kind of flow control, which is itself a pretty big project. It's not impossible, but it took multiple people some years for MPFS, and ditto for every other project (e.g. Ceph or XtreemFS) that adopted similar approaches. GlusterFS's historical avoidance of this complexity certainly has some drawbacks, but it has also been key to us making far more progress in other areas.
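For concreteness, here is a toy model of the per-inode ownership states being discussed (the names are invented for this sketch and come from neither MPFS nor GlusterFS). The ordinary transitions are trivial; the entire difficulty described above lives in the owner-failure arc, which cannot be expressed as a simple state change.

/* Toy model of per-inode cache-ownership states; all names are invented for
 * illustration and do not correspond to any existing code. */

typedef enum {
        OWN_INVALID = 0,   /* the "I" state: nothing is delegated or cached  */
        OWN_SHARED,        /* read caching delegated to one or more clients  */
        OWN_EXCLUSIVE,     /* write caching delegated to a single client     */
} own_state_t;

typedef enum {
        EV_GRANT_READ,
        EV_GRANT_WRITE,
        EV_RELEASE,
        EV_OWNER_FAILED,   /* the hard case: owner crashed or was partitioned */
} own_event_t;

/* The grant/release arcs fit in a few lines; the EV_OWNER_FAILED arc is where
 * the real design work (detection, fencing, flush/replay, letting the owner
 * re-acquire its previous state) has to happen. */
static own_state_t
own_next_state (own_state_t cur, own_event_t ev)
{
        switch (ev) {
        case EV_GRANT_READ:
                return OWN_SHARED;
        case EV_GRANT_WRITE:
                return OWN_EXCLUSIVE;
        case EV_RELEASE:
                return OWN_INVALID;
        case EV_OWNER_FAILED:
                /* Dropping straight back to OWN_INVALID would silently
                 * discard dirty cached data and confuse a returning owner;
                 * this is exactly the recovery problem described above. */
                return cur;
        }
        return cur;
}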
Well, it's true that there will be a lot of tricky cases that will need to be handled to make sure that data integrity and system responsiveness are guaranteed. However, I think they are not more difficult than what can happen today if a client dies or loses communication while it holds a lock on a file.
Anyway, I think there is great potential in this mechanism, because it can allow the implementation of powerful caches, even SSD-based ones, that could improve performance a lot.
Of course there is a lot of work in solving all potential failures and designing the right thing. An important consideration is that all these methods try to solve a problem that is seldom found (i.e. having more than one client modifying the same file at the same time). So a solution that has almost zero overhead for the normal case and allows the implementation of aggressive caching mechanisms seems a big win.
To move forward on this, I think we need a *much* more detailed idea of how we're going to handle the nasty cases. Would some sort of online collaboration - e.g. Hangouts - make more sense than continuing via email?
Of course, we can talk on IRC or anywhere else if you prefer.
Xavi
_______________________________________________
Gluster-devel mailing list
address@hidden
https://lists.nongnu.org/mailman/listinfo/gluster-devel