[Gluster-devel] Fwd: Re: RFC: Using anonymous fds in quick-read

From:	Anand Avati
Subject:	[Gluster-devel] Fwd: Re: RFC: Using anonymous fds in quick-read
Date:	Tue, 28 Aug 2012 22:27:38 -0700
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:13.0) Gecko/20120605 Thunderbird/13.0

(CC'ing gluster-devel)

On 08/28/2012 09:38 PM, Raghavendra Gowdappa wrote:

Avati,

Following are the questions/thoughts related to anonymous fd framework and 
their usage in quick read. Please answer or give your feedback.

Questions related to anonymous fd framework:
============================================
* Anonymous fds can work because open by itself doesn't do any of the primary
tasks the application is interested in, like read, write etc. (the application
does an open with the intent of doing something else). This raises the
question: why do we need open at all? Can't we eliminate it altogether? If we
were to eliminate open, aren't we moving from a neater to a messier design,
where each fop has to check on every invocation whether the work associated
with open (like storing contexts etc.) is done?


Some corrections to the above statement. There are two parts to the
open() call:

1) The effects of the call itself, like
a) Perform permission checks and establish a 'session' (with the fd) on
the allowed permission [even if permission of the inode changes in the
future while the fd is still open]
b) Perform additional operation like file truncation when flag O_TRUNC
is specified

2) Side effects of the call, like
a) Specify the cache effects on future syscalls with O_[RD]SYNC,
O_DIRECT flags
b) Offer immunity against future calls like rename() and unlink()

These are the kind of things even Gluster (or any other FS) has to
guarantee with its open() syscall.
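
To make points 1a and 2b concrete, here is a minimal POSIX sketch (not GlusterFS code). It assumes a local filesystem with ordinary POSIX semantics; the function name and the scratch path passed to it are made up for illustration. It opens a file, then revokes all permission bits and removes the only name, and checks that the already-open fd can still read the data.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper, for illustration only.
 * Returns 1 if data written to `path` can still be read through an fd
 * that was opened before the file lost its permission bits and its name.
 * Creates and removes `path` as a scratch file. */
int open_session_survives(const char *path)
{
    char buf[5];
    int ok = 0;

    assert(path != NULL);

    int wfd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0600);
    if (wfd < 0)
        return 0;
    ssize_t written = write(wfd, "hello", 5);
    close(wfd);

    /* The permission check happens here, once, at open() time. */
    int fd = (written == 5) ? open(path, O_RDONLY) : -1;
    if (fd < 0) {
        unlink(path);
        return 0;
    }

    int chmod_ok  = (chmod(path, 0) == 0);  /* 1a: revoke all permissions */
    int unlink_ok = (unlink(path) == 0);    /* 2b: remove the only name   */

    /* The established session should still allow the read. */
    if (chmod_ok && unlink_ok && read(fd, buf, sizeof(buf)) == 5)
        ok = (memcmp(buf, "hello", 5) == 0);

    close(fd);
    return ok;
}
```

Calling `open_session_survives("/tmp/some-scratch-file")` on a typical Linux filesystem is expected to return 1; this is the behaviour a filesystem has to preserve behind its own open().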

Anonymous fds exist because:
a) Protocols like NFS3 do not support the above semantics; they are
implemented entirely on the client side. But our read/write fops require
an fd_t parameter. Those fops do not need some of the above semantics
(like read/write permission checks), and other semantics are already
guaranteed by anonymous fds (like immunity against rename()). Note that
anonymous fds currently do not provide immunity against unlink().

b) Internal optimizations in perf xlators sometimes do not require all
of the above semantics.

Whether we use anonymous fds or not, we need to keep up all the above
semantics. There are some issues with the semantics even in today's
version of quick-read: we assume the permission check has already
happened (which is usually true, as FUSE performs permission checks),
but that may not always be the case. That apart, the benefit of
anonymous fds in quick-read can be in handling fd-based fops that arrive
in the window of time between a short-cut open() and its completion on
the backend. They need not wait for the open() to complete if they
arrive early. Instead they can proceed with an anonymous fd, which can
significantly reduce code complexity. Again, this can be limited to
O_RDONLY open()s that set neither O_DIRECT nor O_TRUNC, and would
thereby be vulnerable only to unlink()s happening in that window.
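
The flag restriction above could be expressed roughly as follows. This is only a sketch of the check, not quick-read code: the function name `qr_open_can_use_anon_fd` is hypothetical, and the O_DIRECT fallback definition is an assumption for platforms that do not expose that flag.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE     /* exposes O_DIRECT in <fcntl.h> on Linux/glibc */
#endif
#include <fcntl.h>

#ifndef O_DIRECT
#define O_DIRECT 0      /* assumption: no such flag here, nothing to reject */
#endif

/* Hypothetical helper, for illustration only.
 * Returns 1 if an open() with these flags is a candidate for letting
 * early fd-based fops proceed on an anonymous fd, 0 otherwise. */
int qr_open_can_use_anon_fd(int flags)
{
    /* Only read-only opens qualify. */
    if ((flags & O_ACCMODE) != O_RDONLY)
        return 0;

    /* O_DIRECT changes cache behaviour and O_TRUNC modifies the file,
     * so neither may be short-cut. */
    if (flags & (O_DIRECT | O_TRUNC))
        return 0;

    return 1;
}
```

With this check, a plain `O_RDONLY` open qualifies, while `O_RDWR`, `O_WRONLY` and `O_RDONLY | O_TRUNC` do not.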

* How are ops like fsync handled with anonymous fds? How are we going to
identify the fd(s) on the server on which writes are actually performed? The
problem is more acute if we happen to load write-behind on the server side.


With the changes in http://review.gluster.com/712, an fsync() fop will
be a barrier against all previous writes on the inode (no matter which
fd). There is no problem if you load write-behind on the server side.
fsync() is essentially an inode operation and must not discriminate
writes based on the fd of origin.
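
As a toy model of that rule (all names here are hypothetical and this is not the code from the review above), pending writes can be tracked on the inode rather than on the fd, so that fsync() through any fd drains everything queued on the inode:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: write state lives on the inode, not on the fd. */
struct toy_inode {
    int pending_writes;   /* writes queued but not yet stable */
    int stable_writes;    /* writes made stable by an fsync   */
};

struct toy_fd {
    struct toy_inode *inode;
};

/* A write through any fd is queued on the shared inode. */
void toy_write(struct toy_fd *fd)
{
    assert(fd != NULL && fd->inode != NULL);
    fd->inode->pending_writes++;
}

/* fsync acts as a barrier for every earlier write on the inode,
 * regardless of the fd that issued it. */
void toy_fsync(struct toy_fd *fd)
{
    assert(fd != NULL && fd->inode != NULL);
    fd->inode->stable_writes += fd->inode->pending_writes;
    fd->inode->pending_writes = 0;
}
```

In this model, writes issued through two different fds on the same inode are all accounted as stable after a single `toy_fsync()` on either fd, which is why the server does not need to know which fd a write came from.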

* Though we are trying to decouple path from addressing an inode in glusterfs
using nameless lookups, that decoupling is not complete. There are translators
which use naming patterns to assign priorities to files (like io-cache and
quick-read, for the purpose of deciding whether to flush a cache or not). To be
honest, the problem is seen only in fd-migration, where we are using nameless
lookups for fresh lookups in the new graph after a graph switch. Currently I
am using nameless lookups with loc.path set, which solves the problem. Ideally
nameless lookups are not the ones to be used during migration, since they are
not meant to be used for fresh lookups (at least till we get rid of dependencies
on path-based addressability internally in glusterfs). However, they have huge
performance benefits.


Not sure what the above point is w.r.t. anonymous fds, but yes: nameless
lookup takes away the sense of hierarchy (and "filename"), and operations
which depend on filename or hierarchy might not always work. But then
this has been true even before we brought in nameless lookups, as FUSE
issues open() on an inode and therefore we are not guaranteed to perform
open() on the right path when there are hardlinks.
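
The hardlink case can be seen with plain POSIX calls. The sketch below (hypothetical function name, scratch paths chosen by the caller, and it assumes both paths are on the same filesystem) creates two names for one file and checks that they resolve to the same inode, so an open() issued "on the inode" carries no information about which name was used.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper, for illustration only.
 * Creates `a`, hardlinks it as `b`, and returns 1 if both names refer
 * to the same inode. Removes both scratch names before returning. */
int hardlinks_share_inode(const char *a, const char *b)
{
    struct stat sa, sb;

    assert(a != NULL && b != NULL);

    /* Start from a clean slate; errors here are not interesting. */
    unlink(a);
    unlink(b);

    int fd = open(a, O_CREAT | O_WRONLY, 0600);
    if (fd < 0)
        return 0;
    close(fd);

    if (link(a, b) != 0) {
        unlink(a);
        return 0;
    }

    /* Two different paths, one inode with a link count of 2. */
    int ok = stat(a, &sa) == 0 && stat(b, &sb) == 0 &&
             sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino &&
             sa.st_nlink == 2;

    unlink(a);
    unlink(b);
    return ok;
}
```

Any translator that prioritises files by name pattern therefore cannot recover "the" filename from the inode alone when a file has more than one link.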

Using anonymous fd framework in quick-read:
===========================================
* As far as quick-read goes, its task becomes very simple: just convert the fd
to anonymous during open and return. This can eliminate all the dependencies of
fops having to wait till open is actually done. In fact, the fops it has to
implement are: lookup, open and readv.


See my previous comments: it must perform a few more checks.
quick-read cannot just "convert" an fd to an anonymous fd. An anonymous
fd has fd->pid == -1 (which the fd of a quick-unwound open() will not).
There are also other semantics which need to be met (at least on a
best-effort basis) while the actual fd is still unopened.
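
A rough sketch of the distinction, using a made-up `toy_fd` structure rather than the real fd_t: the application's fd keeps its owner pid and is never relabelled, and a separate anonymous fd (pid == -1) is what an early fop would be sent on until the real open() completes.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an fd; field names are assumptions for illustration. */
struct toy_fd {
    long pid;                 /* owner pid, or -1 for an anonymous fd   */
    int  opened_on_backend;   /* 1 once the real open() has completed   */
};

#define TOY_ANON_PID (-1L)

/* An fd is anonymous only if it carries the anonymous pid. */
int toy_fd_is_anonymous(const struct toy_fd *fd)
{
    assert(fd != NULL);
    return fd->pid == TOY_ANON_PID;
}

/* Pick the fd an fd-based fop should be sent on: the application's own
 * fd once it is really open, otherwise the separate anonymous fd. */
const struct toy_fd *toy_pick_fd(const struct toy_fd *app_fd,
                                 const struct toy_fd *anon_fd)
{
    assert(app_fd != NULL && anon_fd != NULL);
    assert(toy_fd_is_anonymous(anon_fd));
    return app_fd->opened_on_backend ? app_fd : anon_fd;
}
```

Note that the flag and permission checks discussed earlier would still have to happen before taking the anonymous path; this sketch only covers the choice of fd.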


* Anonymous fd awareness should be brought into afr. It shouldn't try to open
the files in fops like writev if the fd happens to be anonymous.


I think that already is the case. Also, why do you specifically mention afr?

Thanks,
Avati
