From: Niels de Vos
Subject: Re: [Gluster-devel] What functionality is expected from persistent NFS-client tracking?
Date: Wed, 6 Feb 2013 18:19:56 +0100
User-agent: Mutt/1.5.20 (2009-12-10)

On Thu, Jan 31, 2013 at 03:19:28PM -0500, J. Bruce Fields wrote:
> On Thu, Jan 31, 2013 at 10:20:27AM +0100, Niels de Vos wrote:
> > On Wed, Jan 30, 2013 at 03:09:38PM -0500, J. Bruce Fields wrote:
> > > On Wed, Jan 30, 2013 at 02:31:09PM -0500, bfields wrote:
> > > > On Fri, Jan 25, 2013 at 03:23:28PM +0100, Niels de Vos wrote:
> > > > > Hi all,
> > > > > 
> > > > > the last few days I have been looking into making the tracking of 
> > > > > NFS-clients more persistent. As it is today, the NFS-clients are 
> > > > > kept in a list in memory on the NFS-server. When the NFS-server 
> > > > > restarts, the list is recreated from scratch and does not contain 
> > > > > the NFS-clients that still have the export mounted (Bug 904065).
> > > > > 
> > > > > NFSv3 depends on the MOUNT protocol. When an NFS-client mounts an 
> > > > > export, the MOUNT protocol is used to get the initial file-handle. 
> > > > > With this handle, the NFS-service can be contacted. The actual 
> > > > > services providing the MOUNT and NFSv3 protocols can be separate 
> > > > > (Linux kernel NFSd) or implemented closely together (Gluster 
> > > > > NFS-server). 
> > > > > 
> > > > > Now, when the Linux kernel NFS-server is used, the NFS-clients are 
> > > > > saved by the rpc.mountd process (which handles the MOUNT protocol) 
> > > > > in /var/lib/nfs/rwtab. This file is modified on mounting and 
> > > > > unmounting.
> > > > > 
> > > > > Implementing a persistent cache similar to this is pretty 
> > > > > straightforward; it is available for testing and review in [1].
> > > > > 
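
To make that concrete: such a cache only needs an append on a successful 
MOUNT and a rewrite on UMOUNT. The sketch below is illustrative only; the 
path, file format and function names are made up and it is not the change 
proposed in [1].

  /* Minimal sketch of an rwtab-style persistent client cache; path,
   * format and function names are illustrative, not the code in [1]. */
  #include <stdio.h>
  #include <string.h>

  #define CLIENT_CACHE "/var/lib/glusterfs/nfs/clients"

  /* MOUNT succeeded: remember the client. */
  static int client_cache_add (const char *hostname)
  {
          FILE *fp = fopen (CLIENT_CACHE, "a");

          if (!fp)
                  return -1;
          fprintf (fp, "%s\n", hostname);
          fclose (fp);
          return 0;
  }

  /* UMOUNT received: rewrite the cache without this client. */
  static int client_cache_del (const char *hostname)
  {
          char  line[256];
          FILE *in  = fopen (CLIENT_CACHE, "r");
          FILE *out = fopen (CLIENT_CACHE ".tmp", "w");

          if (!in || !out) {
                  if (in)
                          fclose (in);
                  if (out)
                          fclose (out);
                  return -1;
          }
          while (fgets (line, sizeof (line), in)) {
                  line[strcspn (line, "\n")] = '\0';
                  if (strcmp (line, hostname) != 0)
                          fprintf (out, "%s\n", line);
          }
          fclose (in);
          fclose (out);
          return rename (CLIENT_CACHE ".tmp", CLIENT_CACHE);
  }
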
> > > > > There are however some use-cases that may require some different 
> > > > > handling. When an NFS-client starts to mount an export, the MOUNT 
> > > > > protocol is handled on a specific server. After getting the initial 
> > > > > file-handle for the export, any Gluster NFS-server can be used to 
> > > > > talk NFSv3 and do I/O. When the NFS-clients are kept only on the 
> > > > > NFS-server that handled the initial MOUNT request, and due to 
> > > > > fail-over (think CTDB and similar here) another NFS-server is used, 
> > > > > the persistent cache of 'connected' NFS-clients is inaccurate.
> > > > > 
> > > > > The easiest way I can think of to remedy this issue is to place the 
> > > > > persistent NFS-client cache on a GlusterFS volume. When CTDB is 
> > > > > used, the locking-file is placed on shared storage as well, so the 
> > > > > same
> > > 
> > > This is the statd data?  That's the more important thing to get right.
> > 
> > Uhm, no. The locking-file I meant is for CTDB itself (I think). From my 
> > understanding the statd/NFS-locking is done through the GlusterFS-client 
> > (the NFS-server is a client, just like a FUSE-mount). For all I know the 
> > statd/NFS-locking is working as it should.
> 
> Oh, OK.  Looking at the code in xlators/nfs/server/src/nlm4.c....  Looks
> like it's probably just using the same statd as the kernel server--the
> one installed as a part of nfs-utils, which by default puts its state in
> /var/lib/nfs/statd/.
> 
> So if you want failover to work, then the contents of
> /var/lib/nfs/statd/ has to be made available to the server that takes
> over somehow.

The statd data and the implementation of the NLM protocol are not 
something I am very familiar with. But Rajesh (on CC) explained a little 
about it and informed me that the current NLM implementation indeed does 
not support transparent fail-over yet.

> Anyway, agreed that putting that (and the nfs client list) on some
> shared storage makes the most sense.

This is surely one of the changes that need to be made before GlusterFS 
has full support for High-Availability NFS.

> > > > > volume can be used for the NFS-client cache. Providing an option 
> > > > > to set the volume/path of the NFS-client cache would be needed for 
> > > > > this.  I guess that this could result in a chicken-and-egg 
> > > > > situation (NFS-server is started, but no volume mounted yet)?
> > > 
> > > I don't think there should be any problem here: the exported filesystems
> > > need to be available before the server starts anyway.  (Otherwise the
> > > only response the server could give to operations on filehandles would
> > > be ESTALE.)
> > 
> > Well, the NFS-server dynamically gets exports (GlusterFS volumes) added 
> > when these are started or newly created. There is no hard requirement 
> > that a specific volume is available for the NFS-server to place a shared 
> > file with a list of NFS-clients.
> 
> I'm not sure what you mean by "there is no hard requirement ...".
> 
> Surely it's a requirement that an NFS server have available at startup,
> at a minimum:
> 
>       - all exported volumes
>       - whichever volume contains /var/lib/nfs/statd/, if that's on
>         glusterfs.
> 
> otherwise reboot recovery won't work.  (And failover definitely won't
> work.)

Well, with the current state of things, the GlusterFS NFS-server (gNFS) 
does not enforce that there are any volumes available to export. These 
can be added dynamically (similar to calling exportfs for Linux nfsd).  
When an NFS-client tries to mount an export immediately after gNFS has 
been started, the MOUNT will return ENOENT :-/
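
To illustrate that window: right after start-up the export list is simply 
empty, so any MOUNT lookup can only fail. Roughly (names and types below 
are a hypothetical sketch, not the actual mnt3.c code; only the MNT3_OK 
and MNT3ERR_NOENT values come from RFC 1813):

  /* The export list starts out empty and only gets filled once glusterd
   * has told gNFS about the started volumes, hence the ENOENT window. */
  #include <stddef.h>
  #include <string.h>

  #define MNT3_OK       0
  #define MNT3ERR_NOENT 2   /* maps to ENOENT, RFC 1813 */

  struct mnt_export {
          const char        *volname;
          struct mnt_export *next;
  };

  static int mnt3_lookup_export (struct mnt_export *exports,
                                 const char *volname)
  {
          for (; exports; exports = exports->next)
                  if (strcmp (exports->volname, volname) == 0)
                          return MNT3_OK;
          return MNT3ERR_NOENT;
  }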

There is also no real option to have /var/lib/nfs/statd/ on shared 
(GlusterFS) storage yet. Even if /var/lib/nfs/statd/ were a mount-point, 
it would not be mounted yet when gNFS is started. Usually 
a network-filesystem service (like netfs) does the mounting after the 
GlusterFS daemons are running and providing the volumes.
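
One way to cope with that would be to check whether the statd directory 
really is a mount-point before enabling NLM, and only start handing out 
locks afterwards. A sketch of such a check (purely hypothetical, nothing 
like this exists in gNFS today):

  /* Returns 1 once /var/lib/nfs/statd is an actual mount-point, i.e. its
   * st_dev differs from that of the parent directory. */
  #include <sys/stat.h>

  static int statd_dir_is_mounted (void)
  {
          struct stat dir, parent;

          if (stat ("/var/lib/nfs/statd", &dir) != 0 ||
              stat ("/var/lib/nfs", &parent) != 0)
                  return 0;

          return dir.st_dev != parent.st_dev;
  }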

So, all in all, full high-availability with correct locking is something 
that can be done in the future, but is not available today.

Cheers,
Niels

> 
> --b.
> 
> > Probably easily solved by making the 
> > path to the file configurable and only accessing it when needed (and not 
> > at startup of the NFS-server).

-- 
Niels de Vos
Sr. Software Maintenance Engineer
Support Engineering Group
Red Hat Global Support Services


