Re: [Gluster-devel] Parallel readdir from NFS clients causes incorrect d

Avati

On Wed, Apr 3, 2013 at 4:35 PM, Anand Avati <address@hidden> wrote:

Hmm, I was be tempted to suggest that you were bitten by the gluster/ext4 readdir's d_off incompatibility issue (which got recently fixed http://review.gluster.org/4711/). But you say it works fine when you do ls one at a time sequentially.

I just realized after reading your email that, in glusterfs, because we use the same anonymous fd for multiple client/application's readdir query, we have a race in the posix translator where two threads attempt to push/pull the same backend cursor in a chaotic way resulting in duplicate/lost entries. This might be the issue you are seeing, just guessing.

Will you be willing to try out a source cod patch on top of the git HEAD to rebuild your glusterfs and verify if it fixes the issue? Will really appreciate it!

Thanks,
Avati
On Wed, Apr 3, 2013 at 2:37 PM, Michael Brown <address@hidden> wrote:
I'm seeing a problem on my fairly fresh RHEL gluster install. Smells to me like a parallelism problem on the server.

If I mount a gluster volume via NFS (using glusterd's internal NFS server, nfs-kernel-server) and read a directory from multiple clients *in parallel*, I get inconsistent results across servers. Some files are missing from the directory listing, some may be present twice!

Exactly which files (or directories!) are missing/duplicated varies each time. But I can very consistently reproduce the behaviour.

You can see a screenshot here: http://imgur.com/JU8AFrt

The replication steps are:
* clusterssh to each NFS client
* unmount /gv0 (to clear cache)
* mount /gv0 [1]
* ls -al /gv0/common/apache-jmeter-2.9/bin (which is where I first noticed this)

Here's the rub: if, instead of doing the 'ls' in parallel, I do it in series, it works just fine (consistent correct results everywhere). But hitting the gluster server from multiple clients at the same time causes problems.

I can still stat() and open() the files missing from the directory listing, they just don't show up in an enumeration.

Mounting gv0 as a gluster client filesystem works just fine.

Details of my setup:
2 × gluster servers: 2×E5-2670, 128GB RAM, RHEL 6.4 64-bit, glusterfs-server-3.3.1-1.el6.x86_64 (from EPEL)
4 × NFS clients: 2×E5-2660, 128GB RAM, RHEL 5.7 64-bit, glusterfs-3.3.1-11.el5 (from kkeithley's repo, only used for testing)
gv0 volume information is below
bricks are 400GB SSDs with ext4[2]
common network is 10GbE, replication between servers happens over direct 10GbE link.

I will be testing on xfs/btrfs/zfs eventually, but for now I'm on ext4.

Also attached is my chatlog from asking about this in #gluster

[1]: fstab line is: fearless1:/gv0 /gv0 nfs defaults,sync,tcp,wsize=8192,rsize=8192 0 0
[2]: yes, I've turned off dir_index to avoid That Bug. I've run the d_off test, results are here: http://pastebin.com/zQt5gZnZ

----
gluster> volume info gv0 Volume Name: gv0Type: Distributed-ReplicateVolume ID: 20117b48-7f88-4f16-9490-a0349afacf71Status: StartedNumber of Bricks: 8 x 2 = 16Transport-type: tcpBricks:Brick1: fearless1:/export/bricks/500117310007a6d8/glusterdataBrick2: fearless2:/export/bricks/500117310007a674/glusterdataBrick3: fearless1:/export/bricks/500117310007a714/glusterdataBrick4: fearless2:/export/bricks/500117310007a684/glusterdataBrick5: fearless1:/export/bricks/500117310007a7dc/glusterdataBrick6: fearless2:/export/bricks/500117310007a694/glusterdataBrick7: fearless1:/export/bricks/500117310007a7e4/glusterdataBrick8: fearless2:/export/bricks/500117310007a720/glusterdataBrick9: fearless1:/export/bricks/500117310007a7ec/glusterdataBrick10: fearless2:/export/bricks/500117310007a74c/glusterdataBrick11: fearless1:/export/bricks/500117310007a838/glusterdataBrick12: fearless2:/export/bricks/500117310007a814/glusterdataBrick13: fearless1:/export/bricks/500117310007a850/glusterdataBrick14: fearless2:/export/bricks/500117310007a84c/glusterdataBrick15: fearless1:/export/bricks/500117310007a858/glusterdataBrick16: fearless2:/export/bricks/500117310007a8f8/glusterdataOptions Reconfigured:diagnostics.count-fop-hits: ondiagnostics.latency-measurement: onnfs.disable: off----

-- Michael Brown | `One of the main causes of the fall of Systems Consultant | the Roman Empire was that, lacking zero, Net Direct Inc. | they had no way to indicate successful ☎: +1 519 883 1172 x5106 | termination of their C programs.' - Firth
_______________________________________________
Gluster-devel mailing list
address@hidden
https://lists.nongnu.org/mailman/listinfo/gluster-devel

From:	Michael Brown
Subject:	Re: [Gluster-devel] Parallel readdir from NFS clients causes incorrect data
Date:	Thu, 04 Apr 2013 12:31:49 -0400
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130308 Thunderbird/17.0.4