14:56 < Supermathie> I've stumbled across an odd problem with GlusterFS. My setup: 2 x RHEL server with a bunch of SSDs (each is a brick) replicating data between them. Single volume (gv0) exported via gluster's internal NFS.
14:57 < Supermathie> It's mounted on 4 identical servers, into which I regularly clusterssh. If I mount gv0 and do an ls -al inside a directory from all 4 clients at the same time, I get inconsistent results.
14:58 < Supermathie> Some clients see all the files, some are missing a file or two, some have duplicates (!) in the listing.
14:58 < Supermathie> stat() finds the missing files, but ls still doesn't see them.
14:58 < Supermathie> If I remount and list the directory in series instead of in parallel, everything looks good.
15:03 < samppah> Supermathie: what glusterfs version you are using and are you using locking with nfs or do you mount with nolock?
15:06 < Supermathie> glusterfs-3.3.1-1.el6.x86_64 from EPEL, mount options are: rw,sync,tcp,wsize=8192,rsize=8192 (I added sync after I noticed the weird behaviour)
15:08 < samppah> ok.. all nodes are mounting remote directory and not using localhost to mount?
15:08 < samppah> @latest
15:08 <@glusterbot> samppah: The latest version is available at http://goo.gl/zO0Fa . There is a .repo file for yum or see @ppa for ubuntu.
15:08 < Supermathie> lockd is running on clients, I presume gluster has an internal lockd
15:09 < samppah> hmm
15:09 < samppah> @yum repo
15:09 <@glusterbot> samppah: kkeithley's fedorapeople.org yum repository has 32- and 64-bit glusterfs 3.3 packages for RHEL/Fedora/Centos distributions: http://goo.gl/EyoCw
15:09 < Supermathie> samppah: you mean mounting local glusterfs as a localhost nfs client? None of the 4 NFS clients in this setup are participating in the gluster (in the gluster cluster? :) )
15:10 < samppah> Supermathie: yes and ok, that's good :)
15:11 < samppah> i'm not very familiar with gluster nfs solution nor issues it may have
15:11 < samppah> however there are newer packages available at ,,(yum repo)
15:11 <@glusterbot> kkeithley's fedorapeople.org yum repository has 32- and 64-bit glusterfs 3.3 packages for RHEL/Fedora/Centos distributions: http://goo.gl/EyoCw
15:12 < samppah> brb
15:13 < Supermathie> samppah: Can reproduce it at-will: http://imgur.com/JU8AFrt
15:13 <@glusterbot> Title: Odd GlusterFS problem - Imgur (at imgur.com)
15:16 < Chiku|dc> Supermathie, when you do ls -al | md5sum, you do md5sum on the ls text output?
15:16 < Supermathie> Yeah
15:16 < Supermathie> Just gives me a quick way of noting which hosts are different.
15:16 < Supermathie> For instance, saveservice.properties is missing from fleming1
15:16 -!- zykure is now known as zyk|off
15:16 < Supermathie> address@hidden bin]$ ls -al saveservice.properties
15:16 < Supermathie> -rw-r--r-- 1 michael users 22186 Jan 24 06:21 saveservice.properties
15:16 < Supermathie> address@hidden bin]$ ls -al | grep saveservice.properties
15:16 < Supermathie> (no result)
15:17 < Supermathie> and mirror-server.sh missing from directory listing on fleming4
15:17 < Supermathie> (and httpclient.parameters)
15:18 -!- zyk|off is now known as zykure
15:20 < Chiku|dc> Supermathie, 4 servers with replica 4 ?
15:20 < Chiku|dc> or replica 2 ?
15:20 < Supermathie> These four servers are NFS clients only. NFS server is two servers with replica 2
15:21 < Chiku|dc> oh ok
15:21 < Supermathie> If I unmount (clear cache) and remount, and ls -al again, I get different results (different files missing on different servers).
15:21 < Supermathie> If I ls -al one at a time on each client, everything's OK.
15:21 < Chiku|dc> what about gluster client ?
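[Editor's note: the "ls -al | md5sum" check above can be sketched as a small script. This is a hedged reconstruction, not from the log: the `list_hash` helper name is invented, and the hostnames (fleming1–4), server name, and mount point in the commented repro are assumptions standing in for the clusterssh session described.]

```shell
#!/bin/sh
# Hash a directory listing so hosts with differing views are easy to
# spot at a glance -- the same "ls -al | md5sum" trick used in the log.
list_hash() {
    ls -al "$1" | md5sum | cut -d' ' -f1
}

# Hypothetical parallel repro (commented out; names are assumptions).
# Each client mounts gv0 with the options quoted at 15:06, then all
# four run the listing at the same moment, as clusterssh would:
#
# mount -t nfs -o rw,sync,tcp,wsize=8192,rsize=8192 gserver1:/gv0 /mnt/gv0
# for h in fleming1 fleming2 fleming3 fleming4; do
#     ssh "$h" 'ls -al /mnt/gv0/bin | md5sum' &
# done
# wait
```

On a healthy mount every host prints the same digest; in the bug described above, the digests (and the listings behind them) diverge.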
15:22 < Chiku|dc> mount glusterfs client
15:27 < Supermathie> Chiku|dc: clients are RHEL5… is glusterfs-client.x86_64 0:2.0.9-2.el5 going to be happy with a 3.3.1 server? probably not… :)
15:27 < JoeJulian> Not quite... :D
15:28 < JoeJulian> Supermathie: Lol!!!!!
15:28 < JoeJulian> Supermathie: no.
15:28 < Supermathie> grabbing the 3.3.1 from kkeithle's repo :)
15:33 < Supermathie> Chiku|dc: mounting as glusterfs client yields consistent correct results
15:34 < JoeJulian> What filesystem are your bricks?
15:34 < Supermathie> ext4
15:35 < JoeJulian> bingo
15:35 < Supermathie> Oh wait, right, *that* problem… ext4 with dir_index turned off
15:35 < JoeJulian> I suspect the same "cookie" problem that's been the focus around the ,,(ext4) problem is what you're seeing with nfs.
15:35 <@glusterbot> Read about the ext4 problem at http://goo.gl/PEBQU
15:36 < JoeJulian> Something about the cookie being inconsistent between calls.
15:36 < stickyboy> I even knew about the ext4 problem, and I was still bit by it deploying GlusterFS last month. :D
15:37 < Supermathie> I encountered the dir_index problem right off the bat (NOTHING worked) and it was fine after turning off dir_index on each brick filesystem
15:37 < JoeJulian> And you thought I was going to just blindly point fingers... Granted, I don't fully understand the problem, but turning off the dir_index was a workaround to prevent the endless loop. I don't /think/ it solves the inconsistent cookie thing.
15:38 < Supermathie> First I heard about an inconsistent cookie… reading
15:39 < JoeJulian> Check the gluster-devel mailing list. Look for the threads with Bernd and Theodore.
15:40 < jdarcy> ... and me, and Avati, and Zach, and Eric, and ... ;)
15:41 < jdarcy> Long thread.
15:41 < Supermathie> Running the d_off test against the brick dir returns:
15:42 < JoeJulian> jdarcy: Do I have the essence of the problem correct, though? Inconsistent directory listings via nfs mount from an ext4 filesystem?
15:43 < Supermathie> http://pastebin.com/zQt5gZnZ
15:43 < Supermathie> (all 32-bit values)
15:45 < Supermathie> JoeJulian: The odd thing about it is that it *is* consistent… the bug tickle seems to be doing it from 4 clients in parallel. Makes me think something about the request processing at the server is getting confused.
15:46 < JoeJulian> That's why I leaned toward that being the problem.
15:46 < JoeJulian> I could be wrong though.
15:46 < JoeJulian> Try a similar test with xfs and see if it's close.
15:47 < Supermathie> https://bugzilla.redhat.com/show_bug.cgi?id=838784#c14
15:47 <@glusterbot> (at bugzilla.redhat.com)
15:47 <@glusterbot> Bug 838784: high, high, ---, sgowda, POST , DHT: readdirp goes into a infinite loop with ext4
15:48 < Supermathie> I will - I'm in the midst of testing out Oracle doing DNFS to Gluster. Going to be trying out a few different configs and brick filesystems.
15:49 < jdarcy> Inconsistent directory listings seems like it might be the hash-collision problem that the ext4 "fix" was trying to address.
15:57 < Supermathie> Whoah… this time, . and .. are missing from one server, and another has them listed twice. And the latter also has 'examples' twice.
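[Editor's note: the dir_index workaround mentioned at 15:35–15:37 is applied per brick filesystem. A minimal sketch, assuming the brick lives on /dev/sdb1 mounted at /bricks/ssd1 (both names are placeholders) and can be unmounted; this is an admin fragment reconstructed from the log, not the thread's exact commands.]

```shell
# For each brick, unmount it, drop the hashed (b-tree) directory index
# feature, let e2fsck rebuild the directories, then remount.
# /dev/sdb1 and /bricks/ssd1 are placeholder names.
umount /bricks/ssd1
tune2fs -O ^dir_index /dev/sdb1   # disable hashed directory indexes
e2fsck -fD /dev/sdb1              # -D re-optimizes directories after the change
mount /bricks/ssd1
```

With dir_index off, ext4 returns compact linear readdir offsets rather than large hash-derived cookies, which is why it was used as a workaround for the readdirp loop; as JoeJulian notes above, it does not necessarily resolve the inconsistent-cookie behaviour seen over NFS.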