14:56 < Supermathie> I've stumbled across an odd problem with GlusterFS. My setup: 2 x RHEL server with a bunch of SSDs (each is a brick) replicating data between them. Single volume (gv0) exported via gluster's internal NFS.
14:57 < Supermathie> It's mounted on 4 identical servers, into which I regularly clusterssh. If I mount gv0 and do an ls -al inside a directory from all 4 clients at the same time, I get inconsistent results.
14:58 < Supermathie> Some clients see all the files, some are missing a file or two, some have duplicates (!) in the listing.
14:58 < Supermathie> stat() finds the missing files, but ls still doesn't see them.
14:58 < Supermathie> If I remount and list the directory in series instead of in parallel, everything looks good.
15:03 < samppah> Supermathie: what glusterfs version you are using and are you using locking with nfs or do you mount with nolock?
15:06 < Supermathie> glusterfs-3.3.1-1.el6.x86_64 from EPEL, mount options are: rw,sync,tcp,wsize=8192,rsize=8192 (I added sync after I noticed the weird behaviour)
15:08 < samppah> ok.. all nodes are mounting remote directory and not using localhost to mount?
15:08 < samppah> @latest
15:08 <@glusterbot> samppah: The latest version is available at http://goo.gl/zO0Fa . There is a .repo file for yum or see @ppa for ubuntu.
15:08 < Supermathie> lockd is running on clients, I presume gluster has an internal lockd
15:09 < samppah> hmm
15:09 < samppah> @yum repo
15:09 <@glusterbot> samppah: kkeithley's fedorapeople.org yum repository has 32- and 64-bit glusterfs 3.3 packages for RHEL/Fedora/Centos distributions: http://goo.gl/EyoCw
15:09 < Supermathie> samppah: you mean mounting local glusterfs as a localhost nfs client? None of the 4 NFS clients in this setup are participating in the gluster (in the gluster cluster? :) )
15:10 < samppah> Supermathie: yes and ok, that's good :)
15:11 < samppah> i'm not very familiar with gluster nfs solution nor issues it may have
15:11 < samppah> however there are newer packages available at ,,(yum repo)
15:11 <@glusterbot> kkeithley's fedorapeople.org yum repository has 32- and 64-bit glusterfs 3.3 packages for RHEL/Fedora/Centos distributions: http://goo.gl/EyoCw
15:12 < samppah> brb
15:13 < Supermathie> samppah: Can reproduce it at-will: http://imgur.com/JU8AFrt
15:13 <@glusterbot> Title: Odd GlusterFS problem - Imgur (at imgur.com)
15:16 < Chiku|dc> Supermathie, when you do ls -al | md5sum, you do md5sum on the ls text output?
15:16 < Supermathie> Yeah
15:16 < Supermathie> Just gives me a quick way of noting which hosts are different.
15:16 < Supermathie> For instance, saveservice.properties is missing from fleming1
15:16 -!- zykure is now known as zyk|off
15:16 < Supermathie> address@hidden bin]$ ls -al saveservice.properties
15:16 < Supermathie> -rw-r--r-- 1 michael users 22186 Jan 24 06:21 saveservice.properties
15:16 < Supermathie> address@hidden bin]$ ls -al | grep saveservice.properties
15:16 < Supermathie> (no result)
15:17 < Supermathie> and mirror-server.sh missing from directory listing on fleming4
15:17 < Supermathie> (and httpclient.parameters)
15:18 -!- zyk|off is now known as zykure
15:20 < Chiku|dc> Supermathie, 4 servers with replica 4 ?
15:20 < Chiku|dc> or replica 2 ?
15:20 < Supermathie> These four servers are NFS clients only. NFS server is two servers with replica 2
15:21 < Chiku|dc> oh ok
15:21 < Supermathie> If I unmount (clear cache) and remount, and ls -al again, I get different results (different files missing on different servers).
15:21 < Supermathie> If I ls -al one at a time on each client, everything's OK.
15:21 < Chiku|dc> what about gluster client ?
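[Editor's note: the "ls -al | md5sum" check above can be sketched as a small script. This is a hedged reconstruction, not from the log: the `list_hash` helper name is invented, and the hostnames (fleming1–4), server name, and mount point in the commented repro are assumptions standing in for the clusterssh session described.]

```shell
#!/bin/sh
# Hash a directory listing so hosts with differing views are easy to
# spot at a glance -- the same "ls -al | md5sum" trick used in the log.
list_hash() {
    ls -al "$1" | md5sum | cut -d' ' -f1
}

# Hypothetical parallel repro (commented out; names are assumptions).
# Each client mounts gv0 with the options quoted at 15:06, then all
# four run the listing at the same moment, as clusterssh would:
#
# mount -t nfs -o rw,sync,tcp,wsize=8192,rsize=8192 gserver1:/gv0 /mnt/gv0
# for h in fleming1 fleming2 fleming3 fleming4; do
#     ssh "$h" 'ls -al /mnt/gv0/bin | md5sum' &
# done
# wait
```

On a healthy mount every host prints the same digest; in the bug described above, the digests (and the listings behind them) diverge.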
15:22 < Chiku|dc> mount glusterfs client
15:27 < Supermathie> Chiku|dc: clients are RHEL5… is glusterfs-client.x86_64 0:2.0.9-2.el5 going to be happy with a 3.3.1 server? probably not… :)
15:27 < JoeJulian> Not quite... :D
15:28 < JoeJulian> Supermathie: Lol!!!!!
15:28 < JoeJulian> Supermathie: no.
15:28 < Supermathie> grabbing the 3.3.1 from kkeithle's repo :)
15:33 < Supermathie> Chiku|dc: mounting as glusterfs client yields consistent correct results
15:34 < JoeJulian> What filesystem are your bricks?
15:34 < Supermathie> ext4
15:35 < JoeJulian> bingo
15:35 < Supermathie> Oh wait, right, *that* problem… ext4 with dir_index turned off
15:35 < JoeJulian> I suspect the same "cookie" problem that's been the focus around the ,,(ext4) problem is what you're seeing with nfs.
15:35 <@glusterbot> Read about the ext4 problem at http://goo.gl/PEBQU
15:36 < JoeJulian> Something about the cookie being inconsistent between calls.
15:36 < stickyboy> I even knew about the ext4 problem, and I was still bit by it deploying GlusterFS last month. :D
15:37 < Supermathie> I encountered the dir_index problem right off the bat (NOTHING worked) and it was fine after turning off dir_index on each brick filesystem
15:37 < JoeJulian> And you thought I was going to just blindly point fingers... Granted, I don't fully understand the problem, but turning off the dir_index was a workaround to prevent the endless loop. I don't /think/ it solves the inconsistent cookie thing.
15:38 < Supermathie> First I heard about an inconsistent cookie… reading
15:39 < JoeJulian> Check the gluster-devel mailing list. Look for the threads with Bernd and Theodore.
15:40 < jdarcy> ... and me, and Avati, and Zach, and Eric, and ... ;)
15:41 < jdarcy> Long thread.
15:41 < Supermathie> Running the d_off test against the brick dir returns:
15:42 < JoeJulian> jdarcy: Do I have the essence of the problem correct, though? Inconsistent directory listings via nfs mount from an ext4 filesystem?
15:43 < Supermathie> http://pastebin.com/zQt5gZnZ
15:43 < Supermathie> (all 32-bit values)
15:45 < Supermathie> JoeJulian: The odd thing about it is that it *is* consistent… the bug tickle seems to be doing it from 4 clients in parallel. Makes me think something about the request processing at the server is getting confused.
15:46 < JoeJulian> That's why I leaned toward that being the problem.
15:46 < JoeJulian> I could be wrong though.
15:46 < JoeJulian> Try a similar test with xfs and see if it's close.
15:47 < Supermathie> https://bugzilla.redhat.com/show_bug.cgi?id=838784#c14
15:47 <@glusterbot> (at bugzilla.redhat.com)
15:47 <@glusterbot> Bug 838784: high, high, ---, sgowda, POST , DHT: readdirp goes into a infinite loop with ext4
15:48 < Supermathie> I will - I'm in the midst of testing out Oracle doing DNFS to Gluster. Going to be trying out a few different configs and brick filesystems.
15:49 < jdarcy> Inconsistent directory listings seems like it might be the hash-collision problem that the ext4 "fix" was trying to address.
15:57 < Supermathie> Whoah… this time, . and .. are missing from one server, and another has them listed twice. And the latter also has 'examples' twice.
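[Editor's note: the dir_index workaround mentioned at 15:35–15:37 is applied per brick filesystem. A minimal sketch, assuming the brick lives on /dev/sdb1 mounted at /bricks/ssd1 (both names are placeholders) and can be unmounted; this is an admin fragment reconstructed from the log, not the thread's exact commands.]

```shell
# For each brick, unmount it, drop the hashed (b-tree) directory index
# feature, let e2fsck rebuild the directories, then remount.
# /dev/sdb1 and /bricks/ssd1 are placeholder names.
umount /bricks/ssd1
tune2fs -O ^dir_index /dev/sdb1   # disable hashed directory indexes
e2fsck -fD /dev/sdb1              # -D re-optimizes directories after the change
mount /bricks/ssd1
```

With dir_index off, ext4 returns compact linear readdir offsets rather than large hash-derived cookies, which is why it was used as a workaround for the readdirp loop; as JoeJulian notes above, it does not necessarily resolve the inconsistent-cookie behaviour seen over NFS.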