Re: [Gluster-devel] Question about afr/self-heal


From: Kevan Benson
Subject: Re: [Gluster-devel] Question about afr/self-heal
Date: Tue, 09 Dec 2008 13:00:17 -0800
User-agent: Thunderbird 2.0.0.17 (X11/20081001)

Brian Hirt wrote:
Hello,

I'm running some tests with GlusterFS and so far I like what I'm seeing. I've got a test four-node system set up with AFR-Unify: node1 and node3 are replicated, node2 and node4 are replicated, and these pairs are then unified together into a single brick. AFR and unify are done on the client side. All of the servers are running Ubuntu + GlusterFS 1.3.12 with an underlying ext3 filesystem.

During one of my tests I took down a server during an update/addition of a few thousand files. After the update was complete, I brought the downed node back up. I was able to see all the new files after I did a directory listing on the client, but they all had a size of 0, and the updated files still had their old contents. When I opened these files on the client, the correct contents were returned and the previously down node was then corrected for that file.

From searching through the email archives, this seems to be the intended way it's supposed to work. However, in the state the filesystem is in now, my redundancy is lost for those changes until I open every file and directory on the client. In my configuration I intend to have many millions of files. Am I supposed to open every single one of them after a node goes down to get the replication back in sync? There will often be times when servers are brought down for routine maintenance for 10-15 minutes at a time, and during that time only a few hundred files might change. What is the proper procedure for resynchronizing? How are other people handling this? I've seen a few comments about fsck in the mail archive referencing a path that doesn't exist in my GlusterFS distribution (possibly it's from the 1.4 branch).

My understanding is that the accepted way to force synchronization between servers is to open and read a small amount of every file, or of just the files you want to ensure are synced (a single byte will do; healing is triggered on an actual open system call). Something like the following:

find /mnt/glusterfs -type f -exec head -c 1 {} \; >/dev/null
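
If only a few specific files are known to be stale, the same open-and-read trick works on individual paths; the path below is just an illustration:

head -c 1 /mnt/glusterfs/path/to/stale-file >/dev/null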

If one system has been down for a known amount of time, you can use that to your advantage:

find /mnt/glusterfs -type f -mtime -1 -exec head -c 1 {} \; >/dev/null

will only read from files whose modification time is within the last 24 hours.
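
If you know roughly when the node went down, you can also use a timestamp file as the cutoff instead of whole days; a rough sketch, with the outage time below purely illustrative:

touch -t 200812090930 /tmp/outage-start
find /mnt/glusterfs -type f -newer /tmp/outage-start -exec head -c 1 {} \; >/dev/null

find's -newer test matches files modified more recently than the reference file, so only things written during or after the outage get read.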

You could probably speed up the find by passing multiple file names to head at a time using xargs, to reduce program (head) initialization overhead:

find /mnt/glusterfs -type f -mtime -1 | xargs -n100 head -c1 >/dev/null

which will run 100 files through head at a time.
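
If any of the paths can contain spaces or other characters the shell will mangle, a NUL-separated variant is safer (this assumes GNU find and xargs, which support -print0 and -0):

find /mnt/glusterfs -type f -mtime -1 -print0 | xargs -0 -n100 head -c1 >/dev/null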

Looks like this and more are covered in http://www.gluster.org/docs/index.php/Understanding_AFR_Translator, which is probably more complete than anything I've mentioned.

Also the log file is very verbose about the downed server. There are lots of messages like:

2008-12-09 11:18:08 E [tcp-client.c:190:tcp_connect] brick2: non-blocking connect() returned: 111 (Connection refused)
2008-12-09 11:18:08 W [client-protocol.c:332:client_protocol_xfer] brick2: not connected at the moment to submit frame type(1) op(34)
2008-12-09 11:18:08 E [client-protocol.c:4430:client_lookup_cbk] brick2: no proper reply from server, returning ENOTCONN
2008-12-09 11:18:08 E [tcp-client.c:190:tcp_connect] brick2: non-blocking connect() returned: 111 (Connection refused)
2008-12-09 11:18:08 W [client-protocol.c:332:client_protocol_xfer] brick2: not connected at the moment to submit frame type(1) op(9)
2008-12-09 11:18:08 E [client-protocol.c:2787:client_chmod_cbk] brick2: no proper reply from server, returning ENOTCONN

In some of my tests I'm seeing several hundred of these logged per second. Is there some way to make this a bit less verbose?

I'm sorry if these are FAQs, but I've so far been unable to find anything on the wiki or mailing lists.

Thanks in advance for your help and for this great project.

IMO, better whiny until it's taken care of than quiet and missed...
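
That said, if the noise really does get in the way, the client's log level can be raised when mounting; a hedged sketch, assuming your 1.3.12 glusterfs binary accepts a --log-level option and a client volfile at the path shown (check glusterfs --help for your build):

glusterfs -f /etc/glusterfs/client.vol --log-level=CRITICAL /mnt/glusterfs

Note that the repeated messages above are logged at E and W severity, so you would have to go all the way up to CRITICAL to hide them, which also hides genuine errors; that is more or less why I'd leave it verbose.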


--

-Kevan Benson
-A-1 Networks



