[Gluster-devel] Self Heal/Recovery Problem


From: Kamil Srot
Subject: [Gluster-devel] Self Heal/Recovery Problem
Date: Mon, 15 Oct 2007 18:36:34 +0200
User-agent: Thunderbird 2.0.0.6 (Windows/20070728)

Dear Gluster developers, fans,

First of all, I want to say a big THANK YOU for the work you do. From what I saw and tried out (OCFS2, GFS2, CODA, NFS), your system is the first one I like, and in some way I understand the logic behind it. The others seem too complex, hard to understand, and hard to reconfigure in case something goes wrong. I worked for about two months with my test setup of OCFS2 (which is the simplest of the "other" FS clustering solutions) and don't have as good a feeling about it as I do after just a few days with GlusterFS...

Well, it wouldn't be a good post to the devel group w/o questions - so I'm composing, in another window, a few questions regarding performance/tuning of my setup, but recently I ran into an issue.

I have a quite simple setup: two servers mirroring data with afr *:2, unify, and io-threads... The setup worked fine through several days of stress testing, but then I found an article recommending a certain format parameter for the underlying XFS filesystem... So I stopped glusterfs and glusterfsd on one of the servers and reformatted the device... I recreated the exported directories and started glusterfsd & glusterfs again... Then I tried to kick-start the self heal to remirror the testing data with find -mountpoint -type f (sketched below)... oops, glusterfsd segfaults after a few seconds - in the log, I have:

The glusterfs version is: mainline--2.5--patch-518

---------
got signal (11), printing backtrace
---------
[0xb7f7f420]
/cluster/lib/glusterfs/1.3.5/xlator/protocol/client.so[0xb7604432]
/cluster/lib/glusterfs/1.3.5/xlator/protocol/client.so[0xb7606a4b]
/cluster/lib/glusterfs/1.3.5/xlator/protocol/client.so(notify+0xe5)[0xb7607666]
/cluster/lib/libglusterfs.so.0(transport_notify+0x62)[0xb7f70a92]
/cluster/lib/libglusterfs.so.0[0xb7f712fc]
/cluster/lib/libglusterfs.so.0(sys_epoll_iteration+0x16b)[0xb7f71642]
/cluster/lib/libglusterfs.so.0(poll_iteration+0x3b)[0xb7f70dce]
[glusterfsd](main+0x4e3)[0x804991d]
/lib/tls/libc.so.6(__libc_start_main+0xc8)[0xb7e27ea8]
[glusterfsd][0x8048e51]
---------
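
(The self-heal walk I ran was along these lines - a sketch, with /mnt/mailspool as a hypothetical mount point; in 1.3, AFR heals a file when it is looked up/opened through the mount, so reading one byte per file is enough to trigger it:)

# walk every regular file on the GlusterFS mount, without crossing
# into other filesystems; head -c1 opens each file, which triggers
# AFR's self-heal onto the freshly formatted server
find /mnt/mailspool -mount -type f -exec head -c1 {} \; > /dev/null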

And a core file in the root directory... the backtrace is:
#0  0xb75574f8 in afr_sync_ownership_permission ()
  from /cluster/lib/glusterfs/1.3.5/xlator/cluster/afr.so
#1  0xb7576432 in client_closedir_cbk ()
  from /cluster/lib/glusterfs/1.3.5/xlator/protocol/client.so
#2  0xb7578a4b in client_protocol_interpret ()
  from /cluster/lib/glusterfs/1.3.5/xlator/protocol/client.so
#3  0xb7579666 in notify ()
  from /cluster/lib/glusterfs/1.3.5/xlator/protocol/client.so
#4  0xb7edfa92 in transport_notify (this=0x8053848, event=1) at transport.c:154
#5  0xb7ee02fc in epoll_notify (eevent=1, data=0x8053848) at epoll.c:53
#6  0xb7ee0642 in sys_epoll_iteration (ctx=0xbfb026d4) at epoll.c:155
#7  0xb7edfdce in poll_iteration (ctx=0xbfb026d4) at transport.c:300
#8  0x0804991d in main ()
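
(For completeness, the backtrace above came from loading the core into gdb, roughly like this - the binary path is an example:)

# open the core dump against the server binary and print the stack
gdb /usr/sbin/glusterfsd core
(gdb) bt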

Judging by afr_sync_ownership_permission in frame #0, it seems to be some problem with permissions?

Any hints/help is greatly appreciated!

*glusterfs-server.vol*
volume mailspool-ds
   type storage/posix
   option directory /data/mailspool-ds
end-volume

volume mailspool-ns
   type storage/posix
   option directory /data/mailspool-ns
end-volume

volume mailspool-san1-ds
   type protocol/client
   option transport-type tcp/client
   option remote-host 10.0.0.110
   option remote-subvolume mailspool-ds
end-volume

volume mailspool-san1-ns
   type protocol/client
   option transport-type tcp/client
   option remote-host 10.0.0.110
   option remote-subvolume mailspool-ns
end-volume

volume mailspool-ns-afr
   type cluster/afr
   subvolumes mailspool-ns mailspool-san1-ns
   option replicate *:2
end-volume

volume mailspool-ds-afr
   type cluster/afr
   subvolumes mailspool-ds mailspool-san1-ds
   option replicate *:2
end-volume

volume mailspool-unify
   type cluster/unify
   subvolumes mailspool-ds-afr
   option namespace mailspool-ns-afr
   option scheduler random
end-volume

volume mailspool
   type performance/io-threads
   option thread-count 8
   option cache-size 64MB
   subvolumes mailspool-unify
end-volume

volume server
   type protocol/server
   option transport-type tcp/server
   subvolumes mailspool
   option auth.ip.mailspool-ds.allow 10.0.0.*,127.0.0.1
   option auth.ip.mailspool-ns.allow 10.0.0.*,127.0.0.1
   option auth.ip.mailspool.allow *
end-volume
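
(For reference, I start the server daemon with this spec file roughly like so - the path is an example, and I'm assuming the usual 1.3-era -f/--spec-file option:)

# start the GlusterFS server daemon with the spec file above
glusterfsd -f /etc/glusterfs/glusterfs-server.vol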

*glusterfs-client.vol*
volume client
   type protocol/client
   option transport-type tcp/client
   option remote-host 127.0.0.1
   option remote-subvolume mailspool
end-volume

volume writebehind
   type performance/write-behind
   option aggregate-size 131072 # aggregate block size in bytes
   subvolumes client
end-volume

volume readahead
   type performance/read-ahead
   option page-size 131072
   option page-count 2
   subvolumes writebehind
end-volume

volume iothreads    #iothreads can give performance a boost
   type performance/io-threads
   option thread-count 8
   option cache-size 64MB
   subvolumes readahead
end-volume
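
(The client spec is mounted through FUSE roughly like this - mount point and path are examples, again assuming the -f spec-file option:)

# mount the volume using the client spec above
glusterfs -f /etc/glusterfs/glusterfs-client.vol /mnt/mailspool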

Best Regards,
--
Kamil




