
[Gluster-devel] Killing glusterfsd on server for fun


From: Emmanuel Dreyfus
Subject: [Gluster-devel] Killing glusterfsd on server for fun
Date: Mon, 18 Jun 2012 09:45:07 +0000
User-agent: Mutt/1.5.21 (2010-09-15)

Hi 

I am still testing high availability on release-3.3. I have a two-replica
setup, and it is busy building NetBSD [1]. If I kill glusterfsd on one server,
leaving the other one alive, all ongoing operations fail and the mount is
wrecked beyond recovery:

client$ ls
ls: .: Socket is not connected
client$ cd /
client$ cd -
/bin/ksh: cd: /pfs/manu/netbsd/usr/src - Socket is not connected

I have to restart the stopped brick in order to use the mount again.
I tried reproducing the issue with something simpler than a huge build,
without success so far: the mount survives a brick being stopped, and
ongoing operations do not even fail. I would appreciate some hints on
how to reproduce the problem in a simple test case.
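
Since the failures in the client log are SETATTR calls that could not take
their inode locks, one candidate for a simple test is a loop that hammers
the mount with metadata changes while a brick is killed. The following is
only a sketch: the function name, the worker and iteration counts, and the
idea that chmod/touch traffic is enough to trigger the problem are all
assumptions, and it is not confirmed to reproduce the failure.

```shell
# Hypothetical stress helper, not part of GlusterFS.
# Runs several parallel workers that each create files and then change
# their metadata (chmod, touch), which should generate SETATTR calls.
# $1 = directory on the mount, $2 = number of workers, $3 = files per worker
setattr_stress() {
    dir=$1 workers=$2 iters=$3
    mkdir -p "$dir" || return 1
    w=0
    while [ "$w" -lt "$workers" ]; do
        (
            i=0
            while [ "$i" -lt "$iters" ]; do
                f="$dir/w$w.f$i"
                # Report any operation that fails instead of stopping.
                : > "$f" && chmod 600 "$f" && touch "$f" || echo "FAIL $f"
                i=$((i + 1))
            done
        ) &
        w=$((w + 1))
    done
    wait    # block until every worker has finished
}
```

It would be started on the client, for example as
setattr_stress /pfs/manu/stress 8 10000 (the directory name is made up),
and glusterfsd killed on one server while it runs. Any "FAIL" line would
indicate an operation that did not survive the brick going away.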

Here is the beginning of the client log at failure time:
[2012-06-18 09:14:24.421370] W [socket.c:1512:__socket_proto_state_machine] 0-pfs-client-0: reading from socket failed. Error (Socket is not connected), peer (193.54.82.99:24010)
[2012-06-18 09:14:24.443002] E [rpc-clnt.c:373:saved_frames_unwind] 0-pfs-client-0: forced unwinding frame type(GlusterFS 3.1) op(INODELK(29)) called at 2012-06-18 09:14:23.516562 (xid=0x9483261x)
[2012-06-18 09:14:24.443448] W [client3_1-fops.c:1495:client3_1_inodelk_cbk] 0-pfs-client-0: remote operation failed: Socket is not connected
[2012-06-18 09:14:24.443660] I [afr-lk-common.c:1006:afr_lock_blocking] 0-pfs-replicate-0: unable to lock on even one child
[2012-06-18 09:14:24.443735] I [afr-transaction.c:994:afr_post_blocking_inodelk_cbk] 0-pfs-replicate-0: Blocking inodelks failed.
[2012-06-18 09:14:24.443903] W [fuse-bridge.c:788:fuse_setattr_cbk] 0-glusterfs-fuse: 7435303: SETATTR() /manu/netbsd/usr/src/destdir.i386/usr/include/sys/featuretest.h => -1 (Socket is not connected)
[2012-06-18 09:14:24.445857] I [socket.c:2315:socket_submit_request] 0-pfs-client-0: not connected (priv->connected = 0)
[2012-06-18 09:14:24.446190] W [rpc-clnt.c:1498:rpc_clnt_submit] 0-pfs-client-0: failed to submit rpc-request (XID: 0x9483576x Program: GlusterFS 3.1, ProgVers: 330, Proc: 41) to rpc-transport (pfs-client-0)
[2012-06-18 09:14:24.446665] E [rpc-clnt.c:373:saved_frames_unwind] 0-pfs-client-0: forced unwinding frame type(GlusterFS 3.1) op(INODELK(29)) called at 2012-06-18 09:14:23.517840 (xid=0x9483266x)
[2012-06-18 09:14:24.447066] W [client3_1-fops.c:1495:client3_1_inodelk_cbk] 0-pfs-client-0: remote operation failed: Socket is not connected
[2012-06-18 09:14:24.447364] I [afr-lk-common.c:1006:afr_lock_blocking] 0-pfs-replicate-0: unable to lock on even one child
[2012-06-18 09:14:24.447666] I [afr-transaction.c:994:afr_post_blocking_inodelk_cbk] 0-pfs-replicate-0: Blocking inodelks failed.
[2012-06-18 09:14:24.448142] W [fuse-bridge.c:788:fuse_setattr_cbk] 0-glusterfs-fuse: 7435323: SETATTR() /manu/netbsd/usr/src/destdir.i386/usr/include/sys/featuretest.h => -1 (Socket is not connected)
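
The excerpt repeats the same chain of messages for each failed operation,
so a per-function count can make the pattern easier to see across a longer
log. Below is a rough sketch of such a summary. The function name is
invented, and it assumes the log file itself holds one entry per line in
the "[date time] LEVEL [file:line:function] ..." layout seen above.

```shell
# Hypothetical helper: count log entries per severity and source function.
# Reads a glusterfs client log on stdin, one entry per line (assumed).
summarize_log() {
    awk '
        # Match "[date time] LEVEL [file:line:function] ..."
        /^\[[0-9-]+ [0-9:.]+\] [A-Z] \[/ {
            level = $3
            src = $4
            sub(/^\[/, "", src)
            sub(/\]$/, "", src)
            n = split(src, parts, ":")
            count[level " " parts[n]]++      # parts[n] is the function name
        }
        END { for (k in count) print count[k], k }
    ' | sort -k1,1nr -k2
}
```

It would be used as summarize_log < /var/log/glusterfs/pfs.log, where the
log path is an assumption and depends on how the client was installed and
mounted. Each output line is a count followed by the severity letter and
the function that logged the message, most frequent first.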


 
[1] for anyone willing to reproduce, get and unpack
gnusrc.tgz src.tgz syssrc.tgz sharesrc.tgz from
ftp://ftp.netbsd.org/pub/NetBSD/NetBSD-5.1.2/source/sets/
Then: cd usr/src && ./build.sh -Uuo release
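
The steps in [1] could be scripted roughly as follows. This is a sketch,
not the exact commands used: the function names are invented, the download
relies on NetBSD's ftp(1) accepting a URL with -o (curl -O would do
elsewhere), and the destination directory is whatever directory on the
GlusterFS mount should hold the source tree.

```shell
# Sketch only: fetch and unpack the four NetBSD 5.1.2 source sets from [1].
BASE_URL="ftp://ftp.netbsd.org/pub/NetBSD/NetBSD-5.1.2/source/sets"
SETS="gnusrc.tgz src.tgz syssrc.tgz sharesrc.tgz"

# Print the full URL of every source set, one per line.
set_urls() {
    for s in $SETS; do
        printf '%s/%s\n' "$BASE_URL" "$s"
    done
}

# Hypothetical wrapper: download and unpack each set into the directory
# given as $1, then start the build with the flags from [1].
fetch_and_build() {
    dest=$1
    cd "$dest" || return 1
    for url in $(set_urls); do
        file=${url##*/}                      # strip everything up to the last /
        ftp -o "$file" "$url" || return 1    # assumes NetBSD ftp(1)
        tar -xzpf "$file" || return 1        # the sets unpack under usr/src
    done
    cd usr/src && ./build.sh -Uuo release
}
```

For the tree seen in the error messages above, this would be invoked as
fetch_and_build /pfs/manu/netbsd.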

-- 
Emmanuel Dreyfus
address@hidden


