gluster-devel

Re: [Gluster-devel] Spurious disconnect in 3.4.0alpha


From: krish
Subject: Re: [Gluster-devel] Spurious disconnect in 3.4.0alpha
Date: Fri, 01 Mar 2013 11:08:16 +0530
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2

On 03/01/2013 10:34 AM, Joe Julian wrote:
0-gfs33-client-2 would be the third brick in the gfs33 volume, so it should be glusterfsd rather than glusterd, and so not port 24007.

1) The client xlator first connects to glusterd on the remote-host supplied in its options.
2) It queries glusterd for the brick process's port (the brick is identified by its path).
3) It reconfigures the rpc object to connect to the brick process on the remote-host, using the port received.
    This is when the client xlator connects to glusterfsd (the brick process) on the remote-host.
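
The three steps above can be sketched roughly as follows. This is only an illustration of the order of connections, not the actual client xlator code: `connect_to_brick` and the `query_brick_port` callback are hypothetical names, the real port query is an RPC that is not implemented here, and the real client reconfigures its existing rpc object rather than opening a fresh socket.

```python
import socket

GLUSTERD_PORT = 24007  # glusterd port mentioned in this thread


def connect_to_brick(remote_host, brick_path, query_brick_port,
                     glusterd_port=GLUSTERD_PORT, timeout=5.0):
    """Illustrative two-phase connect: glusterd first, then the brick.

    query_brick_port(sock, brick_path) is a hypothetical callback standing
    in for the port query sent to glusterd; it must return an int port.
    """
    # Step 1: connect to glusterd on the remote-host.
    with socket.create_connection((remote_host, glusterd_port),
                                  timeout=timeout) as mgmt:
        # Step 2: ask for the port of the brick identified by its path.
        brick_port = query_brick_port(mgmt, brick_path)

    # Step 3: connect to the brick process (glusterfsd) on that port.
    # Assumption: a new socket is used here for simplicity.
    return socket.create_connection((remote_host, brick_port),
                                    timeout=timeout)
```

If step 1 fails (glusterd not reachable on 24007), the client never learns the brick port, which is why a log entry for a brick's client xlator can still involve glusterd.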


thanks,
krish

krish <address@hidden> wrote:
Hi Emmanuel,

On 03/01/2013 07:55 AM, Emmanuel Dreyfus wrote:
Hi

The spurious disconnect I encountered in the 3.4 branch still happens in 3.4.0alpha, but glusterfs recovers much better now. However, when running a huge tar -xzf I still hit operation failures, after which everything is restored to a normal state.

Here is the client log, in which the issue is hit at 18:06:36:
http://ftp.espci.fr/shadow/manu/client.log

The relevant part is below. I understand glusterfs is able to restore its connections and everything works fine, except when it happens on all volumes simultaneously.

[2013-02-28 18:06:36.105271] W [socket.c:1962:__socket_proto_state_machine] 0-gfs33-client-3: reading from socket failed. Error (No message available), peer (192.0.2.98:49153)
[2013-02-28 18:06:36.105340] E [rpc-clnt.c:368:saved_frames_unwind] 0-gfs33-client-3: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2013-02-28 18:06:36.104358 (xid=0x3728220x)
[2013-02-28 18:06:36.105454] W [client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-gfs33-client-3: remote operation failed: Socket is not connected. Path: /manu/netbsd/usr/src/external (6fb65713-062a-464d-a9d4-e97dab3c298b)
[2013-02-28 18:06:36.105514] E [rpc-clnt.c:368:saved_frames_unwind] 0-gfs33-client-3: forced unwinding frame type(GlusterFS 3.3) op(RELEASE(41)) called at 2013-02-28 18:06:36.104843 (xid=0x3728221x)
[2013-02-28 18:06:36.105537] I [client.c:2097:client_rpc_notify] 0-gfs33-client-3: disconnected
[2013-02-28 18:06:36.105571] E [afr-common.c:3761:afr_notify] 0-gfs33-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.
[2013-02-28 18:06:36.112037] I [afr-common.c:3882:afr_local_init] 0-gfs33-replicate-1: no subvolumes up
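
To see whether several client xlators drop at the same moment, the log excerpt above can be grouped by timestamp. The following is a minimal sketch that assumes every entry has the layout seen in the excerpt (timestamp, one-letter level, file:line:function, xlator name, message). The regular expression and the helper names `parse_line` and `disconnects_by_second` are illustrative and not part of glusterfs.

```python
import re
from collections import defaultdict

# Layout inferred from the excerpt:
# [timestamp] LEVEL [file:line:function] xlator-name: message
LOG_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+\] "
    r"(?P<level>[A-Z]) "
    r"\[(?P<src>[^:\]]+):(?P<lineno>\d+):(?P<func>[^\]]+)\] "
    r"(?P<xlator>\S+): (?P<msg>.*)$"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if no match."""
    m = LOG_RE.match(line.strip())
    return m.groupdict() if m else None


def disconnects_by_second(lines):
    """Map each second to the set of xlators that logged 'disconnected'."""
    out = defaultdict(set)
    for line in lines:
        rec = parse_line(line)
        if rec and rec["msg"] == "disconnected":
            out[rec["ts"]].add(rec["xlator"])
    return dict(out)
```

Running `disconnects_by_second(open("client.log"))` on the full log would show which seconds contain disconnects from more than one client xlator.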
I see that the 0-gfs33-client-2 xlator is unable to connect to glusterd (that should be) running on hotstuff:24007. The client xlator attempts to reconnect every 3s since the last attempt. This is why we see the logs about client disconnection repeat.

Could you check if glusterd was running on the host "hotstuff" when the client experiences spurious disconnects? To confirm this, when you notice the 'spurious' disconnects, try

# telnet hotstuff 24007

thanks,
krish
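
The telnet check above can also be scripted so that it runs at the moment a disconnect shows up in the log. This is a small sketch using only the Python standard library. The helper name `port_is_open` is made up for illustration, and a successful TCP connect only shows that something is listening on the port, not that glusterd is healthy.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False
```

For the case discussed here the call would be `port_is_open("hotstuff", 24007)`.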
Gluster-devel mailing list address@hidden https://lists.nongnu.org/mailman/listinfo/gluster-devel

