Re: [Gluster-devel] Blocking client when server is down


From: Martin Fick
Subject: Re: [Gluster-devel] Blocking client when server is down
Date: Tue, 30 Dec 2008 21:03:19 -0800 (PST)

--- On Tue, 12/30/08, Basavanagowda Kanur <address@hidden> wrote:

> If server is down for transport-timeout time, then client
> returns all the calls with 'Transport Endpoint not connected'
> error.

Yes, this is exactly what I do not want.  I want reads/writes to simply block 
when the server is down and to complete (the blocked calls) when the server 
returns.  I do not want my applications to get an error, only a delay.  Without 
this it is not possible to recover gracefully from a server/network failure.

While we are at it, what unit is the timeout in: seconds or milliseconds?


I have been trying to understand what it would take to implement this feature 
in the client protocol translator.  At first glance, there seem to be two main 
cases to deal with: 1) requests which have not yet hit the wire and fail when 
they attempt to, and 2) requests which have already hit the wire but have not 
been answered by the server.  A possible third, more complex case 3) is 
requests which reached the server and were answered, but whose response was 
never received by the client.

The simplest case seems to be # 1: simply wait for the connection to 
reestablish itself and retry submitting the request to the wire.  I hacked up a 
simple implementation of this (looping in protocol_client_xfer, without holding 
the lock, until the connection is reestablished) which seems to work, but 
I have no clue if it is correct. ;)  It is attached below.
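
For comparison, here is a minimal standalone sketch of the same idea for case 
# 1 using a condition variable instead of a sleep(1) polling loop.  It is not 
GlusterFS code: conn_state_t, conn_mark_connected and blocking_submit are 
hypothetical names made up for illustration, and the "submit" is only a 
counter.  The point is that pthread_cond_wait releases the lock while the 
caller is blocked, so the reconnect path can take it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the client's private connection state. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  connected_cond;
    bool            connected;
    int             submitted;   /* requests that reached the "wire" */
} conn_state_t;

void conn_state_init(conn_state_t *cs)
{
    pthread_mutex_init(&cs->lock, NULL);
    pthread_cond_init(&cs->connected_cond, NULL);
    cs->connected = false;
    cs->submitted = 0;
}

/* Called by whatever notices that the connection is back up. */
void conn_mark_connected(conn_state_t *cs)
{
    pthread_mutex_lock(&cs->lock);
    cs->connected = true;
    pthread_cond_broadcast(&cs->connected_cond);  /* wake all blocked callers */
    pthread_mutex_unlock(&cs->lock);
}

/* Called when the connection is lost. */
void conn_mark_disconnected(conn_state_t *cs)
{
    pthread_mutex_lock(&cs->lock);
    cs->connected = false;
    pthread_mutex_unlock(&cs->lock);
}

/* Block (instead of failing) until connected, then "submit" the request.
 * Returns the running count of submitted requests. */
int blocking_submit(conn_state_t *cs)
{
    pthread_mutex_lock(&cs->lock);
    while (!cs->connected) {
        /* Atomically drops the lock while waiting, retakes it on wakeup. */
        pthread_cond_wait(&cs->connected_cond, &cs->lock);
    }
    cs->submitted++;
    int n = cs->submitted;
    pthread_mutex_unlock(&cs->lock);
    return n;
}
```

A caller of blocking_submit on a disconnected conn_state_t simply sleeps until 
another thread calls conn_mark_connected, which is the "only a delay, not an 
error" behaviour described above.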

For # 2, it looks like the client protocol keeps a list of outstanding requests 
in the saved_frames list.  Is there any reason this list could not be 
resubmitted when the connection is reestablished instead of it being purged 
when the connection fails (apart from the problems associated with corner case 
#3)?  Is all the required data still in the frame at this point (before 
protocol_client_cleanup is called)?
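
To make the question for # 2 concrete, here is a rough standalone sketch of 
"resubmit instead of purge".  The types and functions (saved_req_t, 
saved_list_t, saved_list_resubmit, demo_submit) are hypothetical 
simplifications invented for this example; the real saved_frames structure 
may look quite different, and whether a frame still holds everything needed 
to resend it is exactly the open question.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one outstanding request. */
typedef struct saved_req {
    long              callid;   /* identifies the request/reply pair */
    int               sent;     /* how many times it was put on the wire */
    struct saved_req *next;
} saved_req_t;

/* Outstanding requests, kept in submission order. */
typedef struct {
    saved_req_t *head;
    saved_req_t *tail;
} saved_list_t;

/* Remember a request that has been written to the wire. */
void saved_list_add(saved_list_t *l, saved_req_t *r)
{
    r->next = NULL;
    if (l->tail)
        l->tail->next = r;
    else
        l->head = r;
    l->tail = r;
}

/* A reply arrived: unlink and return the matching request, or NULL. */
saved_req_t *saved_list_complete(saved_list_t *l, long callid)
{
    saved_req_t *prev = NULL;
    for (saved_req_t *r = l->head; r; prev = r, r = r->next) {
        if (r->callid != callid)
            continue;
        if (prev)
            prev->next = r->next;
        else
            l->head = r->next;
        if (l->tail == r)
            l->tail = prev;
        r->next = NULL;
        return r;
    }
    return NULL;
}

/* Connection is back: resend every unanswered request in original order.
 * Requests stay on the list until their reply arrives.  Returns how many
 * were resent; stops early if submit() fails (connection lost again). */
int saved_list_resubmit(saved_list_t *l, int (*submit)(saved_req_t *))
{
    int n = 0;
    for (saved_req_t *r = l->head; r; r = r->next) {
        if (submit(r) < 0)
            break;
        n++;
    }
    return n;
}

/* Trivial submit used for demonstration: just counts the send. */
int demo_submit(saved_req_t *r)
{
    r->sent++;
    return 0;
}
```

Note that a blind resend like this is only safe for requests the server never 
executed, or for idempotent ones; anything else runs into corner case # 3.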

Corner case # 3 seems like it would require the server to keep track of 
responses it knows did not reach the client.  If it can resend these responses 
to the client when the connection is reestablished, the client could process 
those requests without resending them.
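
One way the server side of # 3 might look, as a standalone sketch only: the 
server logs each reply until the client acknowledges it, and replays whatever 
is still unacknowledged after a reconnect.  Everything here is an assumption 
for illustration (reply_log_t, the fixed-size table, and in particular the 
existence of an explicit acknowledgement from the client, which as far as I 
know the current protocol does not have).

```c
#include <assert.h>
#include <string.h>

#define MAX_PENDING 64   /* arbitrary bound chosen for the sketch */

/* One reply the server has sent but the client has not yet acknowledged. */
typedef struct {
    int  used;
    long callid;
    int  op_ret;
} pending_reply_t;

typedef struct {
    pending_reply_t slot[MAX_PENDING];
} reply_log_t;

void reply_log_init(reply_log_t *log)
{
    memset(log, 0, sizeof *log);
}

/* Record a reply when it is sent.  Returns 0, or -1 if the log is full. */
int reply_log_record(reply_log_t *log, long callid, int op_ret)
{
    for (int i = 0; i < MAX_PENDING; i++) {
        if (!log->slot[i].used) {
            log->slot[i].used   = 1;
            log->slot[i].callid = callid;
            log->slot[i].op_ret = op_ret;
            return 0;
        }
    }
    return -1;
}

/* Client confirmed receipt: forget the reply.  Returns 1 if it was found. */
int reply_log_ack(reply_log_t *log, long callid)
{
    for (int i = 0; i < MAX_PENDING; i++) {
        if (log->slot[i].used && log->slot[i].callid == callid) {
            log->slot[i].used = 0;
            return 1;
        }
    }
    return 0;
}

/* Client reconnected: resend every reply still unacknowledged.
 * Returns the number of replies resent. */
int reply_log_resend(reply_log_t *log, void (*send)(long callid, int op_ret))
{
    int n = 0;
    for (int i = 0; i < MAX_PENDING; i++) {
        if (log->slot[i].used) {
            send(log->slot[i].callid, log->slot[i].op_ret);
            n++;
        }
    }
    return n;
}

/* Demonstration sender: remembers the last callid it was asked to send. */
static long demo_last_callid = -1;

void demo_send(long callid, int op_ret)
{
    (void)op_ret;
    demo_last_callid = callid;
}
```

The client would then match a replayed reply against its own list of 
outstanding requests and complete it without resending the request.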

This is my simplistic understanding of the problem.  Am I overlooking something 
major that would prevent this from working?  Is this something you would 
consider implementing or accepting patches for if I can get it to work 
(although it might be way beyond my abilities)?  Am I way off and wasting my 
time? :(

Thanks,

-Martin


--- xlators/protocol/client/src/client-protocol.c       2008-12-30 17:24:34.000000000 -0700
+++ xlators/protocol/client/src/client-protocol.c.orig  2008-12-30 13:23:26.000000000 -0700
@@ -388,7 +388,6 @@
        gf_hdr_common_t rsphdr = {0, };
        client_forget_t forget = {0, };
        uint8_t send_forget = 0;
-        uint8_t  reconnect = 1;

        priv  = this->private;
        trans = priv->transport;
@@ -431,32 +430,14 @@
                        hdr->req.pid = hton32 (frame->root->pid);
                }

-               if(type == GF_OP_TYPE_MOP_REQUEST &&
-                  op == GF_MOP_SETVOLUME)
-                       reconnect = 0;
-
-               while(1) {
-                       if (cprivate->connected == 0)
-                               transport_connect (trans);
-
-                       if (cprivate->connected ||
-                           ((type == GF_OP_TYPE_MOP_REQUEST) &&
-                            (op == GF_MOP_SETVOLUME))) {
-                               ret = transport_submit (trans, (char *)hdr, hdrlen,
-                                                       vector, count, refs);
-                       }
-
-                       if (!reconnect || ret >= 0 || cprivate->connected > 0)
-                               break;
+               if (cprivate->connected == 0)
+                       transport_connect (trans);

-                       pthread_mutex_unlock (&cprivate->lock);
-                       while (cprivate->connected <= 0) {
-                               gf_log (this->name, GF_LOG_DEBUG,
-                                       "protocol_client_xfer waiting for connection(%i)",
-                                       cprivate->connected);
-                               sleep(1);
-                       }
-                       pthread_mutex_lock (&cprivate->lock);
+               if (cprivate->connected ||
+                   ((type == GF_OP_TYPE_MOP_REQUEST) &&
+                    (op == GF_MOP_SETVOLUME))) {
+                       ret = transport_submit (trans, (char *)hdr, hdrlen,
+                                               vector, count, refs);
                }

                if ((ret >= 0) && frame) {



      



