
Re: [Gluster-devel] ping timeout


From: Christopher Hawkins
Subject: Re: [Gluster-devel] ping timeout
Date: Thu, 18 Mar 2010 10:59:41 -0400 (EDT)

Thanks, Stephan. But in my testing I see the exact opposite. The hang is
painful (everything stops), but the reconnect causes no problems at all. It
seems to work great (good job on 3.0!). What kind of problems is it causing for
you? Maybe there is something I am missing in my test setup.

You mention that stopping and restarting glusterfsd on one box works out
well. As far as I can tell, that is also a reconnect. There is no hang because
the host itself is still up, so once glusterfsd is shut down the gluster client
immediately gets "Connection refused" and doesn't wait for the timeout period:
[2010-03-18 10:04:46] E [socket.c:760:socket_connect_finish] master2: connection to 10.0.0.102:3302 failed (Connection refused)

As opposed to the server just going away, which hangs for a while:
[2010-03-18 10:05:44] E [client-protocol.c:415:client_ping_timer_expired] master2: Server 10.0.0.102:3302 has not responded in the last 42 seconds, disconnecting.
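To make the difference between the two cases concrete, here is a toy Python model of a ping timer. It assumes the client simply compares a "last heard from the server" timestamp against the timeout. The class and method names are hypothetical, and this is not how client_ping_timer_expired in client-protocol.c is actually written. A stopped glusterfsd on a live host is rejected at once, so no timer is involved. Only a server that goes silent has to wait out the full timeout.

```python
import time
from typing import Callable


class PingTimer:
    """Toy model of a client-side ping timer (hypothetical; not GlusterFS code).

    The client remembers when it last heard from the server. A server that
    silently vanishes is only declared dead after ping_timeout seconds.
    """

    def __init__(self, ping_timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ping_timeout = ping_timeout
        self.clock = clock
        self.last_heard = clock()

    def on_server_reply(self) -> None:
        # Any reply from the server (including a ping reply) resets the timer.
        self.last_heard = self.clock()

    def expired(self) -> bool:
        # True once the server has been silent for ping_timeout seconds or more.
        return self.clock() - self.last_heard >= self.ping_timeout


class FakeClock:
    """Manually advanced clock so the example is deterministic."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


clock = FakeClock()
timer = PingTimer(ping_timeout=42.0, clock=clock)  # 42 s, as in the log above

clock.now = 41.0
print(timer.expired())  # still waiting: client I/O is hung
clock.now = 42.0
print(timer.expired())  # timer fires: "has not responded in the last 42 seconds"
```

Everything between the server vanishing and the timer firing is the "hang". A lower timeout shortens it, at the cost of treating a slow server as a dead one sooner.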

But when you start it up again, you should get reconnected quickly and with no 
problems:
[2010-03-18 09:00:00] N [afr.c:2625:notify] mirror1: Subvolume 'master1' came back up; going online.
[2010-03-18 09:00:00] N [client-protocol.c:6228:client_setvolume_cbk] master1: Connected to 10.0.0.101:3301, attached to remote volume 'threads2'.

It seems to me that disconnect/reconnect is only painful because the ping
timeout is so long. On a high-latency network you may need a long timeout to
avoid frequent little split-brains, but on a low-latency network long ping
timeouts seem to cause more problems than they fix. Or are you experiencing
something that I am not?
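The trade-off described above can be put into numbers with a small sketch. It rests on two simplifying assumptions. First, when a server dies silently, I/O stalls for roughly the full ping timeout before failover. Second, any silence from a healthy server that reaches the timeout causes a spurious disconnect. The function names and the gap values are hypothetical, chosen only for illustration, and are not measurements from either setup in this thread.

```python
from typing import Dict, Sequence, Tuple


def spurious_disconnects(ping_timeout: float, silent_gaps: Sequence[float]) -> int:
    """Count silences from a *healthy* server long enough to trip the ping timer."""
    return sum(1 for gap in silent_gaps if gap >= ping_timeout)


def compare_timeouts(timeouts: Sequence[float],
                     silent_gaps: Sequence[float]) -> Dict[float, Tuple[float, int]]:
    """For each candidate timeout, return (worst-case hang in s, spurious disconnects).

    Simplifying assumption: a silently dead server stalls I/O for the whole
    ping timeout, so the worst-case hang equals the timeout itself.
    """
    return {t: (t, spurious_disconnects(t, silent_gaps)) for t in timeouts}


# Hypothetical reply gaps (seconds) from a healthy server on a crossover link,
# including one 1.2 s stall (e.g. a briefly overloaded server).
gaps = [0.001, 0.002, 0.3, 1.2]

for timeout, (hang, false_alarms) in compare_timeouts([0.5, 5, 42], gaps).items():
    print(f"timeout={timeout}s  worst-case hang={hang}s  spurious disconnects={false_alarms}")
```

With these made-up gaps, a 0.5 s timeout fails over fastest but drops a healthy server once. 5 s and 42 s never drop it, but they hang correspondingly longer when a server really disappears. The higher the latency variance, the further right that balance point moves.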

Christopher Hawkins

----- "Stephan von Krawczynski" <address@hidden> wrote:

> Hi Christopher,
> 
> I advise you to really test the most important part of your description,
> the one you take for granted: the reconnect case.
> Our experience is quite far from what you think is the worst case. You can
> easily check what happens if you just pull the network cable 5 times in 10
> minutes. We came to the conclusion that disconnect/reconnect should be
> avoided under all circumstances. Interestingly, stopping one server's
> glusterfsd and restarting it works out quite well in our setup. So
> offline-updating a server (which was our main purpose) is quite OK.
> 
> -- 
> Regards,
> Stephan
> 
> 
> 
> On Thu, 18 Mar 2010 08:33:51 -0400 (EDT)
> Christopher Hawkins <address@hidden> wrote:
> 
> > I have a question re: ping timeout for any of the devs. The minimum
> > value is 5 and the max is 1013... But in my case, I use replicate to
> > mirror server pairs that are each gigabit-connected by crossover
> > cables. The latency is very low. 5 seconds is a long time, and
> > personally I would like them to give up on the failed link after 500 ms
> > or so, so the mountpoint becomes available quickly to the remaining
> > node.
> >
> > Or I would at least like to test it and see if it's stable that way;
> > I don't mind getting disconnected early in the case of a slow server,
> > because it will just reconnect when the server comes back. Is there
> > any hope of being able to tweak this parameter? Or is there a reason
> > why it simply cannot be lower than 5?
> > 
> > Thanks for any insight and for glusterfs!
> > 
> > Christopher Hawkins
> > 
> > 
> > _______________________________________________
> > Gluster-devel mailing list
> > address@hidden
> > http://lists.nongnu.org/mailman/listinfo/gluster-devel
> >
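The limits quoted in the original question amount to a simple range check. That check is why a 500 ms timeout cannot be configured at all, rather than merely being discouraged. The sketch below assumes the option is expressed in whole seconds, as the "42 seconds" log message suggests. The constant and function names are hypothetical; this is not GlusterFS's actual option-validation code.

```python
# Hypothetical constants; the bounds are the ones quoted in the question (seconds).
PING_TIMEOUT_MIN = 5
PING_TIMEOUT_MAX = 1013


def validate_ping_timeout(seconds: float) -> float:
    """Reject ping-timeout values outside [5, 1013] seconds (hypothetical helper)."""
    if not PING_TIMEOUT_MIN <= seconds <= PING_TIMEOUT_MAX:
        raise ValueError(
            f"ping-timeout {seconds} out of range "
            f"[{PING_TIMEOUT_MIN}, {PING_TIMEOUT_MAX}]"
        )
    return seconds


for requested in (0.5, 5, 42, 2000):
    try:
        print(requested, "accepted:", validate_ping_timeout(requested))
    except ValueError as err:
        print(requested, "rejected:", err)
```

Testing sub-second failover would therefore require lowering the minimum itself, not just changing a configured value.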



