gluster-devel

Re: [Gluster-devel] Re: [bug #19614] System crashes when node fails even


From: Brent A Nelson
Subject: Re: [Gluster-devel] Re: [bug #19614] System crashes when node fails even with xfr
Date: Fri, 11 May 2007 15:37:54 -0400 (EDT)

On Sat, 12 May 2007, Anand Avati wrote:

Brent,
you have observed the reconnection logic correctly. This effect
crept in after we introduced the non-blocking TCP connect
functionality, which pushes the connect to the background if it takes
more than N usecs (the current I/O request is returned as failed if the
connect() didn't succeed in that shot). By the time the second I/O
request comes, the connect will have succeeded and the call goes
through.

This can be 'fixed' by tuning the "N usecs" (currently hardcoded in
the transport code, but I want to make it configurable from the spec
soon). The flip side of making this "N" large is
that if the server is really dead for a long time, all I/O on the dead
transport will be blocked for that period, which can accumulate into
quite a bad experience.
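
The behaviour described above can be sketched roughly as follows. This is only an illustration in Python, not the GlusterFS transport code: the class name `LazyTransport`, its `connect_wait` parameter (standing in for the "N usecs", expressed in seconds here), and the `send` method are all hypothetical, and the sketch assumes POSIX socket semantics.

```python
import errno
import select
import socket


class LazyTransport:
    """Hypothetical client transport: connect is attempted on I/O and
    left running in the background if it does not finish within N."""

    def __init__(self, host, port, connect_wait=0.001):
        self.addr = (host, port)
        self.connect_wait = connect_wait  # the "N" from the thread, in seconds
        self.sock = None
        self.connected = False

    def _reset(self):
        if self.sock is not None:
            self.sock.close()
        self.sock = None
        self.connected = False

    def _start_connect(self):
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.sock.setblocking(False)
        err = self.sock.connect_ex(self.addr)
        if err not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
            self._reset()  # immediate failure, e.g. connection refused

    def _poll_connect(self):
        # Wait at most connect_wait for the pending connect to finish.
        _, writable, _ = select.select([], [self.sock], [], self.connect_wait)
        if not writable:
            return  # still pending; it keeps going in the background
        err = self.sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
        if err == 0:
            self.sock.setblocking(True)
            self.connected = True
        else:
            self._reset()

    def send(self, payload):
        """Return True if the payload was sent, False if this request failed."""
        if not self.connected:
            if self.sock is None:
                self._start_connect()
            if self.sock is not None:
                self._poll_connect()
        if not self.connected:
            return False  # this request fails; a later one may succeed
        try:
            self.sock.sendall(payload)
            return True
        except OSError:
            self._reset()
            return False
```

With a small `connect_wait`, the first `send` after a disconnect may return False while a second one shortly afterwards succeeds, which matches the effect discussed here. With a large `connect_wait`, every `send` against a host that silently drops packets would stall for up to that long, which is the flip side mentioned above.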


Cool. I agree that the time should be quite short (if nodes are still down, that gives you access to what is available without a delay on each and every request), but it would be nice if it waited a minimal period to give a reconnect a chance to work. Making it user-configurable would be nice. It would help in my mysterious disconnect case (where all machines are running fine, but the client and server briefly disconnect, disrupting the current I/O). It could also help on bad network links. It's probably not that important in real disconnect cases, though, where a machine may be down or rebooting.

But then, that's not the end of it. The reconnection logic is being
redesigned so that reconnection is done proactively when a connection
dies (not when I/O is triggered).
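
One way to picture proactive reconnection is a timer-driven retry that runs independently of I/O. The sketch below is an assumption about the general idea, not the redesign itself: `ProactiveTransport`, `connect_fn`, `retry_interval`, `tick`, and `on_disconnect` are hypothetical names, and the caller is assumed to invoke `tick` periodically from a timer or event loop.

```python
class ProactiveTransport:
    """Hypothetical transport that reconnects from a timer, never from I/O."""

    def __init__(self, connect_fn, retry_interval=1.0):
        # connect_fn returns a connected object with sendall(), or raises OSError.
        self.connect_fn = connect_fn
        self.retry_interval = retry_interval
        self.conn = None
        self.next_attempt = 0.0

    def on_disconnect(self, now):
        """Called when the connection dies; schedule an immediate retry."""
        self.conn = None
        self.next_attempt = now

    def tick(self, now):
        """Called periodically by a timer; reconnects if needed."""
        if self.conn is not None or now < self.next_attempt:
            return
        try:
            self.conn = self.connect_fn()
        except OSError:
            # Server still unreachable: try again after retry_interval.
            self.next_attempt = now + self.retry_interval

    def send(self, payload):
        """Fail fast while disconnected; never triggers a connect itself."""
        if self.conn is None:
            return False
        try:
            self.conn.sendall(payload)
            return True
        except OSError:
            self.conn = None
            return False
```

In this arrangement an I/O request never waits on a connect, so requests against a dead server fail immediately, while a briefly dropped connection is usually restored before the next request arrives.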

Sounds good.  Maybe both could work together?

Thanks,

Brent



