Re: [Gluster-devel] Spurious disconnections / connectivity loss
From: Stephan von Krawczynski
Subject: Re: [Gluster-devel] Spurious disconnections / connectivity loss
Date: Sat, 30 Jan 2010 12:08:29 +0100
On Fri, 29 Jan 2010 18:41:10 +0000
Gordan Bobic <address@hidden> wrote:
> I'm seeing things like this in the logs, coupled with things locking up
> for a while until the timeout is complete:
>
> [2010-01-29 18:29:01] E [client-protocol.c:415:client_ping_timer_expired] home2: Server 10.2.0.10:6997 has not responded in the last 42 seconds, disconnecting.
> [2010-01-29 18:29:01] E [client-protocol.c:415:client_ping_timer_expired] home2: Server 10.2.0.10:6997 has not responded in the last 42 seconds, disconnecting.
>
> The thing is, I know for a fact that there is no network outage of any
> sort. All the machines are on a local gigabit ethernet, and there is no
> connectivity loss observed anywhere else. ssh sessions going to the
> machines that are supposedly "not responding" remain alive and well,
> with no lag.

What you're seeing here is exactly what made us increase the ping-timeout to
120 seconds.
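The log message implies a simple rule. If nothing has been heard from the server for `ping-timeout` seconds, the client drops the connection. Below is a minimal sketch of that rule, assuming a hypothetical `PingTimer` class that only tracks the time of the last reply. It is not the actual logic in `client-protocol.c`, and whether the comparison there is `>` or `>=` is an assumption.

```python
from dataclasses import dataclass


@dataclass
class PingTimer:
    """Hypothetical model of a client-side ping timer (not GlusterFS code)."""

    ping_timeout: float  # seconds of silence tolerated before disconnecting
    last_reply: float = 0.0  # timestamp of the most recent server reply

    def on_reply(self, now: float) -> None:
        # Any reply from the server resets the silence window.
        self.last_reply = now

    def expired(self, now: float) -> bool:
        # Assumed rule: disconnect once the server has been silent
        # for at least ping_timeout seconds.
        return now - self.last_reply >= self.ping_timeout


# A 60-second stall: fatal with a 42 s timeout, survivable with 120 s.
default = PingTimer(ping_timeout=42)
raised = PingTimer(ping_timeout=120)
for timer in (default, raised):
    timer.on_reply(now=0.0)

print(default.expired(now=60.0))  # True
print(raised.expired(now=60.0))   # False
```

Raising the timeout only widens the window. It does not tell a lost packet apart from a dead server, and a server that really is gone will now stall clients for up to 120 seconds before they give up.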
To us it is obvious that the keep-alive strategy does not cope with even
minimal packet loss. On _every_ network you will see some packet loss (read
your switch's documentation carefully). Our impression is that the implemented
strategy does not account for the fact that a lost ping packet is no proof of
a disconnected server, only a hint to take a closer look.
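The "closer look" described above could be sketched as follows. This is a hypothetical illustration of the idea, not GlusterFS behaviour. After a ping reply goes missing, the client sends a few quick follow-up probes and disconnects only if all of them also go unanswered. The `probe` callable, `confirm_probes` and the helper names are assumptions. If losses are assumed independent with rate p, a healthy server is wrongly dropped with probability p^(1 + confirm_probes) instead of p.

```python
from typing import Callable


def should_disconnect(probe: Callable[[], bool], confirm_probes: int = 3) -> bool:
    """Hypothetical check run after a ping reply has gone missing.

    `probe` sends one follow-up ping and returns True if the server answered.
    The server is declared dead only if every follow-up probe also fails.
    """
    for _ in range(confirm_probes):
        if probe():
            # One answer is enough: the first loss was just a dropped packet.
            return False
    return True


def false_disconnect_probability(loss_rate: float, confirm_probes: int) -> float:
    # Assuming independent losses, a healthy server is dropped only if the
    # original ping and all follow-up probes are lost.
    return loss_rate ** (1 + confirm_probes)


# Simulated link that drops the first follow-up probe, then recovers.
answers = iter([False, True, True])
print(should_disconnect(lambda: next(answers)))  # False

# Dead server: nothing ever answers.
print(should_disconnect(lambda: False))  # True

# With 0.1% loss per packet: a single missed ping vs. ping plus 3 confirmations.
print(false_disconnect_probability(0.001, 0))  # 0.001
print(false_disconnect_probability(0.001, 3))
```

The follow-up probes can use a much shorter interval than the regular ping. A truly dead server is then still detected quickly, without the long blanket timeout.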
--
Regards,
Stephan