
Re: [Gluster-devel] GlusterFS AFR not failing over


From: gordan
Subject: Re: [Gluster-devel] GlusterFS AFR not failing over
Date: Wed, 11 Jun 2008 15:05:58 +0100 (BST)
User-agent: Alpine 1.10 (LRH 962 2008-03-14)

Will do if I catch it at the time. It only happened twice in the last month or so. I'm not even sure if the problem that causes the lock-up is in the client or the server, as the only way I've managed to get it going again in both cases was by restarting both.

Gordan

On Wed, 11 Jun 2008, Krishna Srinivas wrote:

Gordan,

So the glusterfs/glusterfsd processes hang when one of the servers goes
down, and they don't recover. At the point when they hang, can you
attach gdb to the processes (glusterfs and glusterfsd) and get a
backtrace (bt) of each?

Are you using a 1.3.* release? Non-blocking read/write fixes
have gone into the 1.4.* release, where I think this behavior might be
fixed. The backtrace will help.
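
A minimal sketch of how the requested backtraces could be collected, written in Python. It shells out to gdb in batch mode, attaches to a given PID, and dumps a backtrace for every thread. The function names and the example PID are hypothetical; only the gdb options used (-p, -batch, -ex) and the "thread apply all bt" command are standard gdb. Attaching usually requires root or the same user as the target process.

```python
import subprocess


def gdb_backtrace_command(pid):
    """Build a gdb command line that attaches to pid, prints a
    backtrace for every thread, and then exits (batch mode)."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def capture_backtrace(pid, timeout=60):
    """Run gdb against a running process and return its output as text.

    Raises subprocess.TimeoutExpired if gdb itself does not finish
    within `timeout` seconds.
    """
    result = subprocess.run(
        gdb_backtrace_command(pid),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        timeout=timeout,
    )
    return result.stdout
```

For example, capture_backtrace(1234) would return the backtraces of the process with PID 1234 (a made-up PID; the real ones for glusterfs and glusterfsd can be looked up with ps). Running it once for each hung process at the moment of the hang gives the output to post to the list.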

Thanks
Krishna

On Mon, Jun 9, 2008 at 7:11 PM,  <address@hidden> wrote:
No - this is a different problem. If the transport timeout was the problem,
the access should return after < 60 seconds, should it not? In the case I'm
seeing, something goes wrong and the only way to recover is to restart
glusterfsd on the server(s) _AND_ glusterfs on the clients.

It's kind of hard to reproduce, as I only see it happening about once every
week or so.
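
One way to tell the two cases apart next time it happens, as a rough Python sketch: time a single stat() on the mount and see whether it comes back at all. If the call returns (even with an error) within roughly the 60 seconds mentioned above, the timeout explanation fits; if it is still blocked well past that, it is the hang described here. The function name and the 90-second limit are assumptions chosen for illustration.

```python
import os
import threading
import time


def timed_stat(path, timeout):
    """Stat `path` in a worker thread.

    Returns (finished, seconds_waited). `finished` is False if the
    call was still blocked when `timeout` seconds had passed.
    """
    done = threading.Event()

    def worker():
        try:
            os.stat(path)
        except OSError:
            # An error return (e.g. ENOTCONN) still counts as "returned".
            pass
        finally:
            done.set()

    start = time.monotonic()
    # Daemon thread, so a stuck call does not keep the interpreter
    # waiting for this thread at exit.
    threading.Thread(target=worker, daemon=True).start()
    finished = done.wait(timeout)
    return finished, time.monotonic() - start
```

For example, timed_stat("/home", 90) on an affected client: a result of (False, ...) would mean the access was still hanging after 90 seconds. Note that the worker thread may remain blocked inside the kernel for as long as the mount stays hung.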

Gordan

On Sat, 7 Jun 2008, Krishna Srinivas wrote:

Gordan,

Is this a case of transport-timeout being set too high?

Krishna

On Sat, Jun 7, 2008 at 1:04 AM, Gordan Bobic <address@hidden> wrote:

Hi,

I have /home mounted from GlusterFS with AFR, and if one of the servers
(secondary) goes away, I cannot log in. sshd tries to read ~/.ssh and bash
tries to read ~/.bashrc and this seems to fail - or at least take a very
long time to time out and try the remaining server (which verifiably works).

I get this sort of thing in the logs:

E [tcp-client.c:190:tcp_connect] home2: non-blocking connect() returned: 110 (Connection timed out)
E [client-protocol.c:4423:client_lookup_cbk] home2: no proper reply from server, returning ENOTCONN
C [client-protocol.c:212:call_bail] home2: bailing transport

where home2 is the name of the GlusterFS export on the secondary.
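
For sifting through larger logs, the entries above appear to share one layout: a one-letter level, then [source file:line:function], then the volume name, then the message. A small Python sketch that splits an entry into those fields follows. The pattern is inferred only from the three lines quoted here, so treat it as an assumption and not as the definitive GlusterFS log format; the names LOG_LINE and parse_log_line are made up for illustration.

```python
import re

# Layout inferred from the quoted log lines:
#   LEVEL [file.c:LINE:function] volume: message
LOG_LINE = re.compile(
    r"^(?P<level>[A-Z]) "
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<volume>[^:]+): "
    r"(?P<message>.*)$"
)


def parse_log_line(text):
    """Split one log entry into its fields, or return None if the
    text does not match the assumed layout."""
    match = LOG_LINE.match(text.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["line"] = int(fields["line"])
    return fields


if __name__ == "__main__":
    entry = parse_log_line(
        "C [client-protocol.c:212:call_bail] home2: bailing transport"
    )
    print(entry["func"], entry["volume"], entry["message"])
```

Filtering on the volume field (home2 here) makes it easy to pull out only the entries for the server that went away. Each entry has to be on a single line for the pattern to match, so lines wrapped by a mail client need rejoining first.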

Is this a known issue or have I managed to trip another error case?

Gordan


_______________________________________________
Gluster-devel mailing list
address@hidden
http://lists.nongnu.org/mailman/listinfo/gluster-devel



