gluster-devel




From: Joe Landman
Subject: Re: [Gluster-devel] GlusterFS hangs/fails: Transport endpoint is not connected
Date: Tue, 25 Nov 2008 09:27:46 -0500
User-agent: Thunderbird 2.0.0.17 (X11/20080925)

Fred Hucht wrote:
> Hi,
>
> crawling through all /var/log/messages, I found on one of the failing nodes (node68)

Does your setup use local disk? Is it possible that the backing store is failing?

If you run

        mcelog > /tmp/mce.log 2>&1

on the failing node, do you get any output in /tmp/mce.log?
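A minimal sketch of these two local-hardware checks, assuming a POSIX shell on the suspect node. The names mce_status and disk_errors are hypothetical helpers (not part of mcelog or GlusterFS), and the two kernel message patterns are assumptions based on what 2.6-era kernels commonly log for block-device I/O errors; they are not an exhaustive list of disk failure symptoms.

```shell
#!/bin/sh
# Sketch: quick local-hardware checks on a suspect node.

# mce_status FILE (hypothetical helper): report whether a file captured with
#   mcelog > FILE 2>&1
# has any content. Because stderr is redirected too, a non-empty file may
# hold an mcelog error message rather than machine-check records, so any
# output still has to be read by hand.
mce_status() {
    if [ ! -e "$1" ]; then
        echo "missing"
    elif [ -s "$1" ]; then
        echo "non-empty ($(wc -l < "$1" | tr -d ' ') lines)"
    else
        echo "empty"
    fi
}

# disk_errors FILE (hypothetical helper): count kernel block-layer I/O error
# lines in a syslog file. The patterns are an assumption, not a full list.
disk_errors() {
    # grep -c prints 0 and exits 1 when nothing matches; "|| true" keeps
    # that from being treated as a failure by callers using set -e.
    grep -cE 'end_request: I/O error|Buffer I/O error' "$1" || true
}
```

On the failing node this might be used as "mcelog > /tmp/mce.log 2>&1; mce_status /tmp/mce.log; disk_errors /var/log/messages". An "empty" result plus a zero count does not prove the hardware is healthy; it only rules out the two most obvious indicators.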

My current thoughts, in no particular order, are:

Hardware based (failures always concentrated on a few specific nodes, and repeatable only on those nodes):

a) failing local hard drive: a failing backing store *could* impact the file system, and you would see this as NFS working on a remote FS while the file system that stores part of its data locally fails.

b) network issue: possibly a bad driver, a flaky port, or an overloaded switch backplane. IMO this is less likely, as NFS works. Could you post the output of "ifconfig" so we can look for error indicators (errors, dropped, overruns, frame, carrier, collisions) in the port state?

Software based:

c) FUSE bugs: I have run into a few in the past, and they have caused errors like this. But umount/mount rarely fixes a hung FUSE process, so this is, again IMO, less likely.

d) GlusterFS bugs: I think the developers would recognize it if it were one. I doubt this at the moment.

e) kernel bug: We are using 2.6.27.5 right now and are about to update to 2.6.27.7 due to some CERT advisories. We have had stability issues with kernels from 2.6.24 to 2.6.26.x (for low values of x) under intense loads. It wouldn't surprise me if what you are observing is just a symptom of a real problem elsewhere in the kernel. That the state is cleared by an umount/mount suggests this could be the case.
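For the network case in b), the errors/dropped/frame/carrier/collisions figures that ifconfig prints are per-interface kernel counters that Linux also exposes in /proc/net/dev. A minimal sketch that flags any interface with non-zero error counters, assuming the usual /proc/net/dev layout (two header lines, then one line per interface with 8 receive and 8 transmit fields); net_errors is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# net_errors (hypothetical helper): print every interface whose error,
# drop, frame, collision or carrier counters are non-zero.
# Input: text in /proc/net/dev layout on stdin. Assumed field order after
# the interface name: 8 receive fields (bytes packets errs drop fifo frame
# compressed multicast), then 8 transmit fields (bytes packets errs drop
# fifo colls carrier compressed).
net_errors() {
    awk 'NR > 2 {
        # The byte count may follow the colon without a space ("eth0:123"),
        # so turn the first colon into a space before splitting fields.
        sub(/:/, " ")
        # $1 = interface; $4/$5/$7 = rx errs/drop/frame;
        # $12/$13/$15/$16 = tx errs/drop/colls/carrier.
        if ($4 + $5 + $7 + $12 + $13 + $15 + $16 > 0)
            printf "%s rx_errs=%d rx_drop=%d rx_frame=%d tx_errs=%d tx_drop=%d tx_colls=%d tx_carrier=%d\n",
                $1, $4, $5, $7, $12, $13, $15, $16
    }'
}
```

Usage would be "net_errors < /proc/net/dev" on the failing node. The counters accumulate since boot (or since the driver last reset them), so comparing a snapshot taken before a hang with one taken after it says more than a single reading.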

Joe




--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: address@hidden
web  : http://www.scalableinformatics.com
       http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615



