
Re: [Gluster-devel] Sick but still "alive" nodes


From: Jeff Darcy
Subject: Re: [Gluster-devel] Sick but still "alive" nodes
Date: Fri, 25 Jan 2013 08:43:41 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2

On 01/25/2013 07:47 AM, address@hidden wrote:
Hi guys: I just saw an issue on the HDFS mailing list that might be a
potential problem in gluster clusters. It kind of reminds me of
Jeff's idea of bricks as first-class objects in the API.

What happens if a gluster brick is on a machine which, although still
alive, performs poorly?

Would such scenarios be detected, and if so, can the brick be
decommissioned/ignored/moved? If not, it would be a cool feature to
have, because I'm sure it happens from time to time.

There's nothing currently in place to detect such a condition, and of course if we can't detect it we can't do anything about it. There are also several cases where we might actually manage to make things worse if we try to do this ourselves. For example, consider the case where the slowness is because of a short-duration contending activity. We might well react just as that activity subsides, suspending that brick just as another brick is "going bad" due to similar transient activity there. Similarly, if the system overall is truly overloaded, suspending bricks is a bit like squeezing a water balloon - the "bulge" just reappears elsewhere and all we've done is diminish total resources available.
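To make the first failure mode concrete, here is a minimal sketch (not GlusterFS code; all names are made up for illustration) of a naive per-brick latency monitor using an exponentially weighted moving average. It shows the lag problem described above: by the time the smoothed average crosses a threshold, a short contention spike may already be ending, so we would suspend the brick just as it recovers.

```python
# Hypothetical sketch: naive slow-brick detection via an EWMA of
# per-brick request latency. Demonstrates why reacting to transient
# contention tends to fire at the tail of the spike, not its start.

def ewma_update(avg, sample, alpha=0.2):
    """Blend a new latency sample into the running average."""
    return alpha * sample + (1 - alpha) * avg

def is_suspect(brick_avg, baseline_ms, factor=3.0):
    """Flag a brick whose smoothed latency is far above the baseline."""
    return brick_avg > factor * baseline_ms

# Simulate a brick whose latency spikes briefly, then recovers.
baseline_ms = 5.0
samples = [5, 5, 40, 45, 5, 5, 5, 5]  # transient contention at indices 2-3

avg = baseline_ms
flagged_at = None
for i, s in enumerate(samples):
    avg = ewma_update(avg, s)
    if flagged_at is None and is_suspect(avg, baseline_ms):
        flagged_at = i

# The EWMA crosses the threshold only at the last slow sample: the very
# next sample is already back at baseline, so the "suspension" would
# land just as the contention subsides.
print("flagged at sample index:", flagged_at)
```

Tuning `alpha` or `factor` only trades this lag against false positives; the detector cannot tell a dying disk from a ten-second burst of competing I/O, which is the core of the problem.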

I've seen problems like this with other parallel filesystems, and I'm pretty sure I've read papers about them too. IMO the right place to deal with such issues is at the job-scheduler or similar level, where more of the total system state is known. What we can do is provide more information about our part of the system state, plus levers that they can pull when they decide that preparation or correction for a higher-level event (that we probably don't even know about) is appropriate.
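The division of labor above can be sketched as follows. This is purely hypothetical (none of these structures or functions correspond to real GlusterFS APIs): the filesystem layer only reports per-brick health metrics, and an external scheduler, which knows about higher-level deadlines we don't, decides whether to pull a lever such as draining a brick.

```python
# Hypothetical sketch: the filesystem exports health reports; policy
# lives in an external scheduler that sees the whole-system state.
from dataclasses import dataclass

@dataclass
class BrickHealth:
    """Metrics the storage layer could export (names are illustrative)."""
    brick: str
    avg_latency_ms: float
    queue_depth: int

def choose_action(reports, job_deadline_ms):
    """External scheduler policy: drain a brick only when its latency
    would actually break a deadline the scheduler knows about."""
    actions = {}
    for r in reports:
        if r.avg_latency_ms > job_deadline_ms:
            actions[r.brick] = "drain"   # the "lever" pulled from above
        else:
            actions[r.brick] = "keep"
    return actions

reports = [
    BrickHealth("node1:/bricks/b1", avg_latency_ms=4.0, queue_depth=2),
    BrickHealth("node2:/bricks/b1", avg_latency_ms=250.0, queue_depth=40),
]
print(choose_action(reports, job_deadline_ms=100.0))
```

The point of the split is that the storage layer never guesses at intent: the same 250 ms brick might be perfectly acceptable to a batch job with no deadline, and only the scheduler can know that.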


