
Re: [Gluster-devel] Sick but still "alive" nodes


From: Jeff Darcy
Subject: Re: [Gluster-devel] Sick but still "alive" nodes
Date: Fri, 25 Jan 2013 08:43:41 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2

On 01/25/2013 07:47 AM, address@hidden wrote:
Hi guys: I just saw an issue on the HDFS mailing list that might be a
potential problem in gluster clusters. It kind of reminds me of
Jeff's idea of bricks as first-class objects in the API.

What happens if a gluster brick is on a machine which, although still
alive, performs poorly?

Would such scenarios be detected, and if so, can the brick be
decommissioned/ignored/moved? If not, it would be a cool feature to
have, because I'm sure it happens from time to time.

There's nothing currently in place to detect such a condition, and of course if we can't detect it we can't do anything about it. There are also several cases where we might actually manage to make things worse if we try to do this ourselves. For example, consider the case where the slowness is because of a short-duration contending activity. We might well react just as that activity subsides, suspending that brick just as another brick is "going bad" due to similar transient activity there. Similarly, if the system overall is truly overloaded, suspending bricks is a bit like squeezing a water balloon - the "bulge" just reappears elsewhere and all we've done is diminish total resources available.
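To make the first failure mode concrete, here is a minimal sketch (not GlusterFS code; all names are made up for illustration) of a naive per-brick latency monitor using an exponentially weighted moving average. It shows the lag problem described above: by the time the smoothed average crosses a threshold, a short contention spike may already be ending, so we would suspend the brick just as it recovers.

```python
# Hypothetical sketch: naive slow-brick detection via an EWMA of
# per-brick request latency. Demonstrates why reacting to transient
# contention tends to fire at the tail of the spike, not its start.

def ewma_update(avg, sample, alpha=0.2):
    """Blend a new latency sample into the running average."""
    return alpha * sample + (1 - alpha) * avg

def is_suspect(brick_avg, baseline_ms, factor=3.0):
    """Flag a brick whose smoothed latency is far above the baseline."""
    return brick_avg > factor * baseline_ms

# Simulate a brick whose latency spikes briefly, then recovers.
baseline_ms = 5.0
samples = [5, 5, 40, 45, 5, 5, 5, 5]  # transient contention at indices 2-3

avg = baseline_ms
flagged_at = None
for i, s in enumerate(samples):
    avg = ewma_update(avg, s)
    if flagged_at is None and is_suspect(avg, baseline_ms):
        flagged_at = i

# The EWMA crosses the threshold only at the last slow sample: the very
# next sample is already back at baseline, so the "suspension" would
# land just as the contention subsides.
print("flagged at sample index:", flagged_at)
```

Tuning `alpha` or `factor` only trades this lag against false positives; the detector cannot tell a dying disk from a ten-second burst of competing I/O, which is the core of the problem.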

I've seen problems like this with other parallel filesystems, and I'm pretty sure I've read papers about them too. IMO the right place to deal with such issues is at the job-scheduler or similar level, where more of the total system state is known. What we can do is provide more information about our part of the system state, plus levers that they can pull when they decide that preparation or correction for a higher-level event (that we probably don't even know about) is appropriate.
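The division of labor above can be sketched as follows. This is purely hypothetical (none of these structures or functions correspond to real GlusterFS APIs): the filesystem layer only reports per-brick health metrics, and an external scheduler, which knows about higher-level deadlines we don't, decides whether to pull a lever such as draining a brick.

```python
# Hypothetical sketch: the filesystem exports health reports; policy
# lives in an external scheduler that sees the whole-system state.
from dataclasses import dataclass

@dataclass
class BrickHealth:
    """Metrics the storage layer could export (names are illustrative)."""
    brick: str
    avg_latency_ms: float
    queue_depth: int

def choose_action(reports, job_deadline_ms):
    """External scheduler policy: drain a brick only when its latency
    would actually break a deadline the scheduler knows about."""
    actions = {}
    for r in reports:
        if r.avg_latency_ms > job_deadline_ms:
            actions[r.brick] = "drain"   # the "lever" pulled from above
        else:
            actions[r.brick] = "keep"
    return actions

reports = [
    BrickHealth("node1:/bricks/b1", avg_latency_ms=4.0, queue_depth=2),
    BrickHealth("node2:/bricks/b1", avg_latency_ms=250.0, queue_depth=40),
]
print(choose_action(reports, job_deadline_ms=100.0))
```

The point of the split is that the storage layer never guesses at intent: the same 250 ms brick might be perfectly acceptable to a batch job with no deadline, and only the scheduler can know that.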


