
Re: [Gluster-devel] DHT idea: rebalance-specific layout


From: Paul Cuzner
Subject: Re: [Gluster-devel] DHT idea: rebalance-specific layout
Date: Tue, 25 Mar 2014 18:57:53 -0400 (EDT)


It's an interesting way to avoid data movement. Would this be an optional method for rebalance processing, or the default?

You also mention medium bricks and old bricks. Does that imply metadata associated with the bricks themselves that the rebalance algorithm can reference?

By avoiding the redistribution of the data, don't we end up placing more of the 'hot' content on fewer servers? I've always liked the notion of each brick's capacity being consumed by data of varying ages, to avoid hot spots within the cluster.


From: "Jeff Darcy" <address@hidden>
To: "Gluster Devel" <address@hidden>
Sent: Tuesday, 25 March, 2014 3:08:49 AM
Subject: [Gluster-devel] DHT idea: rebalance-specific layout

I was talking to a user about my size-weighted (or optionally
free-space-weighted) rebalance script.  This led to thinking about ways
to bring a system back into balance without migrating any old data, as
some of our users already do.  Here's the example I was using.

* Four existing 1TB bricks, which are 90% full.

* One new 2TB brick, which is empty.

Therefore, total free space is 2.4TB, of which the new brick has 2.0TB.
If we set up the layouts so that the new brick has 5/6 of the hash
space, then as new files are added all five bricks should reach 100%
full at the same time without ever needing to migrate any old data.  Yay.
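
The 5/6 figure can be checked with a short sketch. This is only an
illustration of free-space weighting, not the rebalance script mentioned
above: the brick names, the helper functions, and the 32-bit hash ring
are assumptions made for the example.

```python
from fractions import Fraction

HASH_SPACE = 2**32  # assumption: a 32-bit hash ring


def free_space_shares(free_tb):
    """Return each brick's share of the hash space, proportional to its free space."""
    total = sum(free_tb.values())
    return {brick: Fraction(free) / total for brick, free in free_tb.items()}


def hash_ranges(shares):
    """Turn shares into contiguous, inclusive (start, end) ranges covering the ring."""
    ranges = {}
    cumulative = Fraction(0)
    start = 0
    for brick, share in shares.items():
        cumulative += share
        end = int(cumulative * HASH_SPACE)  # exclusive upper bound of this range
        ranges[brick] = (start, end - 1)
        start = end
    return ranges


# Four 1TB bricks at 90% full (0.1TB free each) plus one empty 2TB brick.
# The brick names are hypothetical.
free = {f"old{i}": Fraction(1, 10) for i in range(1, 5)}
free["new"] = Fraction(2)

shares = free_space_shares(free)
ranges = hash_ranges(shares)

for brick, share in shares.items():
    print(brick, share, ranges[brick])
```

Each old brick ends up with 1/24 of the hash space and the new brick
with 5/6, matching the ratio of free space (0.1TB : 2.0TB).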

Unfortunately, there's still a problem.  For these kinds of users (e.g.
CDNs) the newest data also tends to remain hottest.  What happens when
they want to retire some of their oldest hardware?  That *does* involve
migrating old data, and the load for that will disproportionately fall
on the newest servers which really should be spending as much of their
time as possible serving new content.  That's not good.

So (finally) here's the idea.  Have a *separate* set of layout values
that are used specifically for rebalance, so that we can rebalance data
one way even as new files are placed another way.  Let's consider a
slightly different example.

* 4 ancient 1TB bricks, 75% full

* 16 medium-age 1.5TB bricks, also 75% full

* 4 new 2TB bricks, empty

Here's one possible way to use dual layouts:

* currently 8TB free on the medium bricks, goal is 5TB

* 4TB free on the new bricks

* set regular layout to 44% new, 56% medium

* set rebalance layout to 100% medium

This way 44% of the new files but *none* of the files from the oldest
bricks will flow toward the newest bricks.  100% of the rebalance
traffic will go from the oldest bricks to the medium ones, and shouldn't
affect the newest machines at all.  This would all be a lot easier if we
had layout inheritance or default layouts instead of every single
directory having its own layout, but we can probably find ways to deal
with that.
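
As a rough sketch of the dual-layout idea, the snippet below keeps two
sets of weights and picks one depending on whether a file is being newly
created or moved by rebalance. It takes the free-space figures from the
list above as given (8TB medium, 4TB new) and assumes 3TB of data must
leave the ancient bricks (4 x 1TB at 75% full). The tier names, function
names, and 32-bit hash space are hypothetical, not existing DHT code.

```python
from fractions import Fraction

HASH_SPACE = 2**32  # assumption: a 32-bit hash ring


def normalize(weights):
    """Convert raw weights into shares of the hash space that sum to 1."""
    total = sum(weights.values())
    return {tier: Fraction(w) / total for tier, w in weights.items() if w > 0}


medium_free_tb = 8   # figure quoted in the list above
new_free_tb = 4      # figure quoted in the list above
ancient_data_tb = 3  # 4 x 1TB bricks at 75% full

# Regular layout: weight each tier by the free space it should still have
# for new files once the ancient data has landed on the medium bricks.
regular_layout = normalize({
    "medium": medium_free_tb - ancient_data_tb,  # the 5TB goal
    "new": new_free_tb,
})

# Rebalance layout: migrated data may only go to the medium bricks.
rebalance_layout = normalize({"medium": 1})


def pick_tier(layout, hash_value):
    """Map a hash value onto a tier by walking the cumulative shares."""
    point = Fraction(hash_value, HASH_SPACE)
    cumulative = Fraction(0)
    for tier, share in layout.items():
        cumulative += share
        if point < cumulative:
            return tier
    raise ValueError("hash value outside the hash space")


def place(hash_value, is_rebalance):
    """Choose the layout by operation type, then place the file."""
    layout = rebalance_layout if is_rebalance else regular_layout
    return pick_tier(layout, hash_value)
```

With these inputs the regular layout comes out as 5/9 medium and 4/9
new, while any file moved by rebalance lands on a medium brick
regardless of its hash.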

Any reactions?




_______________________________________________
Gluster-devel mailing list
address@hidden
https://lists.nongnu.org/mailman/listinfo/gluster-devel

