gluster-devel

From: Gordan Bobic
Subject: Re: [Gluster-devel] Improving real world performance by moving files closer to their target workloads
Date: Fri, 16 May 2008 01:19:48 +0100
User-agent: Thunderbird 2.0.0.14 (Windows/20080421)

Luke McGregor wrote:

We are currently experimenting with running Gluster over the nodes in
the cluster to produce a single large filesystem. For my Honors
research project I've been asked to look into making some improvements
to Gluster to try to improve performance by moving the files within
the GlusterFS closer to the node which is accessing the file.

What I was wondering is basically how hard it would be to write code
to modify the metadata so that when a file is accessed, it is moved
to the node it is accessed from and its location is updated in the
metadata.

So, you want a unify/AFR hybrid translator that keeps track of which nodes use which files most often and migrates each file to the node that uses it most? Perhaps a probabilistic local caching approach would do well here. When a node accesses a file, there is some chance that it will replicate the file to local storage. If a node accesses a file repeatedly, the cumulative chance of replication approaches unity. The problem is that you need some way of ensuring that a file doesn't exist on more than some maximum number of nodes, and that when a node's store fills up and it drops its least recently used file, a copy of that file still exists somewhere else.
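To make that concrete, here is a rough Python sketch of the idea, not GlusterFS code. Every name in it (Node, Cluster, replicate_chance, max_replicas) is made up for illustration, and it assumes a global holders map that a real cluster would somehow have to maintain. With a per-access chance p, the chance that a file has been replicated after n accesses is 1 - (1 - p)^n, which is what approaches unity.

```python
import random
from collections import OrderedDict


def cumulative_chance(p, n):
    """Chance that at least one of n accesses triggers replication."""
    return 1.0 - (1.0 - p) ** n


class Node:
    """Hypothetical storage node with a bounded local store in LRU order."""

    def __init__(self, name, capacity, replicate_chance, rng=None):
        self.name = name
        self.capacity = capacity
        self.replicate_chance = replicate_chance
        self.rng = rng or random.Random()
        self.store = OrderedDict()  # path -> True, least recently used first


class Cluster:
    """Hypothetical bookkeeping: which nodes hold a copy of each file."""

    def __init__(self, max_replicas):
        self.max_replicas = max_replicas
        self.holders = {}  # path -> set of node names

    def add_file(self, node, path):
        node.store[path] = True
        self.holders.setdefault(path, set()).add(node.name)

    def access(self, node, path):
        if path in node.store:
            node.store.move_to_end(path)  # mark as most recently used
            return "local"
        # Remote read: replicate locally with some probability,
        # but only while the file is below the replica cap.
        below_cap = len(self.holders[path]) < self.max_replicas
        if below_cap and node.rng.random() < node.replicate_chance:
            if self._make_room(node):
                self.add_file(node, path)
                return "replicated"
        return "remote"

    def _make_room(self, node):
        """Evict least recently used files, never the last copy of a file."""
        while len(node.store) >= node.capacity:
            victim = next(
                (p for p in node.store if len(self.holders[p]) > 1), None
            )
            if victim is None:
                return False  # everything held here is a sole copy
            del node.store[victim]
            self.holders[victim].discard(node.name)
        return True
```

Note that the sketch leans on the holders map being consistent and cheap to consult, which is exactly the book-keeping that has to be paid for.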

Interesting enough idea, but I'm not sure the book-keeping overheads would be outweighed by the speed benefits, especially on a fast network. You also wouldn't be able to route requests for a particular file easily, which might end up meaning a broadcast request to all nodes to establish who has the file available.

I suspect that designing an algorithm that does all this with sufficiently little overhead to keep you ahead in performance will be the most difficult part, not writing a GlusterFS plugin. You are almost looking at a variant of a probabilistically cached distributed hash table network, only without using hashes for routing (which makes it more difficult).
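As a toy illustration of why losing hash routing hurts, compare the two lookups below. This is only a sketch: hashed_owner and broadcast_lookup are hypothetical names, the node list and per-node file sets are assumed to be known in one place, and a real DHT would use consistent hashing rather than a plain modulus.

```python
import hashlib


def hashed_owner(path, nodes):
    """Hash routing: the file's location is computed, no messages needed."""
    digest = hashlib.sha1(path.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def broadcast_lookup(path, stores):
    """No hash routing: ask every node whether it holds the file.

    stores maps a node name to the set of paths it holds.
    Returns the list of holders and the number of queries sent.
    """
    holders = [name for name, files in stores.items() if path in files]
    return holders, len(stores)
```

With hashing, any node can work out the owner locally. Once files are free to move to wherever they are used, the query count grows with the number of nodes unless some other index is kept.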

I'd _LOVE_ to see this done, though, it sounds like an awesome project. :)

Gordan



