gluster-devel

Re: [Gluster-devel] Re; Load balancing ...


From: Gordan Bobic
Subject: Re: [Gluster-devel] Re; Load balancing ...
Date: Fri, 25 Apr 2008 21:40:00 +0100
User-agent: Thunderbird 1.5.0.12 (X11/20080403)

Gareth Bult wrote:
If you have two nodes and the 20 GB file
only got written to node A while node B was down,
then when node B comes up the whole 20 GB is
resynced to node B. Is that more network usage
than if the 20 GB file had been written to
node A and node B at the same time?

Ah. Let's say you have both nodes running with a 20Gb file synced.
Then you have to restart glusterfs on one of the nodes.
While it's down, let's say the other node appends 1 byte to the file.
When it comes back up and looks at the file, the other node will see it's out of 
date and re-copy the entire 20Gb.

You're expecting a bit much here - for any shared/clustered FS. DRBD might come close if your extents are big enough, but that's a whole different ball game...
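To illustrate why block/extent tracking matters for the 20Gb-plus-1-byte case, here's a toy sketch of the transfer sizes involved. The 4MB block size and the dirty-block bookkeeping below are assumptions for illustration only, not how AFR self-heal or DRBD actually track changes internally:

    # Toy comparison: bytes transferred for a whole-file resync vs. a
    # block-level diff, after 1 byte is appended to an in-sync 20GB file.
    FILE_SIZE = 20 * 1024**3      # 20GB file, previously in sync
    APPENDED = 1                  # one byte written while a node was down
    BLOCK_SIZE = 4 * 1024**2      # hypothetical 4MB extent/block size

    whole_file_resync = FILE_SIZE + APPENDED             # re-copy the entire file
    dirty_blocks = (APPENDED + BLOCK_SIZE - 1) // BLOCK_SIZE
    block_diff_resync = dirty_blocks * BLOCK_SIZE         # copy only the touched blocks

    print("whole-file resync:  %d bytes" % whole_file_resync)
    print("block-level resync: %d bytes" % block_diff_resync)

The difference is several orders of magnitude, but it only works if something keeps per-block dirty state while the peer is away, and that bookkeeping is the part that costs you elsewhere.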

Perhaps the issue is really that the cost comes at an
unexpected time, on node startup instead of when the
file was originally written?  Would a startup
throttling mechanism help here on resyncs?

Yes, unfortunately you can't open a file while it's syncing .. so when you 
reboot your gluster server, downtime is the length of time it takes to restart 
glusterfs (or the machine, either way..) PLUS the amount of time it takes to 
recopy every file that was written to while one node was down ...

Sounds like a reasonably sane solution to me.

Take a Xen server for example serving disk images off a gluster partition.
10 Images at 10G each gives you a 100G copy to do.

If they are static images why would they have changed? What you are describing would really be much better accomplished with a SAN+GFS or Coda which is specifically designed to handle disconnected operation at the expense of other things.
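Just to put a rough number on that 100G figure, a back-of-envelope calculation, assuming the resync gets a dedicated gigabit link to itself and ignoring protocol overhead:

    # Rough time to re-copy 100GB of disk images over gigabit Ethernet.
    total_bytes = 100 * 10**9          # 10 images x 10GB each
    link_bytes_per_sec = 1e9 / 8       # ~125 MB/s theoretical for 1Gbit/s
    seconds = total_bytes / link_bytes_per_sec
    print("resync takes roughly %.0f seconds (~%.0f minutes)" % (seconds, seconds / 60))

That's somewhere around 13-15 minutes in the best case, and considerably longer if the link is shared with live traffic.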

Wait, it gets better .. it will only re-sync the file on opening, so you 
actually have to close all the files, then try to re-open them, then wait 
while it re-syncs the data (during this time your cluster is effectively down), 
then the file open completes and you are back up again.

Why would the cluster effectively be down? Other nodes would still be able to serve that file. Or are you talking about the client-side AFR? I have to say, a one-client/multiple-servers scenario sounds odd. If you don't care about downtime (you have just one client node, so that's the only conclusion that can be reached), then what's the problem with a bit more downtime?

There is a claim in the FAQ that there is no single point of failure, yet to 
upgrade gluster, for example, you effectively need to shut down the entire 
cluster in order to get all files to re-sync ...

Wire protocol incompatibilities are, indeed, unfortunate. But on one hand you speak of manual failover and SPOF clients, and on the other you speak of unwanted downtime. If this bothers you, have enough nodes that you can shut down half (leaving half running), upgrade the downed ones, bring them up and migrate the IPs (heartbeat, RHCS, etc.) to the upgraded ones, and then upgrade the remaining nodes. The downtime should be seconds at most.
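For what it's worth, the ordering I mean is roughly the following. This is only a sketch: the node names and the upgrade_node()/migrate_ip() helpers are placeholders for whatever your packaging and heartbeat/RHCS setup actually do, not glusterfs commands:

    # Sketch of a rolling upgrade across replica nodes, half at a time.
    nodes = ["server1", "server2", "server3", "server4"]    # hypothetical names
    first_half = nodes[:len(nodes) // 2]
    second_half = nodes[len(nodes) // 2:]

    def upgrade_node(node):
        # placeholder: stop glusterfs, upgrade the package, start glusterfs again
        print("upgrading %s" % node)

    def migrate_ip(from_node, to_node):
        # placeholder: heartbeat/RHCS moves the service IP between nodes
        print("moving service IP from %s to %s" % (from_node, to_node))

    # Phase 1: take down and upgrade half the nodes while the other half serves.
    for node in first_half:
        upgrade_node(node)

    # Phase 2: move the service IPs onto the upgraded half, then upgrade the rest.
    for old, new in zip(second_half, first_half):
        migrate_ip(old, new)
        upgrade_node(old)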

Effectively, storing anything like a large file on AFR is pretty unworkable and 
makes split-brain issues pale into insignificance ... or at least that's my 
experience of trying to use it...

I can't help but think that you're trying to use the wrong tool for the job here. A SAN/GFS solution sounds like it would fit your use case better.

Gordan



