Re: [Gluster-devel] Choice of Translator question


From: Kevan Benson
Subject: Re: [Gluster-devel] Choice of Translator question
Date: Thu, 27 Dec 2007 12:16:53 -0800
User-agent: Thunderbird 2.0.0.9 (X11/20071031)

Gareth Bult wrote:
Agreed, which is why I just showed the single file self-heal
method, since in your case targeted self heal (maybe before a full
filesystem self heal) might be more useful.

Sorry, I was mixing moans .. on the one hand there's no log hence no
automatic detection of out of date files (which means you need a
manual scan), and secondly, doing a full self-heal on a large
file-system "can" be prohibitively "expensive" ...

I'm vaguely wondering if it would be possible to have a "log"
translator that wrote changes to a namespace volume for quick
recovery following a node restart. (as an option of course)

An interesting thought. Possibly something that keeps a filename and timestamp so other AFR members could connect and request changed file AFR versions since X timestamp.
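
As a rough illustration of that idea, here is a minimal Python sketch of a log that keeps a filename and timestamp and answers "what changed since X". The `ChangeLog` class, its methods and the file paths are all hypothetical; nothing here is GlusterFS code or an existing translator interface.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ChangeLog:
    """Hypothetical change log: path -> time of the last recorded write."""
    entries: dict = field(default_factory=dict)

    def record(self, path, when=None):
        # Keep only the latest modification time per file.
        self.entries[path] = time.time() if when is None else when

    def changed_since(self, timestamp):
        # Files a restarted node would have to re-sync.
        return sorted(p for p, t in self.entries.items() if t > timestamp)


log = ChangeLog()
log.record("/vm/disk1.img", when=100.0)
log.record("/vm/disk2.img", when=250.0)
log.record("/vm/disk1.img", when=300.0)

# A node that went down at t=200 asks what it missed.
stale = log.changed_since(200.0)
print(stale)
```

With something like this, a returning node would only need to check the listed files instead of scanning the whole filesystem.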

Automatic self-heal is supposed to be on the way, so I suspect they are already doing (or planning) something like this.

I don't see how the AFR could even be aware the chunks belong to
the same file, so how it would know to replicate all the chunks of
a file is a bit of a mystery to me.  I will admit I haven't done
much with the stripe translator though, so my understanding of its
operation may be wrong.

Mmm, trouble is there's nothing definitive in the documentation
either way .. I'm wondering whether it's a known critical omission
which is why it's not been documented (!) At the moment stripe is
pretty useless without self-heal (i.e. AFR). AFR is pretty useless
without stripe for anyone with large files. (which I'm guessing is
why stripe was implemented after all the "stripe is bad"
documentation) If the two don't play well and a self-heal on a
large file means a 1TB network data transfer - this would strike me
as a show stopper.

I think the original docs said it was implemented because it was easy, but there wasn't a whole lot to be gained by using it. Since then, I've seen people post numbers that seemed to indicate it gave a somewhat sizable boost, but the extra complexity it introduced never made it attractive to me.

The possibility it could be used to greatly speed up self-heal on large files seems like a real good reason to use it though, so hopefully we can find a way to make it work.

Understood.  I'll have to actually try this when I have some time,
instead of just doing some armchair theorizing.

Sure .. I think my tests were "proper" .. although I might try them
on TLA just to make sure.

Just thinking logically for a second, for AFR to do chunk level
self-heal, there must be a chunk level signature store somewhere. ...
where would this be ?

Well, to AFR each chunk should just look like another file, it shouldn't care that it's part of a whole.

I assume the stripe translator uses another extended attribute to tell what file it's part of. Perhaps the AFR translator is stripe aware and that's causing the problem?

Was this on AFR over stripe or stripe over AFR?

Logic told me it must be AFR over stripe, but I tried it both ways
round ..

Let's get rid of the over/under terminology (which I always seem to think of in reverse from other people), and use a representation that's more absolute:

client -> XLATOR(stripe) -> XLATOR(AFR) -> diskVol(1..N)

Throw in your network connections wherever you want, but this should be testable on a single box with two different directories exported as volumes.

The client writes to the stripe translator, which splits the large file into chunks. Each chunk is then passed to the AFR translator, so it is stored redundantly on every disk volume supplied.

If the AFR and stripe are reversed, AFR will have to pull all stripe chunks to do a self heal (unless AFR is stripe aware), which isn't what we are aiming for.
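
To put a number on the difference, here is a toy Python sketch. It assumes each chunk is healed as an independent file and compares chunk contents as a stand-in for whatever per-file version check AFR actually performs. The 4-byte `CHUNK` size and the helper names are made up for illustration.

```python
CHUNK = 4  # hypothetical stripe block size in bytes (tiny, for illustration)


def stripe(data, chunk=CHUNK):
    # Split one large file into fixed-size chunks, as a stripe layer would.
    return [data[i:i + chunk] for i in range(0, len(data), chunk)]


def heal_bytes(good, stale):
    # Bytes copied when every differing piece is re-sent in full.
    # Assumes both replicas hold the same number of pieces.
    return sum(len(g) for g, s in zip(good, stale) if g != s)


old = b"AAAABBBBCCCCDDDD"
new = b"AAAABBBBXXXXDDDD"  # only the third chunk changed while a replica was down

# stripe -> AFR: each chunk is replicated, and healed, on its own.
chunk_level = heal_bytes(stripe(new), stripe(old))

# Reversed order: the replication layer sees one whole file.
whole_file = heal_bytes([new], [old])

print(chunk_level, whole_file)
```

Here the chunk-level heal copies 4 bytes where the whole-file heal copies 16. Scaled up to a 1TB file with one modified chunk, that is the saving we are after.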

Is that similar to what you tested?

--

-Kevan Benson
-A-1 Networks



