Re: [Gluster-devel] Re; Load balancing ...

gluster-devel

From:	gordan
Subject:	Re: [Gluster-devel] Re; Load balancing ...
Date:	Thu, 1 May 2008 18:38:57 +0100 (BST)
User-agent:	Alpine 1.10 (LRH 962 2008-03-14)

On Thu, 1 May 2008, Martin Fick wrote:

>> Sounds like a good idea. The next question is where
>> to keep the log: one log per file, or one log per
>> directory? How to store them: shadow files, or a
>> separate shadow volume? A shadow volume might be a
>> good idea because it keeps the main mounted source
>> directory exactly the same as a normal directory.


> I would start as simple as possible and adapt as
> necessary if you run into a performance problem. The
> simplest design would probably be a shadow volume with
> one log per file in a sparse mirrored directory
> structure.

Indeed, that's exactly what I was thinking. You would effectively need a container, like a namespace, to unify the two.
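To make the sparse mirrored layout concrete, here is a minimal Python sketch of how a path in the data volume could map to its per-file log in the shadow volume. `shadow_log_path`, `open_log_for_append` and the `.log` suffix are hypothetical names for illustration, not existing GlusterFS code.

```python
import os


def shadow_log_path(data_root: str, shadow_root: str, path: str) -> str:
    """Map a file in the data volume to its per-file log in the shadow volume.

    The shadow tree mirrors the data tree's directories but is sparse: a log
    exists only for files that have changes recorded against them.
    """
    rel = os.path.relpath(path, data_root)
    if rel in (os.curdir, os.pardir) or rel.startswith(os.pardir + os.sep):
        raise ValueError(f"{path!r} is not a file under {data_root!r}")
    # Hypothetical naming convention: mirrored relative path + ".log".
    return os.path.join(shadow_root, rel + ".log")


def open_log_for_append(data_root: str, shadow_root: str, path: str):
    """Open (creating if needed) the log for `path` in binary append mode."""
    log_path = shadow_log_path(data_root, shadow_root, path)
    # Create only the directories this one log needs, keeping the tree sparse.
    os.makedirs(os.path.dirname(log_path), exist_ok=True)
    return open(log_path, "ab")
```

Directories appear in the shadow tree only when a file under them first gets a log, which is what keeps it sparse.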

> Logs could be 24(?) bytes concatenated one
> after another, making appending easy and reliable.
> Or, at a minor space cost (but with potential added
> portability/extensibility), each log file could even
> be a colon-delimited, line-based ASCII file (please
> don't anyone suggest an XML file!):
>
>  version1:start1:span1
>  version2:start2:span2
>  ...

If it's fixed-length pointers (or in fact fixed-length records), I'd go with a packed binary format for efficiency and speed. These will have to be written to on every write. There would also need to be a header that states where the roll-over point is. Effectively, the log would be an RRD (round-robin database).
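A minimal sketch of the fixed-length, packed binary variant, assuming each 24-byte record holds three little-endian 64-bit fields (version, start offset, span) and a small header stores the capacity, the roll-over point and the number of records in use. The field layout and the `RingLog` name are assumptions; a real translator would keep this in the log file rather than in an in-memory buffer.

```python
import struct

# Hypothetical on-disk layout: all fields little-endian unsigned integers.
RECORD = struct.Struct("<QQQ")  # version, start offset, span -> 24 bytes
HEADER = struct.Struct("<III")  # capacity, next slot (roll-over point), used


class RingLog:
    """Fixed-size, round-robin (RRD-style) change log kept in a bytearray.

    Once the ring is full, each append overwrites the oldest record, so the
    log remembers only the most recent `capacity` writes.
    """

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.buf = bytearray(HEADER.size + capacity * RECORD.size)
        HEADER.pack_into(self.buf, 0, capacity, 0, 0)

    def append(self, version: int, start: int, span: int) -> None:
        capacity, nxt, used = HEADER.unpack_from(self.buf, 0)
        RECORD.pack_into(self.buf, HEADER.size + nxt * RECORD.size,
                         version, start, span)
        # Advance the roll-over point stored in the header.
        HEADER.pack_into(self.buf, 0, capacity, (nxt + 1) % capacity,
                         min(used + 1, capacity))

    def records(self):
        """Yield (version, start, span) tuples from oldest to newest."""
        capacity, nxt, used = HEADER.unpack_from(self.buf, 0)
        oldest = (nxt - used) % capacity
        for i in range(used):
            slot = (oldest + i) % capacity
            yield RECORD.unpack_from(self.buf, HEADER.size + slot * RECORD.size)

    def changes_since(self, version: int):
        """(start, span) pairs newer than `version`, or None if the ring has
        already overwritten some of them and a full resync is needed.
        Assumes versions increase by exactly 1 per appended record."""
        recs = list(self.records())
        if recs and recs[0][0] > version + 1:
            return None
        return [(start, span) for v, start, span in recs if v > version]
```

A replica that is a few versions behind asks for `changes_since(its_version)`; a `None` answer means the ring has rolled over past it and the whole file has to be copied.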

> Having a separate log file for each real file also
> makes it easy to code up some optimizations. For
> example, it would be easy to look up the size of the
> log and the size of the real file. As soon as the log
> becomes bigger than the real file, it is no longer
> worth keeping! It also makes it really easy to
> just delete the log if the real file is deleted.

Maybe have the default log be about 0.5% of the file size, rounded to a power of 2, and not used at all for files below a certain size. Maybe grow/shrink it only when it would drift more than one power-of-2 step from its intended 0.5% size. This would mean that as the file grows, the log grows too, but log extension gets exponentially rarer. Log truncation could be left until a suitable roll-over point. If you are syncing filesystem blocks, those are typically 4KB, and a log entry would be, as you said, 24 bytes. That makes a log entry for a changed block about 1/170 of the data it describes (24/4096 ≈ 0.6%), which is about right for the 0.5% ballpark.
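The sizing rule above can be sketched as follows. The 1 MiB minimum file size is a hypothetical threshold (the message only says "a certain size"), and "more than one power-of-2 step" is read here as the current log size differing from its target by more than a factor of two.

```python
import math

MIN_FILE_SIZE = 1 << 20   # hypothetical: files under 1 MiB get no log
TARGET_FRACTION = 0.005   # log size ~0.5% of the file
RECORD_SIZE = 24          # bytes per log entry


def target_log_size(file_size: int) -> int:
    """0.5% of the file rounded to the nearest power of two; 0 means no log."""
    if file_size < MIN_FILE_SIZE:
        return 0
    return 1 << round(math.log2(file_size * TARGET_FRACTION))


def needs_resize(current_log_size: int, file_size: int) -> bool:
    """Resize only when the current log is more than one power-of-two step
    away from its target, so resizes get exponentially rarer as files grow."""
    target = target_log_size(file_size)
    if target == 0 or current_log_size == 0:
        return target != current_log_size
    return abs(math.log2(target) - math.log2(current_log_size)) > 1


def log_capacity(log_size: int) -> int:
    """How many fixed-size records fit in a log of `log_size` bytes."""
    return log_size // RECORD_SIZE
```

For a 1 GiB file the target is a 4 MiB log; a 2 MiB log for the same file is within one step and is left alone, while a 1 MiB log would be grown.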

> Another nice optimizer could make intelligent
> decisions about which log files to delete when the
> shadow volume starts to fill up. By simply examining
> the size of each log versus the size of the real file,
> one can set an upper bound on how much transfer data
> the log could be saving (a real estimate would require
> adding all the spans in the log file together, taking
> overlapping sections into account).

Sure, if you want to keep the volumes separate. Or you could just make sure that your log volume is always at least 1/170 of the size of the data volume it's shadowing, possibly a bit more as a safety margin for the lazy log resizing. Around 1% ought to suffice for most sane cases.
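The overlap-aware estimate mentioned in the quoted text amounts to taking the union of the logged spans. A minimal sketch, with `covered_bytes`, `transfer_saved` and `prune_order` as hypothetical helper names; `prune_order` ranks logs by bytes saved per byte of log, which is only one possible policy for deciding what to drop first when the shadow volume fills up.

```python
from typing import Dict, Iterable, List, Tuple

Span = Tuple[int, int]  # (start offset, length in bytes)


def covered_bytes(spans: Iterable[Span]) -> int:
    """Bytes touched by the logged spans, counting overlapping regions once."""
    total = 0
    cur_start = cur_end = None
    for start, length in sorted(spans):
        end = start + length
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: close it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def transfer_saved(file_size: int, spans: Iterable[Span]) -> int:
    """Bytes a log-driven resync avoids sending compared with a full copy."""
    return max(file_size - covered_bytes(spans), 0)


def prune_order(logs: Dict[str, Tuple[int, int, List[Span]]]) -> List[str]:
    """Log names, least valuable first. `logs` maps a name to
    (file_size, log_size, spans); value = bytes saved per byte of log."""
    def value(item):
        file_size, log_size, spans = item[1]
        return transfer_saved(file_size, spans) / max(log_size, 1)
    return [name for name, _ in sorted(logs.items(), key=value)]
```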

> Finally, it would allow an admin to manually prune
> whichever logs he chooses from the shadow volume.
> An ASCII file would make it easy to script various
> pruners.

I think that starts getting potentially dangerous. I think just sizing the log volume at about 1% of the data volume would be better. Of course, if you keep both on the same physical volume, it won't matter.

> It would be nice to design the shadow volume so that
> it can be removed from the picture at any time without
> corrupting anything.

You already covered that with the sparse shadow volume tree. If there's no log, you resync the whole file.
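The fallback is what lets the shadow volume disappear safely. A minimal sketch, assuming a hypothetical `read_spans` callback that returns the (start, length) pairs stored in a log; clamping spans to the current file size is an extra assumption to cope with files that were truncated after being logged.

```python
import os
from typing import Callable, List, Optional, Tuple

Span = Tuple[int, int]  # (start offset, length in bytes)


def plan_resync(file_size: int, log_path: Optional[str],
                read_spans: Callable[[str], List[Span]]) -> List[Span]:
    """Byte ranges to copy to a stale replica.

    No log (never created, pruned, or the whole shadow volume removed) means
    the only safe plan is to copy the entire file, so the shadow volume is a
    pure optimisation: losing it costs bandwidth, never correctness.
    """
    if log_path is None or not os.path.exists(log_path):
        return [(0, file_size)]
    # Clamp logged ranges to the current size in case the file shrank.
    plan = []
    for start, length in read_spans(log_path):
        end = min(start + length, file_size)
        if start < end:
            plan.append((start, end - start))
    return plan
```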

> It would also be nice to ensure that the journal
> translator can handle an out-of-space condition.
> This way each server is not even required to have
> the same size journal volume, if it has one at all.

Note that this gets into a chicken-and-egg problem: the log files would still need to be syncable directly using the current method, or you'd need a journal for your journalling volume. But if the journal is typically < 1% of the file, that's probably cheap enough that it won't matter too much. You could probably also set an upper limit on the log size, because past a certain point the rate of file changes will be limited by disk speed, so from there on a bigger file doesn't imply that more log space is required.
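One way to sketch the out-of-space handling and the per-file cap: on `ENOSPC` the log is deleted rather than left half-written, so the file falls back to a full resync. `MAX_LOG_SIZE` (64 MiB here) and the injectable `write` callback are hypothetical; the log files themselves would still be replicated with the existing whole-file method.

```python
import errno
import os

MAX_LOG_SIZE = 64 << 20  # hypothetical per-file cap (64 MiB)


def capped_log_size(wanted: int) -> int:
    """Past some size, disk speed rather than file size bounds how much can
    change between syncs, so extra log space stops paying for itself."""
    return min(wanted, MAX_LOG_SIZE)


def _append(path: str, data: bytes) -> None:
    with open(path, "ab") as f:
        f.write(data)


def append_or_invalidate(log_path: str, record: bytes, write=_append) -> bool:
    """Append `record` to the log. If the journal volume is out of space,
    delete the log instead and return False.

    Dropping the log is safe because a missing log just means "resync the
    whole file", so servers may have differently sized journal volumes
    (or none at all) and still stay correct.
    """
    try:
        write(log_path, record)
        return True
    except OSError as exc:
        if exc.errno != errno.ENOSPC:
            raise
        try:
            os.remove(log_path)
        except FileNotFoundError:
            pass
        return False
```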

>> A (shadow volume) log should, ideally, also keep
>> additional sanity-check information such as file
>> metadata (timestamps, size) to cross-check
>> whether something went weird and the file was
>> changed underneath GlusterFS; if it has, flush
>> out the log and force a full resync of the file.


> Hmm, this seems like an additional layer that might be
> nice (and perhaps an XML log would be appropriate
> here), but I would put it in a separate inline
> translator so that it is not required. The nice part
> is that if the protocol is extended to handle the
> journal layer, adding another separate layer like this
> would probably be easy!

For the sake of an extra few bytes in the log entry (an 8-byte timestamp + an 8-byte file size), I think it is probably worthwhile having it for the cross-check.
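A sketch of the proposed 16-byte cross-check, assuming the timestamp is the file's mtime in nanoseconds (as Python's `os.stat` exposes it in `st_mtime_ns`) and the size is `st_size`. The header layout and function names are hypothetical. Because every legitimate write also changes mtime and possibly size, the header would have to be rewritten with each logged write.

```python
import os
import struct

# Hypothetical 16-byte cross-check header: signed 64-bit mtime in
# nanoseconds, then unsigned 64-bit file size, both little-endian.
CROSSCHECK = struct.Struct("<qQ")


def make_crosscheck(mtime_ns: int, size: int) -> bytes:
    """Header to rewrite after every logged write, so it always describes
    the file as GlusterFS last left it."""
    return CROSSCHECK.pack(mtime_ns, size)


def log_still_valid(header: bytes, mtime_ns: int, size: int) -> bool:
    """False if the file was changed underneath GlusterFS since the header
    was written; the caller should then flush the log and resync fully."""
    logged_mtime_ns, logged_size = CROSSCHECK.unpack_from(header, 0)
    return logged_mtime_ns == mtime_ns and logged_size == size


def check_against_disk(path: str, header: bytes) -> bool:
    """Compare a stored header with the file's current metadata."""
    st = os.stat(path)
    return log_still_valid(header, st.st_mtime_ns, st.st_size)
```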

> Thanks again for your patience; I know it's not easy
> listening to back-seat designers :)


I second that apology. :-)

Gordan
