
Re: [Gluster-devel] file version on glusterfs using libgit


From: Luis Pabon
Subject: Re: [Gluster-devel] file version on glusterfs using libgit
Date: Fri, 08 Mar 2013 12:22:50 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130219 Thunderbird/17.0.3

This sounds really interesting, but I do have some questions about Git (or any SCM) as a solution for file version support.

1. How well does Git handle large binary files like VM images? Does it keep a copy for each one, or does it keep diffs?
2. Does Git, or another SCM, allow for the deletion of older versions?
3. Can this solution be used for VM linked clones? (I guess that would be like branching each one).
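
As background for question 1: Git names every file version by a hash of its complete content and first writes it as a separate compressed object; deltas between versions are only computed later, when objects are packed. The sketch below reproduces Git's blob naming to show that a one-byte change to a large file produces an entirely new object. The 1 MiB "image" is a made-up stand-in, not a real VM image.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Return the SHA-1 object id Git assigns to a blob with this content."""
    # Git hashes a short header ("blob <size>\0") followed by the raw content.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


# Hypothetical stand-in for a VM image: 1 MiB of a repeating byte pattern.
image_v1 = bytes(range(256)) * 4096
# Second version: identical except for the very first byte.
image_v2 = b"\x01" + image_v1[1:]

id_v1 = git_blob_id(image_v1)
id_v2 = git_blob_id(image_v2)

# Two different ids means two separate objects until a repack finds a delta.
print(id_v1)
print(id_v2)
```

Whether the later delta compression works well on VM images depends on the image contents and on Git's pack settings, so this only illustrates the storage model, not the final disk usage.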

This is really interesting, because Brian F. and I were just discussing the pluses and minuses of a file version solution, but instead using QEMU's block driver technology, specifically either QCOW2 or QED (leaning more to QED).

Maybe what we are describing here is two different implementations for two different use cases.

File Versioning?:
1. Google Drive/Dropbox style file versions for small documents and files (still a question on binary deltas), where older versions are never deleted.
--> Solution: Git translator

File Snapshots?:
2. Snap support for small or large files which may require the deleting and/or merging of different versions. Specifically, satisfying APIs like OpenStack Cinder Snapshot API (available in Grizzly [1]) and linked virtual machine clones.
--> Possible Solution:  QEMU Block Technology (Still under investigation)
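
To make the linked-clone idea concrete, here is a minimal copy-on-write sketch in the spirit of QCOW2/QED backing files. It is a toy model under stated assumptions, not the QEMU on-disk format: the class name `CowImage`, the 4-byte cluster size, and the in-memory dictionary are all invented for illustration.

```python
from typing import Dict, Optional

CLUSTER_SIZE = 4  # bytes per cluster; tiny on purpose for the example


class CowImage:
    """Toy copy-on-write image that stores only the clusters written to it."""

    def __init__(self, num_clusters: int, backing: Optional["CowImage"] = None) -> None:
        self.num_clusters = num_clusters
        self.backing = backing                # base image for a linked clone
        self.clusters: Dict[int, bytes] = {}  # locally allocated clusters

    def _check(self, index: int) -> None:
        if not 0 <= index < self.num_clusters:
            raise IndexError("cluster index out of range")

    def read(self, index: int) -> bytes:
        self._check(index)
        if index in self.clusters:            # cluster was written in this image
            return self.clusters[index]
        if self.backing is not None:          # otherwise fall through to the base
            return self.backing.read(index)
        return b"\0" * CLUSTER_SIZE           # never written anywhere: zeros

    def write(self, index: int, data: bytes) -> None:
        self._check(index)
        if len(data) != CLUSTER_SIZE:
            raise ValueError("writes must cover exactly one cluster")
        self.clusters[index] = data           # the base image is left untouched


base = CowImage(num_clusters=4)
base.write(0, b"boot")

# Two linked clones share the same base and diverge independently.
clone_a = CowImage(num_clusters=4, backing=base)
clone_b = CowImage(num_clusters=4, backing=base)
clone_a.write(1, b"aaaa")
```

In this model a snapshot or clone costs nothing up front, and deleting or merging a version means dropping an overlay or copying its clusters into the base, which is the kind of operation the snapshot use case needs.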

Also, I am not sure, but this type of translator might be better placed on the client (above DHT) than on the server (above POSIX). I am still new to GlusterFS, but I am guessing that with a server-side translator the .git repo (which probably cannot be seen by the client) would be handled by only one of the GlusterFS hosts. This could create a bottleneck. If instead the xlator were on the client, the files (even the .git repo) would be spread over the cluster by the DHT xlator. There may be a need for some type of locking, but I am guessing GlusterFS already handles much of that. This issue parallels discussions Brian and I had around a QED-based translator and how it would handle the I/O for >100 linked-clone virtual machines.
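
The bottleneck argument can be illustrated with a toy placement function. This is only a sketch: it hashes the whole path with SHA-1 and takes a modulus, which is not how the real DHT xlator computes its layout, and the brick names and object paths are made up.

```python
import hashlib
from collections import Counter
from typing import List


def pick_brick(path: str, bricks: List[str]) -> str:
    """Map a path to a brick by hashing it (toy scheme, not GlusterFS's real one)."""
    digest = hashlib.sha1(path.encode("utf-8")).digest()
    return bricks[int.from_bytes(digest[:4], "big") % len(bricks)]


bricks = ["host1:/brick1", "host2:/brick2"]  # hypothetical brick names

# Hypothetical repository object paths, as a client-side xlator would create them.
object_paths = [f".git/objects/{i:02x}/object-{i}" for i in range(256)]

# Count how many repository objects land on each brick.
placement = Counter(pick_brick(p, bricks) for p in object_paths)
for brick, count in sorted(placement.items()):
    print(brick, count)
```

With the versioning layer above the distribution layer, repository objects are hashed like any other file and end up on every brick; with it below, each brick would only ever see its own local repository.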

But like I said above... Definitely very cool stuff.

- Luis

PS. another possible solution: **IF** we had a deduplicating backend (xlator or file system), then we could just make a copy (although it could be slow) and be done with it :-).
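
A rough sketch of why a deduplicating backend would make "just copy it" cheap in space: files are split into fixed-size chunks, each chunk is stored once under its hash, and a file is only a list of chunk hashes. The `DedupStore` class, the 4 KiB chunk size, and the file names are assumptions for illustration, not an existing GlusterFS component.

```python
import hashlib
from typing import Dict, List

CHUNK_SIZE = 4096  # assumed fixed chunk size


class DedupStore:
    """Toy content-addressed store: identical chunks are kept only once."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}     # chunk hash -> chunk data
        self.files: Dict[str, List[str]] = {}  # file name -> ordered chunk hashes

    def put(self, name: str, data: bytes) -> None:
        recipe = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            key = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(key, chunk)  # store only unseen chunks
            recipe.append(key)
        self.files[name] = recipe

    def get(self, name: str) -> bytes:
        return b"".join(self.chunks[key] for key in self.files[name])

    def stored_bytes(self) -> int:
        return sum(len(chunk) for chunk in self.chunks.values())


store = DedupStore()
original = b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE + b"C" * CHUNK_SIZE

store.put("vm.img", original)
store.put("vm.img.snap1", original)   # the "copy" adds no new chunks
size_after_copy = store.stored_bytes()

# Overwrite the middle chunk of the live file; the snapshot keeps the old one.
modified = original[:CHUNK_SIZE] + b"X" * CHUNK_SIZE + original[2 * CHUNK_SIZE:]
store.put("vm.img", modified)
size_after_change = store.stored_bytes()
```

The copy still has to read and hash every chunk, which is the "it could be slow" part; only the space cost goes away.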

[1] https://wiki.openstack.org/wiki/Cinder

On 03/08/2013 06:20 AM, Niels de Vos wrote:
On Fri, Mar 08, 2013 at 06:00:24AM -0500, Shishir Gowda wrote:
Hi Niels,

Thinking out loud, I think the snaps (in the file version context) can be
displayed as a list of branches.
Well, I am not sure if branches are really needed. Isn't linear history
sufficient? Every change should be committed to the master branch
anyway. Branches may be useful for switching between versions, but
nothing prevents you from checking out (or "git ls-files") with a commit
or a date.

I'm thinking of a virtual .snaps directory:

$ cd $VOLUME/.snaps
$ ls
2013-03-07/
2013-03-06/
.....
current/
changelog
yesterday -> 2013-03-07/

This makes it possible to do something like:
$ cat changelog
    - virtual file, showing the contents of 'git log'
    - find a commit you're interested in
$ mkdir $GIT_COMMIT_ID
$ ls $GIT_COMMIT_ID/
    - get the state just like 'git checkout $GIT_COMMIT_ID'

Maybe it would be helpful to be able to create tags inside this .snaps
directory. But I would refrain from branches for now (unless there is
a clear use-case).
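
One way the dated entries in such a virtual .snaps directory could be resolved against a linear history is sketched below: each date maps to the last commit made on or before the end of that day. The function name `commit_for_day`, the commit ids, and the timestamps are hypothetical; a real xlator would read the history through libgit2 rather than from a Python list.

```python
import bisect
from datetime import date, datetime, time
from typing import List, Optional, Tuple

Commit = Tuple[datetime, str]  # (commit time, commit id)


def commit_for_day(history: List[Commit], day: date) -> Optional[str]:
    """Return the id of the last commit made on or before the end of `day`.

    `history` must be sorted by commit time, oldest first (a linear history).
    """
    end_of_day = datetime.combine(day, time.max)
    times = [ts for ts, _ in history]
    pos = bisect.bisect_right(times, end_of_day)
    return history[pos - 1][1] if pos else None


# Hypothetical linear history on the master branch.
history = [
    (datetime(2013, 3, 6, 9, 0), "c1"),
    (datetime(2013, 3, 6, 17, 30), "c2"),
    (datetime(2013, 3, 7, 11, 15), "c3"),
]

print(commit_for_day(history, date(2013, 3, 6)))  # state shown under 2013-03-06/
print(commit_for_day(history, date(2013, 3, 7)))  # state shown under 2013-03-07/
```

A lookup like this needs no branches at all, which fits the argument that linear history is sufficient for browsing old versions.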

Cheers,
Niels

Once the user cd's into any one of them, we could do a git checkout of the 
branch.

That should mimic the behaviour.

With regards,
Shishir

----- Original Message -----
From: "Shishir Gowda" <address@hidden>
To: "Niels de Vos" <address@hidden>
Cc: address@hidden
Sent: Friday, March 8, 2013 4:04:42 PM
Subject: Re: [Gluster-devel] file version on glusterfs using libgit

Hi Niels,

My inclination too is to load git on top of the posix xlator.

I was thinking of treating previous versions (based on some policy) as
new branches.

We could see how to export these branches as user visible dirs.

With regards,
Shishir

----- Original Message -----
From: "Niels de Vos" <address@hidden>
To: "Shishir Gowda" <address@hidden>
Cc: address@hidden
Sent: Friday, March 8, 2013 3:25:57 PM
Subject: Re: [Gluster-devel] file version on glusterfs using libgit

On Thu, Mar 07, 2013 at 12:54:41AM -0500, Shishir Gowda wrote:
Hi All,

Was playing around with git on a glusterfs volume, as a way to provide file
version support.

And the initial run is encouraging.

A brief overview what was tried:

Approach 1: Glusterfs volume as a git repo

1. created a 2 brick distribute volume
2. inited a git repo on fuse volume
3. created files, committed them in git.
4. Modified files, and committed them again
5. Did branch check-outs, to simulate versions @ point in time
6. reset branch heads, and was able to access older versions of files (after a
stash).
7. Was able to create files/dirs/symlinks/hardlinks
8. Both NFS/FUSE clients were used.

Approach 2: Glusterfs bricks as git repos

1. created a 2 brick distribute volume
2. inited git repo on brick1
3. inited git repo on brick2
4. created files, committed them in the relevant brick's git.
5. Modified files, and committed them again in the brick's git
6. Did branch check-outs, to simulate versions @ point in time on individual
bricks
7. reset branch heads, and was able to access older versions of files (after a
stash).
8. Was able to create files/dirs/symlinks/hardlinks
9. Both NFS/FUSE clients were used.

Buoyed by this, will start prototyping integration of libgit2 as xlator for 
file version support.

There are 3 approaches to consider:

1. Load git xlator on clients volfiles
2. Load git xlator on server volfiles
3. Replace posix interface with git interface.

Please provide feedback, on what would be more desirable.
Very interesting! Option 2 makes most sense to me, the posix xlator
contains some access checks and such, which you probably should not need
to duplicate.

Have you thought about making the previous version accessible through
the glusterfs/nfs mount? Other vendors seem to have a .snapshot
directory with previous versions, would something like that be possible?
Users would be able to recover deleted files themselves that way.

Also, I do not know if git stores xattrs and their changes...

Cheers,
Niels

_______________________________________________
Gluster-devel mailing list
address@hidden
https://lists.nongnu.org/mailman/listinfo/gluster-devel



