Re: [Gluster-devel] Versioning


From: Fred van Zwieten
Subject: Re: [Gluster-devel] Versioning
Date: Thu, 2 Aug 2012 17:12:21 +0200

On Wed, Jul 25, 2012 at 11:20 PM, Fred van Zwieten <address@hidden> wrote:
"Now I am leaning towards git based versioning. Integrate git into
GlusterFS to track changes on specified events (timer, file-close,
dir-tree-modify..). We may not do this via translator interface, but
through the newly proposed simple event/timer interface. "

I am not sure I would like that. Our idea is to make the previous versions (read-only!) available to the end-users through a separate mount-point, taking file permissions into account. I am not sure if that is at all possible when they live inside a git repository.

(Disclaimer: I do not know the inner workings of glusterfs or its translators.) I would think that making it part of the receiving side of the geo-replicator translator would be ideal, because that side knows what is going on. If a file /a/b/c is updated, its previous version could be stored as /pre/a/b/c.<datetime> or /pre/<datetime>/a/b/c. If the previous versions live on the same file system, you could even play with inodes to keep only the previous versions of the blocks that changed. This would make it very space efficient (a sort of file-based snapshotting).
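A minimal Python sketch of the /pre/<datetime>/a/b/c layout, to make the idea concrete. It is an illustration only, not GlusterFS code: `archive_previous_version` and the `pre_dir` parameter are hypothetical names, the copy is a full copy (no block sharing), and it assumes the caller invokes it just before the file is overwritten.

```python
import os
import shutil
import time


def archive_previous_version(root, rel_path, pre_dir="pre", now=None):
    """Copy root/rel_path to root/<pre_dir>/<datetime>/rel_path.

    Returns the path of the archived copy, or None if there is no
    current file to archive. `now` is epoch seconds (default: current time).
    """
    src = os.path.join(root, rel_path)
    if not os.path.isfile(src):
        return None
    # UTC timestamp used as the directory name, e.g. 20120802T151221
    stamp = time.strftime("%Y%m%dT%H%M%S", time.gmtime(now))
    dst = os.path.join(root, pre_dir, stamp, rel_path)
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    shutil.copy2(src, dst)  # copy2 also copies mode bits and timestamps
    # Drop all write bits so the old version is read-only for everyone;
    # read bits are left as they were on the original file.
    os.chmod(dst, os.stat(dst).st_mode & ~0o222)
    return dst
```

Because `copy2` carries the mode bits over, the archived copy keeps the read permissions of the original, which is the "taking file permissions into account" part. Ownership is not preserved by this sketch.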

I do agree that using git makes it more modular and independent of the geo-replicator translator.

I am also curious how you would handle multiple writes to the same file in a short time without ending up with an equal number of previous versions.

Also, I can't find the note you are referring to. Could you please make a feature wiki page using the template?

Fred

We broke GeoReplication into two parts: (1) Marker, a change-tracking translator, and (2) a simple queue that queries changes and invokes rsync with a specific list of files over ssh. Unlike inotify, the marker framework keeps track of changes within the filesystem as xtime in extended attributes. You can ask the filesystem to list all changed files and folders since a particular point in time. This way, an external service can tolerate a crash, WAN link failure, etc. Marker allows developers to extend storage capabilities using a simple application programming model (even scripting languages are OK).
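As an illustration of the "list everything changed since time T" query, here is a minimal Python sketch. It is not the marker API: it assumes plain mtimes as a stand-in for the xtime extended attribute, it has to walk the whole tree, and `changed_since` is a hypothetical name.

```python
import os


def changed_since(root, since):
    """Yield files and folders under root modified after `since` (epoch seconds).

    Assumption: st_mtime stands in for the xtime that marker stores in
    extended attributes. The root directory itself is not reported.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat: look at the entry itself, do not follow symlinks
            if os.lstat(path).st_mtime > since:
                yield path
```

Because the state lives in the filesystem rather than in a stream of events, a service that crashed can simply ask again with the timestamp of its last successful run. The resulting list is what the queue part would hand to rsync, for example through its `--files-from` option.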

If certain tasks can be achieved outside of a translator, it is good to do so. Just like kernel mode, translator mode has some limitations. Translator code has to be efficient, asynchronous (event driven), latency sensitive and free of memory leaks. If we extend the marker framework idea into a generic event hook mechanism, we can develop powerful storage applications outside of the translator mechanism.

Say you register your tool or script for certain events. When the event occurs, your code gets invoked with the necessary parameters. You could then operate on the mounted filesystem itself, just like any other application. For example, you register a git script for invocation on an event such as "whenever a registered directory tree is modified and more than 30 minutes have elapsed". All this script does is push changes to an external origin. It is crude and simple, but it achieves the goal. Simple is better. You may also develop anti-virus plugins or silent data corruption checks using this technique.

Users can use a simple git checkout to flip between views. Because git doesn't scale for large content, you can limit users to explicitly register interested folders for versioning. If you want to create a mountable view of the remote content, you can write a translator to trap chdir or lookup for directories named after a timestamp and perform a git checkout. If I used git for continuous automated file system versioning, I would suggest that users use the git tool itself as the UI.
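The git example above (a script invoked when a registered tree is modified and 30 minutes have elapsed) could look roughly like the Python sketch below. The event interface does not exist yet, so `GitVersioningHook` and `on_tree_modified` are hypothetical names. The sketch also assumes that the registered tree is already a git working copy with a remote called `origin`.

```python
import subprocess
import time

MIN_INTERVAL = 30 * 60  # seconds; the "30 minutes" from the example


class GitVersioningHook:
    """Hypothetical hook: commit and push a registered tree at most once per interval."""

    def __init__(self, tree, min_interval=MIN_INTERVAL, run=subprocess.run):
        self.tree = tree
        self.min_interval = min_interval
        self.run = run         # injectable, so the logic can be tried without git
        self.last_push = None  # time of the last successful push

    def on_tree_modified(self, now=None):
        """Called by the (assumed) event interface. Returns True if it pushed."""
        now = time.time() if now is None else now
        if self.last_push is not None and now - self.last_push < self.min_interval:
            return False  # too soon; a later event will pick this change up
        commands = (
            ["git", "add", "-A"],
            # Note: git commit exits non-zero when nothing is staged, and
            # check=True turns that into an exception.
            ["git", "commit", "-m", "auto-version %d" % now],
            ["git", "push", "origin", "HEAD"],
        )
        for cmd in commands:
            self.run(cmd, cwd=self.tree, check=True)
        self.last_push = now
        return True
```

Several writes inside one interval collapse into a single commit, so a burst of writes does not produce one version per write. The cost is that a change made just after a push stays unpushed until the next event arrives, so in practice a timer event would be needed to flush it.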

I am just giving you tips and suggestions. Don't limit your ideas in any way.

If I am guessing your idea correctly, it will have a few limitations, but we can live with them.

 * Only files are versioned. Directories are not.
 * File renames and Directory renames (mv) are not supported.
 * Every version is a complete duplicate copy (not as COW or WAFL).
 * Changes are tracked at the per-file level. Changes across a directory tree are not grouped. I mean CVS style, not a patch set as in git.

It is actually OK to make duplicate copies of changed files. In reality, for most practical use cases, very few files across the name space get modified. Most of the files are written once and rarely modified. Files older than 30 days are hardly accessed. So it is OK to store duplicate copies of just the changed files. btrfs or device-mapper dedup may take care of this as well. I won't worry too much about duplicating data, given its very small proportion.
I didn't quite understand how you can play with inodes to avoid this duplication. Did you mean a btrfs dedup-like capability?

If you want to avoid these limitations, think about rdiff-backup style continuous automated backup. Just like georep, you monitor the filesystem for changes and back up on a continuous basis. It is OK to give users a tool or API to restore or view older versions. This is much simpler to implement than a WAFL or COW style storage format and file-level snapshotting.


Anand,

These are all "design" decisions that we do not need, and that even make the product less useful in our use-case.

We have a large archive of TIFF files. Every TIFF file is large (50+ MB). The images themselves do not get modified, but their EXIF metadata does. There are also file renames, and the files get re-arranged into different directory structures. For this archive we need a scalable filesystem with georep to a second location _and_ file versioning.


"Because git doesn't scale for large content, you can limit users to explicitly register interested folders for versioning"

Now, it seems to me git does not fit this bill, because it doesn't scale very well.


"* File renames and Directory renames (mv) are not supported"

If you mean building up retention on file renames and moves, I agree for our use-case, but others might need it. Look at BackupPC for a cool solution to that.

"* Every version is a complete duplicate copy (not as COW or WAFL)."

The fact that each version is a complete duplicate is not very storage friendly, because in our use-case only the EXIF metadata changes. I am looking for rdiff-backup-like functionality there.

"It is actually OK to make duplicate copies of changed files. In reality, for most practical use cases, very few files across the name space get modified. Most of the files are written once and rarely modified. Files older than 30 days are hardly accessed. So it is OK to store duplicate copies of just the changed files. btrfs or device-mapper dedup may take care of this as well. I won't worry too much about duplicating data, given its very small proportion."

I do not agree with you. If you say most of the files are written once and rarely modified, you are narrowing the use-case for glusterfs. You are describing near-WORM. Our use-case is not that. Also, our files do get modified after 30 days. Relying on dedup at the lower fs level is also not good. Suppose you have a 200TB filesystem. Post-process dedup would take a very long time to find the dups on that. Better to do it inline. Again, look at BackupPC for an implementation example.
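To make "inline" concrete, here is a minimal Python sketch of whole-file pooling with hard links, in the spirit of BackupPC's pool. It is only an illustration under stated assumptions: `store_version` and the pool layout are hypothetical, SHA-256 is an arbitrary choice of hash, and the pool and the version tree must be on the same file system for hard links to work.

```python
import hashlib
import os
import shutil


def store_version(pool_dir, src, dst):
    """Store src as dst, sharing data blocks with identical content already pooled.

    Returns the path of the pool entry that dst is hard-linked to.
    """
    digest = hashlib.sha256()
    with open(src, "rb") as f:
        # Hash in 1 MiB chunks so large images are never loaded whole
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    pooled = os.path.join(pool_dir, digest.hexdigest())
    if not os.path.exists(pooled):
        os.makedirs(pool_dir, exist_ok=True)
        shutil.copy2(src, pooled)  # first time this content is seen
    os.makedirs(os.path.dirname(dst) or ".", exist_ok=True)
    os.link(pooled, dst)  # hard link: a new name, no new data blocks
    return pooled
```

The duplicate is detected at the moment the version is stored, so no later scan of the whole file system is needed. Whole-file pooling of this kind covers renames and re-arranged directory trees. It does not help when only the EXIF metadata changes, because the file hash changes too; that case needs a delta, as rdiff-backup does.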

Fred
