gluster-devel

Re: [Gluster-devel] solutions for split brain situation


From: Mark Mielke
Subject: Re: [Gluster-devel] solutions for split brain situation
Date: Wed, 16 Sep 2009 17:18:06 -0400
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.1) Gecko/20090814 Fedora/3.0-2.6.b3.fc11 Thunderbird/3.0b3

On 09/16/2009 05:45 AM, Gordan Bobic wrote:
> It's not my project (I'm just a user of it), but having done my research, my
> conclusion is that there is nothing else available that is similar to
> GlusterFS. The world has waited a long time for this, and imperfect as it may
> be, I don't see anything else similar on the horizon.
>
> GlusterFS is an implementation of something that has only been academically
> discussed elsewhere. And I haven't seen any evidence of any other similar
> things being implemented any time soon. But if you think you can do better, go
> for it. :-)

I came to a slightly different conclusion, but with a similar effect. Of the projects available, GlusterFS is the closest to production *today*. The world has waited a long time for this. It is imperfect, but right now it's still high on the list of solutions that can be used today and have potential for tomorrow.

In case it is of any use to others, here is the list I worked out when doing my analysis:

- GlusterFS (http://gluster.com/community/index.php) - Very promising shared-nothing architecture; production-ready software supported commercially; based on FUSE (provides insulation from the kernel at a small performance cost). Simple configuration. Very cute implementation where each "brick" in a "cluster/replication" setup is just a regular file system that can be accessed natively, so the data is always safe and can be inspected using UNIX commands or backed up using rsync. Most logic is client side, including replication, and they use file system extended attributes to journal changes and "self-heal" (a small sketch of what I mean follows after this list). But very recently there have been some problems, possibly with how GlusterFS calls Linux, triggering a Linux problem that causes the system to freeze up a bit. My own first test froze things up. The GlusterFS support people want to find the problem, and I will be working with them to see whether it can be resolved.

- Ceph (http://ceph.newdream.net/) - Very promising shared-nothing architecture that has kernel module support instead of FUSE (better performance), but it is not ready for production. They say they will stabilize it by the end of 2009, but do not recommend using it for production even at that time.

- PVFS (http://www.pvfs.org/) - Very promising architecture. Widely used in production. V1 has a shared metadata server; in V2 they are changing to a shared-nothing architecture. Has kernel module support instead of FUSE (better performance). However, PVFS does not provide POSIX guarantees. In particular, they do not implement advisory locking through flock()/fcntl(). This means that use of this system would probably require an architecture that does master/slave failover as opposed to master/master failover. Most file system accesses do not need this level of locking, but dovecot in particular probably does. The dovecot locking through .lock files might work, but I need to look a little closer (a rough sketch of both locking styles follows after this list).

- Grid Datafarm (http://datafarm.apgrid.org/) - Designed as a user space data sharing mechanism; however, a FUSE module is available to provide POSIX functionality on top.

- Lustre (http://www.lustre.org/) - Seems to be the focus of the commercial world. Currently based on ext3/ext4, to be based on ZFS in 2010. Its weakness seems to be its single shared metadata server, which must be made highly available using a shared-disk solution such as GFS or OCFS. Due to this architecture, I do not consider this solution to meet our requirement of a shared-nothing architecture where any server can completely die and the other servers take over the load without intervention.

- MooseFS (http://www.moosefs.com/) - Alternative to Lustre. Still uses a shared metadata server, and therefore does not meet requirements.

- XtreemFS (http://en.wikipedia.org/wiki/XtreemFS) - Very promising architecture. However, the current version uses a single metadata server and will only replicate content that is specifically marked as read-only. Replicated metadata is scheduled for 2010Q1, and read/write replication is scheduled for some time later.

- CRFS (http://oss.oracle.com/projects/crfs/) - Btrfs based - Btrfs is Oracle's answer to ZFS, and CRFS is Oracle's answer to Lustre, although development of this solution seems slow and this system is not ready for production. Development of both has effectively stalled since 2008. If these are ever released, I think they will be great solutions, but they are apparently having design problems (either the developers are not good enough or the design is too complicated, probably both).

- TahoeFS (http://allmydata.org/trac/tahoe) - POSIX interface (via FUSE) not ready for production.

- Coda (http://www.coda.cs.cmu.edu/) and InterMezzo (http://en.wikipedia.org/wiki/InterMezzo_%28file_system%29) - Older "experimental" distributed file systems that are still being maintained, but with no development beyond bug fixes that I can see. They say the developers have moved on to Lustre.
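
On the GlusterFS self-heal point above: since each brick is just a normal file system, the change log that drives self-heal can be inspected directly as extended attributes on the brick files. A rough sketch of what I mean (Python, run as root on a brick server; the brick path here is made up, and the exact trusted.* attribute names depend on the volume definition, so treat this purely as an illustration):

    import os

    # Made-up brick path -- whatever directory the volume file exports for this brick.
    brick_file = "/data/glusterfs/brick1/mail/user/inbox"

    # The trusted.* attribute namespace is only visible to root.
    for name in os.listxattr(brick_file):
        value = os.getxattr(brick_file, name)
        # The replication translator keeps its pending-change counters in
        # trusted.afr.* attributes; non-zero counters mean the copies disagree
        # and a self-heal is still pending for this file.
        print(name, value.hex())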

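And to make the PVFS locking point concrete, here is a rough sketch (Python, file names made up) of the two styles of locking involved: the fcntl() advisory lock that PVFS does not implement, and a dovecot-style .lock (dotlock) fallback that relies only on atomic file creation:

    import fcntl
    import os

    def posix_lock(path):
        # Advisory lock via fcntl() -- the call PVFS does not honour, so on
        # PVFS this may fail or simply not exclude other writers.
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
        fcntl.lockf(fd, fcntl.LOCK_EX)   # blocks until the exclusive lock is granted
        return fd

    def dotlock(path):
        # Dotlock fallback in the spirit of dovecot's .lock files:
        # O_CREAT | O_EXCL is atomic on a POSIX file system, so whoever
        # creates the lock file first wins. Whether that holds on a given
        # cluster file system is exactly what would need to be verified.
        lock_path = path + ".lock"
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        os.close(fd)
        return lock_path

    def release_dotlock(lock_path):
        os.unlink(lock_path)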

I am still having some problems with GlusterFS - I rebooted my machines at exactly the same time and all three came up frozen in the mount call. Now that I know how to clear the problem (ssh in from another window and kill -9 the hung mount), it isn't so bad, but I can't take this to production unless this issue is resolved. I'll try to come up with better details.

Cheers,
mark

--
Mark Mielke <address@hidden>




