
Re: [Gluster-devel] solutions for split brain situation


From: Michael Cassaniti
Subject: Re: [Gluster-devel] solutions for split brain situation
Date: Tue, 15 Sep 2009 10:25:59 +1000

2009/9/15 Stephan von Krawczynski <address@hidden>
On Mon, 14 Sep 2009 21:20:49 +0200
"Steve" <address@hidden> wrote:

>
> -------- Original Message --------
> > Date: Mon, 14 Sep 2009 21:14:32 +0200
> > From: Stephan von Krawczynski <address@hidden>
> > To: Anand Avati <address@hidden>
> > CC: address@hidden
> > Subject: Re: [Gluster-devel] solutions for split brain situation
>
> > On Mon, 14 Sep 2009 21:44:12 +0530
> > Anand Avati <address@hidden> wrote:
> >
> > > > Our "split brain" is no real split brain and looks like this: logfiles
> > > > are written every 5 mins. If you add a secondary server that has
> > > > 14-day-old logfiles on it, you notice that about half of your data
> > > > vanishes while an unsuccessful self-heal is performed, because the old
> > > > logfiles read from the secondary server overwrite the new logfiles on
> > > > your primary while new data is added to them.
> > >
> > > Have you been using the favorite-child option?
> >
> > No, the option was not used.
> >
> > > Auto-resolving of
> > > split-brain is bound to make you lose the data of one of the subvolumes.
> > > If you had indeed specified the favorite-child option, and the
> > > favorite-child happened to be the server which had the 14-day-old
> > > logs, then what just happened was exactly what the elaborate warning
> > > log described.
> > >
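In a 2.x-era volfile the option under discussion sits on the cluster/replicate volume. A minimal sketch, with made-up volume and host names:

    volume remote1
      type protocol/client
      option transport-type tcp
      option remote-host server1
      option remote-subvolume brick
    end-volume

    volume remote2
      type protocol/client
      option transport-type tcp
      option remote-host server2
      option remote-subvolume brick
    end-volume

    volume replicate
      type cluster/replicate
      subvolumes remote1 remote2
      # declares which subvolume wins when split-brain is auto-resolved;
      # the differing data on the other subvolume is overwritten
      option favorite-child remote1
    end-volume
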
> > > Now what is more interesting to me is the sequence of taking down
> > > and bringing up the servers that you followed into split brain. Was it
> > > really just taking one server (any of them) down and bringing it back
> > > up? Did you face a split brain with just this? Can you please describe
> > > the minimal steps necessary to reproduce your issue?
> >
> > Take 2 servers and one client. Use a minimal replicate setup but do _not_
> > add the second server. Copy some data onto the first server via glusterfs,
> > then rsync that data onto the second server directly from the first server
> > (glusterfsd not yet active there). Now change some of the data so that you
> > have files that are really newer than your rsync cycle. Then start
> > glusterfsd on the second server. Your client will add it. Then open the
> > newer files r/w on the client. You will notice the split-brain messages in
> > the client logs and find that every other file does indeed get read in
> > from the second (outdated) server's fileset. Write it back and your newer
> > files on the first server are gone.
> > As said, no favorite-child option set.
> >
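Spelled out as commands, the quoted reproduction might look roughly like this (hostnames, paths and volfile locations are illustrative, not from the original report):

    # on server1: export the backend directory; server2 stays down for now
    server1# glusterfsd -f /etc/glusterfs/server.vol

    # on the client: mount the replicate volume and copy some data in
    client# glusterfs -f /etc/glusterfs/client.vol /mnt/gluster
    client# cp -a /var/log/myapp /mnt/gluster/

    # copy the backend files to server2 directly, bypassing glusterfs
    # (plain -a: the trusted.* extended attributes are NOT copied)
    server2# rsync -a server1:/data/export/ /data/export/

    # change files through the mount so they are newer than the rsync copy
    client# echo "newer entry" >> /mnt/gluster/myapp/access.log

    # start glusterfsd on server2; the client picks it up, and opening the
    # changed files r/w produces the split-brain messages in the client log
    server2# glusterfsd -f /etc/glusterfs/server.vol
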
> You just rsynced, but did you sync the extended attributes as well?

No, we explicitly did not sync the extended attributes. But your question
should be put more generally: if I have a working glusterfs server, must all
data be backed up including the extended attributes?
Why should it be lethal not to back them up, when I can bring data online
simply by starting to export it via a glusterfsd that has never touched it
before? (Think of a first-time export: you have some data and install
glusterfs for the very first time. Your data is of course exported without
any trouble. Where is the difference to an rsync backup with no extended
attributes?)
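One way to see what replicate actually keeps on disk is to look at the extended attributes on a server's backend directory (directly on the export, not through the mount). A sketch with illustrative paths and subvolume names; the attribute values shown are made up:

    # trusted.* attributes are only visible to root; getfattr ships in the
    # attr package on most distributions
    server1# getfattr -d -m trusted.afr -e hex /data/export/myapp/access.log
    # file: data/export/myapp/access.log
    trusted.afr.remote1=0x000000000000000000000000
    trusted.afr.remote2=0x000000000000000000000000

A file created through the mount carries one such change-log attribute per subvolume; a file placed on the backend by a plain rsync carries none, leaving replicate without its bookkeeping for that file.
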
 
Can we all read this in relation to extended attributes and the cluster/replicate translator: Understanding AFR translator.
Also, if you want more reliable restores directly to a single storage brick (rather than restoring onto a replicate translator), I would suggest a backup system that handles extended attributes. I am using Bacula for this purpose, but you may find other solutions that fit.
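
If a full backup system is more than you need, rsync itself can carry the attributes along. A minimal sketch (assumes rsync 3.0 or newer on both ends, run as root so the trusted.* namespace is readable and writable; paths illustrative):

    # -X/--xattrs copies extended attributes along with the data
    server2# rsync -aX server1:/data/export/ /data/export/

    # alternatively, dump and restore the attributes explicitly
    server1# getfattr -R -d -m trusted -e hex /data/export > xattrs.dump
    # run the restore from / so the relative paths in the dump resolve
    server2# cd / && setfattr --restore=xattrs.dump
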

Regards,
Michael Cassaniti
