Re: [Gluster-devel] Performance Translators' Stability and Usefulness

From:	Anand Babu Periasamy
Subject:	Re: [Gluster-devel] Performance Translators' Stability and Usefulness
Date:	Mon, 06 Jul 2009 01:49:15 -0700
User-agent:	Mozilla-Thunderbird 2.0.0.22 (X11/20090701)

Gordon, Geoff, Fillipe,

We are sorry! We admit we had a rough and difficult past.

Here are the reasons why it was difficult for us:
* Limited staff and QA environment.

* GlusterFS is a programmable file system. It supported many OS distros, applications, hardware and storage architectures. It was impossible to QA all possible combinations. What we declared as stable is just one of many such use-cases.

* Poor documentation.

We are now VC funded. We have increased the size of our team and hardware lab significantly. 2.0 is an outcome of this investment. 2.0.3, scheduled for this week, will be a lot more stable by comparison. A dedicated technical writer is now working on an improved version of our installation guide. We are going to templatize GlusterFS stable configurations through a tool for generating and managing volume spec files. GlusterSP (storage platform) will completely automate the installation and management of a ruggedized release of GlusterFS in an embedded OS form. The first beta of GlusterSP 2010 will be out in 2 months. With its web-based UI and pre-configured system image, a number of error factors are reduced.

We are constantly learning and improving. You are making a valuable contribution by constructively criticizing us with details and proposals. We take them seriously and positively.


Happy Hacking,
--
Anand Babu Periasamy
GPG Key ID: 0x62E15A31
Blog [http://unlocksmith.org]
GlusterFS [http://www.gluster.org]
GNU/Linux [http://www.gnu.org]


Geoff Kassel wrote:

Hi Gordan,
What is production unready (more than Gluster) about PeerFS or SeznamFS?
Well, I'm mostly going by your email from a few months ago comparing these. Your needs are not that dissimilar to mine.
I see on the project page for SeznamFS now that there's apparently support for SeznamFS to do master-master replication 'MySQL' style - with the limitations of MySQL's master-master replication, apparently.
However, I can't seem to find out exactly what those limitations entail - or how to set it up in this mode. (And I am looking for a system that would allow more than two masters/peers, which is why I passed over DRBD for GlusterFS originally.)
I can't even get the PeerFS web page to load. That's a disturbing sign to me.
You can fail over NFS servers. If the servers themselves are mirrored
(DRBD) and/or have a shared file system NFS should be able to handle the
IP being migrated between servers. I've found this tends to work
better with NFS over UDP provided you have a network that doesn't
normally suffer packet loss.
Sorry, thought you were talking about NFS exports from just one local drive/RAID array.
My leading fallback option for when I give up on Gluster is pretty much exactly what you've just described. However - I have the same (potential) issue as you with DRBD and WANs looming over my project, i.e. the eventual need to run masters/peers in geographically distributed sites.
How do you mean? GFS1 has been in the vanilla kernel for a while.
I don't use a vanilla kernel. I use a 'hardened' kernel patched with PaX and a few other security systems, to protect against stack smashing attacks and other nasties. (Just a little bit of extra, relative security, to make would-be attackers go after softer targets.)
PaX is especially intolerant of memory faults in general, which is where my efforts in patching GlusterFS were focused. (And yes, I have disabled PaX features for Gluster. No, it didn't improve anything.)
When I was looking into GFS, I found that the GFS patches (perhaps I was looking at v2) didn't work with the hardened patchset. GlusterFS had more promise than GFS anyway, so I went with GlusterFS.
An older version of GlusterFS - as buggy as it is for me - is
unfortunately still the best option.
Out of interest, what was the last version of Gluster you deemed
completely stable?
What works for me with only (only!) a few crashes a day, and no apparent data corruption, is 1.4.0tla849. TLA 636 worked a little better for me - only random crashes once in a while. (But again - backwards incompatible changes had crept in between the two versions, so I couldn't go back.)
I had much better stability with the earlier 1.3 releases. I can't remember exactly which ones now. (I suspect it was 1.3.3, but I'm no longer sure.) It's been quite a while.
I don't agree on that particular point, since the last outstanding bug
I'm seeing with any significant frequency in my use case is the one of
having to wait for a few seconds for the FS to settle after mounting
before doing anything or the operation fails. And to top it off, I've
just had it succeed without the wait. That seems quite heisenbuggy/racy
to me. :)
Sorry, I was talking about the data corruption bugs. Not your first-access issue.
That doesn't help - the first-access-settle-time bug has been around for
a very long time. ;)
Indeed.
It's my hope that once testing frameworks (and syslog logging, in your case) are made available to the community, people like us can attempt to debug our systems with some degree of confidence that we're not causing other subtle issues with our patches.
That's got to be better for the project as a whole.

Geoff.

On Sun, 5 Jul 2009, Gordan Bobic wrote:
Geoff Kassel wrote:
Sounds like a lot of effort and micro-downtime compared to a migration
to something else. Have you explored other options like PeerFS, GFS and
SeznamFS? Or NFS exports with failover rather than Gluster clients, with
Gluster only server-to-server?
These options are not production ready (as I believe has been pointed out
already to the list) for what I need;
What is production unready (more than Gluster) about PeerFS or SeznamFS?
or in the case of NFS, defeating the
point of redundancy in the first place.
You can fail over NFS servers. If the servers themselves are mirrored
(DRBD) and/or have a shared file system NFS should be able to handle the
IP being migrated between servers. I've found this tends to work
better with NFS over UDP provided you have a network that doesn't
normally suffer packet loss.
(Also, GFS is also not compatible
with the kernel patchset I need to use.)
How do you mean? GFS1 has been in the vanilla kernel for a while.
I have tried AFR on the server side and the client side. Both display
similar issues.

An older version of GlusterFS - as buggy as it is for me - is
unfortunately still the best option.
Out of interest, what was the last version of Gluster you deemed
completely stable?
(That doesn't mean I can't complain about the lack of progress towards
stability and reliability, though :)
Heh - and would you believe I just rebooted one of my root-on-glusterfs
nodes and it came up OK without the bail-out requiring manual
intervention caused by the bug that causes first access after mounting
to fail before things have settled.
One of the problems is that some tests in this case are impossible to
carry out without having multiple nodes up and running, as a number of
bugs have been arising in cases where nodes join/leave or cause race
conditions. It would require a distributed test harness which would be
difficult to implement so that they run on any client that builds the
binaries. Just because the test harness doesn't ship with the sources
doesn't mean it doesn't exist on a test rig the developers use.
Okay, so what about the volume of test cases that can be tested without a
distributed test harness? I don't see any sign of testing mechanisms for
that.
That point is hard to argue against. :)
And wouldn't it be prudent anyway - given how often the GlusterFS devs
do not have access to the platform with the reported problem - to provide
this harness so that people can generate the appropriate test results the
devs need for themselves? (Giving a complete stranger from overseas root
access is a legal minefield to those who have to work with data held
in-confidence.)
Indeed. And shifting test-case VM images tends to be impractical (even
though I have provided both to the gluster developers in the past for
specific error-case analysis).
It's been my impression, though, that the relevant bugs are not
heisenbugs or race conditions.
I don't agree on that particular point, since the last outstanding bug
I'm seeing with any significant frequency in my use case is the one of
having to wait for a few seconds for the FS to settle after mounting
before doing anything or the operation fails. And to top it off, I've
just had it succeed without the wait. That seems quite heisenbuggy/racy
to me. :)
(I'm judging that on the speed of the follow up patch, by the way - race
conditions notoriously can take a long time to track down.)
That doesn't help - the first-access-settle-time bug has been around for
a very long time. ;)

Gordan


_______________________________________________
Gluster-devel mailing list
address@hidden
http://lists.nongnu.org/mailman/listinfo/gluster-devel