Re: [Gluster-devel] Performance Translators' Stability and Usefulness

From:	Shehjar Tikoo
Subject:	Re: [Gluster-devel] Performance Translators' Stability and Usefulness
Date:	Sun, 05 Jul 2009 13:10:58 +0530
User-agent:	Mozilla-Thunderbird 2.0.0.19 (X11/20090103)

Thanks Geoff.

It is always good to get an external opinion on where we stand.

Geoff Kassel wrote:

Hi Shehjar, I feel I should comment on part of your reply to Gordan's
 email.
Finally - which translators are deemed stable (no known issues - memory leaks/bloat, crashes, corruption, etc.)?
We can definitely vouch for a higher degree of stability of the releases. Otherwise, I don't think there is any performance translator we can call completely stable/mature because of the roadmap we have for constantly upgrading algorithms, functionality, etc.
When will the Gluster team be able to deliver a stable, mature, and reliable version of GlusterFS?


Continuing from what I said earlier, the fact that GlusterFS releases
work in a stable manner is shown by the deployments among our
customers.

At the same time, are we satisfied with the experience of non-paying
users?
No. I accept there are bottlenecks in our processes. We
acknowledge that and have been working on fixing them. The most visible
aspect of that for users is the move to using bugzilla at
bugs.gluster.com. The earlier setup at Savannah just wasn't scaling.
Personally, in just a few weeks, I am finding handling bugs through
this portal much faster and more streamlined than before.

I have been using GlusterFS since the v1.3.x days, and I have yet to
 see a version since then that doesn't crash at least once a day from
 just load on even the simplest configurations.
Then there's the data corruption bug of the early 2.0.0 releases, which has kept me (and no doubt others) from upgrading to these releases.
I have read about the Gluster QA team, but quite frankly, I have yet
 to see the fruits of this team's work. Letting through a bug of that
 magnitude in a major release blew a lot of trust I had in the
Gluster team's QA process.
When will regression tests be used? It's been months now since this bug, and still I don't see any sign of the use of this simple, industry-standard technique to minimise the risk of such issues slipping through again.


I think the QA folks have done some really good work in stabilizing
GlusterFS over the last year or so. The result is there to see in the
2.0.X releases.


Why wasn't this prioritised after such a disastrous bug?


It could've been for any number of reasons ranging from problems with
reproducing it, limited functionality for managing bug reports in
Savannah to even the general constraints of being a commercial
open-source project.

Still, I understand that your problems are more important to you than
the problems faced by other users. I'd appreciate it if you'd give our
bugzilla-based setup a chance at handling this bug. Or let me
know if you've already filed a report.

When will this even show up on the roadmap?


The QA team is already working on just such a testing and
regression framework.

Thanks
Shehjar

Geoff.

On Sat, 4 Jul 2009, Shehjar Tikoo wrote:
Gordan Bobic wrote:
Just reading through the wiki on this and a few things are unclear, so I'm hoping someone can clarify.
1) readahead
- Is there any point in using this on systems where the interconnect <= 1Gb/s? The wiki implies there is no point in this, but doesn't quite state it explicitly.
I am pretty sure it helps. Whether to use read-ahead is more a question of the workload than of the interconnect; e.g. it'll be useful for sequential reading, without any doubt. Of course, there can be cases where excessive read-ahead chokes the 100 Mib/s link, but then read-ahead can be configured to reduce its utilization of the network by reducing the page-count option.
- Is there any point in using this on a server that is also its
 own client when used with replicate/afr? I'm guessing there isn't
 since the local fs will be doing its own read-ahead but I'd
like some confirmation on that.
No. Generally, read-ahead will be most beneficial only on the client side, since it avoids the need to go to the network when an application needs data that has already been read ahead. And yes, on the server side, the on-disk file system's own read-ahead already does it best.
In your setup above, in case the system has more than a few CPUs/cores, it might be possible to get a little better performance while using io-threads on the client. That'll make it possible to offload the read-ahead to an io-thread without blocking the main glusterfs thread. Then, the benefit of read-ahead + io-threads might show up when the data is actually needed, and could be served without a kernel entry/exit for a file system call.
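To make the page-count trade-off described above concrete, here is a toy Python model. It is not GlusterFS code: ReadAheadReader, sequential_blocking_fetches and the page sizes are hypothetical names and numbers chosen for illustration. The model assumes each miss fetches the requested page plus page_count further pages in one request, and counts how often a sequential reader has to wait on a fetch.

```python
class ReadAheadReader:
    """Toy model of client-side read-ahead (illustrative only, not GlusterFS code)."""

    def __init__(self, backend, page_size=4, page_count=2):
        self.backend = backend        # bytes object standing in for a remote file
        self.page_size = page_size
        self.page_count = page_count  # assumed meaning: pages fetched beyond the requested one
        self.pages = {}               # page index -> bytes already on the client
        self.blocking_fetches = 0     # times the reader had to wait for the "network"

    def _fetch(self, first, n):
        # One simulated request that brings back n consecutive pages.
        self.blocking_fetches += 1
        for i in range(first, first + n):
            start = i * self.page_size
            self.pages[i] = self.backend[start:start + self.page_size]

    def read_page(self, index):
        # Only a miss goes to the backend; pages read ahead earlier are served locally.
        if index not in self.pages:
            self._fetch(index, 1 + self.page_count)
        return self.pages[index]


def sequential_blocking_fetches(page_count, total_pages=12, page_size=4):
    """Read a whole file front to back and report how often the reader waited."""
    data = bytes(range(total_pages * page_size))
    reader = ReadAheadReader(data, page_size, page_count)
    out = b"".join(reader.read_page(i) for i in range(total_pages))
    assert out == data  # read-ahead must not change what the application sees
    return reader.blocking_fetches
```

With these assumed numbers, sequential_blocking_fetches(0) returns 12 and sequential_blocking_fetches(2) returns 4. The cost is that every fetch moves more data at once, which is the part that can saturate a slow link and the reason for lowering page-count there.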
2) io-threads

Is this (usefully) applicable on the client side?
It is. Using io-threads on the client side helps offload the processing of individual file operations onto a separate thread, freeing up the main thread to perform other tasks. This is especially applicable when using io-threads under the write-behind and/or read-ahead translators, where the write-behind and read-ahead requests, i.e. essentially background or asynchronous requests, can be offloaded to the threads while freeing up the main glusterfs thread to handle sync requests, i.e. requests that could make the application block on a syscall.
Also, using io-threads on the client side could help in performing network IO in a separate thread, again freeing up the main thread for other in-band tasks.
Then again, if the workload is not concurrent in terms of number of
 processes or number of files/dirs, then io-threads might not help
 much.
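The offloading idea above can be sketched with an ordinary thread pool. This is a generic Python illustration and not how the io-threads translator is implemented; background_flush, run_demo and thread_count are hypothetical names. Background work (standing in for write-behind flushes) is handed to worker threads, so the submitting thread stays free for a synchronous request.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def background_flush(chunk, log, lock):
    # Stands in for an asynchronous request such as a write-behind flush.
    with lock:
        log.append((chunk, threading.current_thread().name))
    return len(chunk)


def run_demo(chunks, thread_count=4):
    log, lock = [], threading.Lock()
    with ThreadPoolExecutor(max_workers=thread_count,
                            thread_name_prefix="io") as pool:
        # Hand every background request to the pool; submit() returns immediately.
        futures = [pool.submit(background_flush, c, log, lock) for c in chunks]
        # The submitting thread is not blocked, so it can serve a sync request now.
        sync_result = "sync request served by " + threading.current_thread().name
        # Collect the background results once they are actually needed.
        flushed = sum(f.result() for f in futures)
    return sync_result, flushed, log
```

Calling run_demo([b"aa", b"bbb", b"c"]) flushes 6 bytes on threads whose names start with "io". As noted above, a workload with a single file and no concurrency would gain little from the extra threads.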
3) io-cache
The wiki page has the same paragraph pasted for both io-threads and io-cache. Are they the same thing, or is this a documentation bug?
No, they're not the same. The documentation is still in flux. Hope this version will help: http://www.gluster.org/docs/index.php/Translators_options
What does io-cache do?
io-cache is a translator that caches data from files so that future references do not lead to network requests. It is generally used along with read-ahead so that the data that gets read ahead, or any data that gets read, for that matter, will be available from the local client cache. We're also working on incorporating support for write buffering in io-cache so that write operations can also benefit from local buffering until a point in time suitable for actual transmission to the server.
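The caching behaviour described above can be illustrated with a small client-side page cache. This is a sketch under stated assumptions, not io-cache itself: PageCache and its fetch callback are hypothetical, and least-recently-used eviction is an assumption of this example rather than something stated in this thread.

```python
from collections import OrderedDict


class PageCache:
    """Toy client-side read cache (illustrative only, not the io-cache translator)."""

    def __init__(self, fetch, capacity=2):
        self.fetch = fetch            # callable standing in for a network read
        self.capacity = capacity      # maximum number of cached entries
        self.pages = OrderedDict()    # key -> data, oldest entry first
        self.network_requests = 0

    def read(self, key):
        if key in self.pages:
            # Cache hit: no network request, just mark the entry as recently used.
            self.pages.move_to_end(key)
            return self.pages[key]
        # Cache miss: go to the "server" and remember the result.
        self.network_requests += 1
        data = self.fetch(key)
        self.pages[key] = data
        if len(self.pages) > self.capacity:
            # Assumed policy: evict the least recently used entry.
            self.pages.popitem(last=False)
        return data
```

Repeated reads of the same key are then served locally; only misses and entries that were evicted for lack of space reach the fetch callback.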
Finally - which translators are deemed stable (no known issues - memory leaks/bloat, crashes, corruption, etc.)?
We can definitely vouch for a higher degree of stability of the releases. Otherwise, I don't think there is any performance translator we can call completely stable/mature because of the roadmap we have for constantly upgrading algorithms, functionality, etc.
Any particular suggestions on which performance translator combination would be good to apply for a shared root AFR over a WAN? I already have read-subvolume set to the local mirror, but any improvement is welcome when latencies soar to 100ms and b/w gets hammered down to 1-2.5 Mb/s.
WANs are generally characterised as having a large bandwidth-delay product. That basically means, for good throughput, we should be pipelining as much data as possible over the link, so that the long latency overhead can be mitigated or amortised by sending a larger amount of data for the same fixed overhead.
That said, what particular workload is it that gives you a throughput of 1-2.5 Mb/s?
When you say "latencies soar to 100ms", does that mean these are just unusual spikes, or is that the normal latency observed?
It'd help to see your volfiles and how the performance translators
 are arranged.
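The bandwidth-delay product mentioned above is simple arithmetic, worked here in Python for the figures quoted in the question. Treating the 100ms latency as the round-trip time is an assumption, and the one-window-per-round-trip throughput model and both function names are illustrative simplifications rather than anything GlusterFS-specific.

```python
def bandwidth_delay_product_bytes(bandwidth_bits_per_s, rtt_ms):
    """Bytes that must be in flight to keep the link full."""
    return bandwidth_bits_per_s * rtt_ms / 8000  # bits * ms -> bytes


def throughput_bits_per_s(window_bytes, rtt_ms, link_bits_per_s):
    """Simplified model: at most one window of data is delivered per round trip."""
    return min(link_bits_per_s, window_bytes * 8 * 1000 / rtt_ms)


# Figures from the question: 1-2.5 Mb/s and 100 ms (assumed to be round-trip time).
bdp_high = bandwidth_delay_product_bytes(2_500_000, 100)
bdp_low = bandwidth_delay_product_bytes(1_000_000, 100)

# Throughput when only a 4 KiB request is outstanding per round trip.
small_window = throughput_bits_per_s(4096, 100, 2_500_000)
# Throughput when a full bandwidth-delay product is kept in flight.
full_window = throughput_bits_per_s(bdp_high, 100, 2_500_000)
```

Under these assumptions roughly 12,500 to 31,250 bytes need to be in flight to fill the link. A single outstanding 4 KiB request per round trip would cap throughput at about 0.33 Mb/s, which is why pipelining more data per round trip matters.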
Another thing - when a node works standalone in AFR, performance is pretty good, but as soon as a peer node joins, even though the original node is the primary, performance degrades on the primary node quite significantly, even though the interconnect is direct gigabit, which shouldn't be adding any particular latency (< 0.1ms) or overheads, especially on the primary node. Is there any particular reason for this degradation? It's OK in normal usage, but some operations (e.g. building a big bootstrapping initrd (50MB compressed, including all the kernel drivers)) take nearly 10x longer when the peers join than when the node is standalone. I expected some degradation, but only on the order of added network latency, and this is way, way more. I tried with and without direct-io=off, and that didn't make a great amount of difference. Which performance translators are likely to help with this use case?
I think Vikas will be able to answer that better.

-Shehjar
Gordan
_______________________________________________
Gluster-devel mailing list
address@hidden
http://lists.nongnu.org/mailman/listinfo/gluster-devel
