
Re: [Gluster-devel] self-heal behavior


From: Gerry Reno
Subject: Re: [Gluster-devel] self-heal behavior
Date: Wed, 04 Jul 2007 10:24:45 -0400
User-agent: Thunderbird 1.5.0.12 (X11/20070530)

Avati,
 Comments inline...

Anand Avati wrote:
Gerry,
Your question is appropriate, but the answer to 'when to resync' is not simple. When a brick that was brought down comes back up, it may be a completely new (empty) brick. In that case, starting to sync every file would most likely be the wrong decision (we should sync the files the user actually needs before some unused file). And even if we chose to sync files the user is not accessing, it would be very sluggish, since the syncing would interfere with other operations.
Self-heal should start syncing files immediately, though not at full speed; rather at some throttled 'nice' level that would not impact other operations.
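A throttled background sync like this could be approximated with a token bucket. Here is a minimal Python sketch; the class name and the `rate_bytes` knob are illustrative, not real GlusterFS options:

```python
import time

class HealThrottle:
    """Token bucket limiting background self-heal traffic to rate_bytes/sec.

    Purely illustrative: GlusterFS does not expose this class or knob; it
    only sketches the 'throttled nice level' idea. The bucket is capped at
    one second's worth of traffic so a long idle gap cannot build up a
    burst that swamps user operations.
    """

    def __init__(self, rate_bytes):
        self.rate = float(rate_bytes)
        self.tokens = 0.0
        self.last = time.monotonic()

    def acquire(self, nbytes):
        # Refill the bucket from elapsed time, sleeping until there are
        # enough tokens to cover nbytes of resync traffic.
        while True:
            now = time.monotonic()
            self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= nbytes:
                self.tokens -= nbytes
                return
            time.sleep((nbytes - self.tokens) / self.rate)
```

Each chunk copied during resync would call `acquire(len(chunk))` before touching the network, so user-facing operations keep most of the bandwidth.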



The current approach is to sync a file on its next open(). This is usually a good balance: if we sync a file during open(), even a 1 GB file takes only 10-15 seconds, and for normal files (on the order of a few MB) it is barely noticeable. But if this were to happen for all files at once, whether the user accessed them or not, there would be a lot of traffic and everything would become very sluggish.
Again, this should be done at a throttled level if other operations are in progress; if not, open up the throttle.
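As a rough illustration of the sync-on-open() idea, assuming each brick keeps a per-file version counter (the `Brick` class and its methods are made up for this sketch, not real AFR metadata):

```python
class Brick:
    """Toy stand-in for a storage brick: maps path -> (version, data)."""

    def __init__(self):
        self.files = {}

    def version(self, path):
        return self.files.get(path, (0, b""))[0]

    def read(self, path):
        return self.files.get(path, (0, b""))[1]

    def store(self, path, version, data):
        self.files[path] = (version, data)

def open_with_selfheal(path, bricks):
    """Heal stale copies of exactly this file before serving the open():
    pick the brick with the highest version and push its data to the rest.
    Files nobody opens are left alone, which is the point of the scheme."""
    fresh = max(bricks, key=lambda b: b.version(path))
    v, data = fresh.version(path), fresh.read(path)
    for b in bricks:
        if b.version(path) < v:
            b.store(path, v, data)   # heal only the file being opened
    return data
```

The cost of the heal is thus paid by the first open() after the brick returns, which is the 10-15 second worst case described above.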



This approach of syncing on open() is what other filesystems that support redundancy do as well.

Detecting 'idle time', beginning the sync-up, and pausing it when user activity resumes is a very tricky job, but that is definitely what we are ultimately aiming at. It is not enough for AFR to detect that the client is free, because the servers may be busy serving files to another client, and syncing at that time may not be appropriate. Future versions of AFR will have more options to tune 'when' to sync. Currently it is only at open(). We plan to add an option to sync on lookup() (which happens on ls). Later versions will have proactive syncing (detecting that both servers and clients are idle, etc.).
That will be great.
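The proactive variant described above might look like the loop below, where idle checks on both the client and the servers gate each step. All the names are hypothetical; a real implementation would crawl brick metadata rather than a Python list:

```python
def proactive_heal(pending, client_idle, server_idle, heal_one):
    """Work through the backlog of out-of-sync files only while both the
    client and the servers report being idle; stop the moment activity
    resumes, leaving the rest to be healed on open()."""
    healed = []
    while pending:
        if not (client_idle() and server_idle()):
            break  # activity resumed: defer the remaining files
        healed.append(heal_one(pending.pop(0)))
    return healed
```

Anything the idle-time crawl does not reach is still covered by the existing open()-triggered heal, so the two mechanisms compose.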

Gerry






thanks,
avati

2007/7/4, Gerry Reno <address@hidden>:

    I've been doing some testing of self-heal: basically taking down one
    brick, copying some files to one of the client mounts, then bringing
    the downed brick back up.  What I see is that when I bring the downed
    brick back up, no activity occurs.  It's only when I start doing
    something in one of the client mounts that something happens to
    rebuild the out-of-sync brick.  My concern is this: suppose I have
    four applications on different client nodes (separate machines) using
    the same data set (mounted on GlusterFS), and the brick on one of
    these nodes is out-of-sync.  Not until some user tries to use the
    application does the brick start to resync.  This results in sluggish
    performance for that user, since all the data has to be brought over
    the network from the other bricks while the local brick is
    out-of-sync.  There may have been ten minutes of idle time before
    this user tried to access the data, but glusterfs made no use of that
    time to rebuild the out-of-sync brick; it waited until a user tried
    to access data.  To me, it appears that glusterfs should be making
    use of such opportunities, which would diminish the overall impact of
    the out-of-sync condition on users.

    Regards,
    Gerry



    _______________________________________________
    Gluster-devel mailing list
    address@hidden
    http://lists.nongnu.org/mailman/listinfo/gluster-devel




--
Anand V. Avati




