Re: [Gluster-devel] Phasing out replace-brick for data migration in favor of remove-brick.

Hello all,
DHT's remove-brick + rebalance has been enhanced in the last couple of releases to be quite sophisticated. It can handle graceful decommissioning of bricks, including open file descriptors and hard links.

The last set of patches for this should be reviewed and accepted before we make that claim :-) [ http://review.gluster.org/5891 ]

In a way, this overlaps with replace-brick's data migration functionality, which is currently also used for planned decommissioning of a brick.

Reasons to remove replace-brick (or why remove-brick is better):

- Having two methods of moving data is confusing for users and hard for developers to maintain.

- If the server being replaced is a member of a replica set, neither remove-brick nor replace-brick data migration is necessary, because self-heal will recreate the data (replace-brick actually uses self-heal internally).

- In a non-replicated configuration, if a server is being replaced by a new one, add-brick <new> + remove-brick <old> "start" achieves the same goal as replace-brick <old> <new> "start".

Should we also phase out the CLI form of 'remove-brick' without any option? Even if users run it by mistake, they would lose data. We should enforce the 'start' then 'commit' usage of remove-brick. Anyone who still needs the old behaviour has the 'force' option anyway.
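
A minimal Python sketch of the workflow and policy discussed above. The gluster sub-commands it spells out (add-brick, then remove-brick start, status, commit) are the existing CLI; the helper functions, volume name and brick paths are hypothetical, and validate_remove_brick only models the rule being proposed here, not anything the CLI currently enforces.

```python
from typing import List

# Hypothetical helper (not part of GlusterFS): builds, but does not run,
# the CLI argument lists for replacing old_brick with new_brick in a
# non-replicated volume via add-brick + remove-brick.
def decommission_plan(volume: str, old_brick: str, new_brick: str) -> List[List[str]]:
    return [
        # 1. Add the new brick first so DHT has somewhere to put the data.
        ["gluster", "volume", "add-brick", volume, new_brick],
        # 2. Start migrating data off the old brick.
        ["gluster", "volume", "remove-brick", volume, old_brick, "start"],
        # 3. Check progress; in practice, repeat until migration completes.
        ["gluster", "volume", "remove-brick", volume, old_brick, "status"],
        # 4. Only then detach the old brick from the volume.
        ["gluster", "volume", "remove-brick", volume, old_brick, "commit"],
    ]


# Hypothetical policy check modelling the proposal, not the real CLI code:
# a bare remove-brick is refused, and 'commit' is only accepted once
# 'start' has been issued. 'force' stays available as the explicit escape hatch.
def validate_remove_brick(history: List[str], option: str) -> None:
    if option == "":
        raise ValueError("remove-brick requires 'start', 'commit' or 'force'")
    if option not in {"start", "status", "commit", "force"}:
        raise ValueError(f"unknown remove-brick option: {option!r}")
    if option == "commit" and "start" not in history:
        raise ValueError("run 'remove-brick ... start' before 'commit'")


if __name__ == "__main__":
    # Example volume and brick names are made up for illustration.
    for argv in decommission_plan("myvol", "server1:/export/b1", "server5:/export/b1"):
        print(" ".join(argv))
```

Keeping the plan as plain argument lists makes the ordering explicit: the old brick is only committed out after data has been migrated, which is exactly the guarantee a bare remove-brick skips.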

- In a non-replicated configuration, replace-brick is NOT glitch-free (applications see ENOTCONN if they are accessing data), whereas add-brick <new> + remove-brick <old> is completely transparent.

+10 (that's the number of bugs open on these things :-)

- Replace-brick strictly requires a server with enough free space to hold the data of the old brick, whereas remove-brick evenly spreads the data of the brick being removed across the remaining servers.

- Replace-brick code is complex and messy (the real reason :p).

I wanted to see this reason as the first point, but it's OK as long as we mention it. I too agree that it's _hard_ to maintain that piece of code.

- There is no clear reason why replace-brick's data migration is in any way better than remove-brick's data migration.

One reason I heard when I sent a mail on gluster-devel earlier (http://lists.nongnu.org/archive/html/gluster-devel/2012-10/msg00050.html ) was that the remove-brick approach is a bit slower than replace-brick. The technical reason is that remove-brick does DHT's readdir, whereas replace-brick does a brick-level readdir.
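
The free-space argument from the list (replace-brick needs one server big enough for the whole old brick, while remove-brick spreads the data over the remaining servers) can be made concrete with a back-of-the-envelope Python sketch. It assumes an exactly even spread, as the mail describes; real DHT placement is hash-based and only roughly even, and reserves such as min-free-disk are ignored. Function names and numbers are hypothetical.

```python
from typing import Dict

# Illustrative model only; sizes are in GiB. It ignores DHT's hash-based
# (only roughly even) placement, min-free-disk reserves and FS overhead.

def replace_brick_fits(used_on_old: float, free_on_new: float) -> bool:
    # replace-brick copies the whole old brick onto one new brick.
    return free_on_new >= used_on_old


def remove_brick_fits(used_on_old: float, free_on_remaining: Dict[str, float]) -> bool:
    # remove-brick spreads the old brick's data over the remaining bricks;
    # assuming an exactly even spread, each brick receives used_on_old / n.
    if not free_on_remaining:
        return False  # nowhere to move the data
    share = used_on_old / len(free_on_remaining)
    return all(free >= share for free in free_on_remaining.values())


if __name__ == "__main__":
    # Made-up numbers: an 800 GiB brick, one replacement server with 500 GiB
    # free, versus three remaining bricks with 300 GiB free each.
    print(replace_brick_fits(800, 500))  # False
    print(remove_brick_fits(800, {"s2:/b": 300, "s3:/b": 300, "s4:/b": 300}))  # True
```

In this toy case no single 500 GiB server can absorb the 800 GiB brick, but three bricks each taking about 267 GiB can.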

I plan to send out patches to remove all traces of replace-brick data migration code by 3.5 branch time.

Thanks for the initiative; let me know if you need help.

NOTE that the replace-brick command itself will still exist, and you can replace one server with another in case a server dies. It is only the data migration functionality that is being phased out.

Yes, we need to be careful about this. We would need 'replace-brick' to phase out a dead brick. The other day, there was some discussion on having 'gluster peer replace <old-peer> <new-peer>', which would rewrite all the volfiles properly. But that's mostly for the 3.6 time frame, IMO.

Please do ask any questions / raise concerns at this stage :)

What is the window before you start sending out patches? I see http://review.gluster.org/6010, which I guess is not complete without phasing out the pump xlator :-)

I am personally all in for this change, as it helps me finish a few more enhancements I am working on, like the 'discover()' changes.

Regards,

Amar

From:	Amar Tumballi
Subject:	Re: [Gluster-devel] Phasing out replace-brick for data migration in favor of remove-brick.
Date:	Fri, 27 Sep 2013 22:45:52 +0530