monotone-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Monotone-devel] Getting rid of DieDieDie merge


From: William Uther
Subject: [Monotone-devel] Getting rid of DieDieDie merge
Date: Wed, 8 Aug 2007 23:54:47 +1000


On 08/08/2007, at 11:38 AM, Nathaniel Smith wrote:


On Tue, Aug 07, 2007 at 10:17:57AM +1000, Brian May wrote:
I really don't like this. In means if somebody accidently or
deliberately deletes a file that shouldn't have been deleted, commits
the result in the same branch, and then propagates the change, data
may be lost. Consider for example if other people are working on the
files which get deleted. When revisions are merged, the changes will
be lost.

Don't worry, *no-one* likes it.  I absolutely hate it, and I'm the
one who built it this way.  It's just work to fix, because somehow you
have to track when deletes happened, *but* you need to do so in a way
that doesn't cause monotone's per-tree metadata to grow without bound
when people are doing repeated add/delete cycles.  (I.e., there are
workflows -- I think OpenEmbedded does this, actually -- where the
tree never gets bigger, but the number of deleted files grows without
bound, and we don't want to make common operations get arbitrarily
expensive because of this, just to support the important-but-rare
case of merging deletes and resurrections.)

Hi all,

Was just looking at this. I'm still trying to understand exactly what is going on here. I'm basing my understanding on this irc conversation: http://colabti.de/irclogger/irclogger_log/monotone? date=2007-03-20,Tue#l239

and what I'm reading in the source (although I don't quite 'get' it all yet - hence this email).

Background:
  - We want to use mark-merge on file existence.
- At the moment the 'marks' for mark-merge are generated in one memoized set of recursive calls (database::get_roster_version()), and cached in a single global roster_cache. I assume the mark-merge marks are what is stored in the marking_map(s). - Each time a new roster delta is applied (see roster_delta_t::apply()), then the markings for any deleted files are deleted. This happens because otherwise the set of markings would grow without bound. - Unfortunately, this means that later when the markings are being used for mark-merge (in roster_merge()), we can only use DieDieDie merge (and even then we still do an extra little lifetime check...)

Possible solution:

We don't want to keep around all node_ids because that could grow without bound. However, my reading of mark-merge is that if the value we're merging is equal in two nodes we're merging, then it will hold that value in the child. So if a file has been deleted on both sides of a merge, it will be deleted in the child. This means we only need to bother to record the marks of files that exist on at least one side of the merge.

From here we get a somewhat simple algorithm: As we're going through building up our mark data structures, pass in a set of node_ids that are in the final revisions being merged. When we're in roster_delta_t::apply(), don't delete the markings of nodes in that set.

This has a few nice properties: i) it means that the set of nodes we're recording data for will not grow without bound. ii) we'll have the data we need when we need it.

Now, I haven't figured out all the details here. It seems that get_roster() is both building up the list of files and building up the marking information. A simple change would be to make two passes - the first to get the list of files and the second to add the mark information for the few that have been deleted, and exist in another revision. (I'm imagining having one roster_cache for each set of files for which we need to force tracking - if we can work out the node_ids of all files in all revisions we're merging, then this is just two roster_caches and two passes for any number of merges).

There might be a way to make this more efficient with some thought...

Do those who know the mtn code think I'm on the right track?

Will        :-}






reply via email to

[Prev in Thread] Current Thread [Next in Thread]