monotone-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Monotone-devel] Re: cherrypicking in monotone


From: William Uther
Subject: [Monotone-devel] Re: cherrypicking in monotone
Date: Mon, 11 Dec 2006 19:36:07 +1100


On 11/12/2006, at 1:21 PM, Nathaniel J. Smith wrote:
On Mon, Dec 11, 2006 at 12:02:20PM +1100, William Uther wrote:
Cherry-picking for monotone:

Assume we're cherry-picking the changes in B-C and merging them into
revision D.

a) Find the ancestor you'd use for three way merge, call it A.
b) Pluck B-C and make a new branch with this change from A.  Call the
new revision C'.  If there are any conflicts then abort and tell the
user what conflicted - the conflicting revision is a dependency on
what they're trying to pull across and should probably be pulled
across too.
c) either:
     i) Modify C so it has C' as a parent, but otherwise no change.
     ii) Merge C' and C to make C''.  C'' should be identical to C.
Auto-resolve all conflicts in favour of C (there shouldn't be any,
but there may be if you didn't abort in b).  Any descendants of C
should then be merged with C'' - this should always be a clean
merge.  I am wary of doing this (merging descendants with C'')
automatically though.

Especially because there may be descendents that you do not know about
at the time you do the cherrypick, so you have to be prepared to
handle there be unmerged descendents at arbitrary later times anyway.

Hadn't thought too much about it because it just Didn't Feel Rightâ„¢, but yes, that's a good reason not to. :)

d) Merge C' with D and allow the user to resolve conflicts.

The end result is something that looks like http://venge.net/ monotone/
wiki/DaggyFixes .  But done post-hoc.  And it records the history
correctly so that you shouldn't get future conflicts (I think).

Probably should sit down and think about this some... I'm not quite
sure you avoid future conflicts, the way *-merge happens to work.

Hrm. I guess I was thinking about Darcs-style merging, not *-merge. I still don't have a good feel for *-merge. Actually, I'll ask a few questions while I'm here:

Does the merge in Monotone happen file-by-file or tree-by-tree. i.e. is the marking in *-merge on a file, or on a revision? If file-by- file, how is file identity tracked?

I'm guessing something like: trace back through the DAG from each node being merged (back through renames and copies) to the node where the file is created. And each file is only created in one place.

In
particular, suppose we start with:

      a1
     /  \
    /    \
   a2     a3
          |
          b1

and we want to cherrypick the a3->b1 change to a2.

Well, in this case you just merge b1 and a2. In fact, it should be possible to detect when the system should do a normal merge instead of this cherry-pick.

Following this
procedure, we get:

      a1
     /| \
    / |  \
   a2 b2  a3
   | /|   |
   |/ |   b1
  b3   \ /
        b4

b2 is created by plucking b1, b3 and b4 are created by clean merges.
Now, the point of recording history in a cherrypick (from the merge
algorithm point of view, anyway, there are other, perhaps more
important reasons from workflow point of view) is that some future
merges that would have been conflicts if you merely plucked, will
become clean merges if you truly cherrypicked.  More concretely,
suppose we check out b3, change our scalar to 'c', and commit:

      a1
     /| \
    / |  \
   a2 b2  a3
   | /|   |
   |/ |   b1
  b3   \ /
   |    b4
  c1

This is the key point: merging c1 and b4 now should be a clean merge
to c, because we "know" that the b in b4 is "the same" as the b in b3,
that c1 overrode.

But, in fact, with (unique|multi)-*-merge, this is a conflict:

      a1*
     /| \
    / |  \
   a2 b2* a3
   | /|   |
   |/ |   b1*
  b3   \ /
   |    b4 <-- b4 has a * for unique-*-merge, not for multi-*-merge
  c1*

*(c1) = c1, *(b4) = b4 for unique-*-merge, {b1, b2} for
multi-*-merge.  The ultimate problem in either case is the * on the
b1.  *-merge quite reasonably observes that while c1 has seen and
defeated the edit made in b2, the edits made in c1 and b1 have never
seen each other, so to be conservative, a conflict should be raised.

We've in the past discussed other ways to design scalar merge
algorithms to avoid this kind of problem, e.g. Bram's "convergent
scalar merge" might behave differently here.  But it does seem clear
from this example that if you want cherrypicking to work in the strong
sense, using this post-hoc technique, then ultimately you have to find
something in *-merge's user model that you think is wrong.

Hrm. That's interesting (he said with extreme understatement wondering how to patch up the fine idea that lay bloodied and groaning at his feet).

If we take a flight of fancy at the moment and imagine that we really could change the parentage of a node (as in c.i above). Then the result would be something like this:

      a1*
     /| \
    / |  \
   a2 b2* a3
   | /  \ |
   |/    b4
  b3
   |
  c1*

(I think)

Note that the * is gone from b1/b4 because we're pretending that a1- b2 happened first.

And then c1* should beat b4.  No?

This is really just looking to see what would happen if you'd done a DAGgy fix from the start, and trying to move the DAG into that state post-hoc. It's not the most elegant setup I've ever seen. sigh.

Hrm, I can think of another kludge too (and a kludge is just what you wanted, I know); can you make *-merge label b1 and b2 the same so that we get something that looks like this:

      a1*
     /| \
    / |  \
   a2 b1* a3
   | /|   |
   |/ |   b1*
  b3   \ /
   |    b4 <-- b4 has a * for unique-*-merge, not for multi-*-merge
  c1*

Now c1* has beaten b1*, so we're fine :).

N.B. labelling the two b1's the same is not too bad (he says waving hands wildly). One was machine generated from the other without conflicts.

I don't know what the story looks like for textual merging.

Well, for darcs-style merge, things should be ok.

I have
like zero good analytical tools to understand how sequences can and
should behave under merging.

I like the Operational Transformation theory. But it seems that is very similar to Darcs - just described in terms of exact three way merge, rather than patch commutation.

You can also get into interesting issues wrt file identity and such
here, since monotone currently doesn't either suturing or any kind of
revival (so in particular, there's no way to write down that a file
created by a pluck is "the same file" as exists on some other parallel
branch).  People run into this problem in practice, because they want
to pluck over a patch that creates a file, then later they want to
pluck over some edit to that file, but monotone gets confused and
can't tell which file to apply the edit to...

hrm. This is one part of the Darcs exact-three-way-merge and the Operational Transformation setup. When you pass a patch through a rename, the name of the patch is changed.

I guess my cherry-pick proposal also has this problem... Hrm. Assuming you are discovering file identity by tracing backwards through the revision graph, and you choose option i) where the plucked branch is made a parent of the original node, then this should actually work. The identity of the file would become the 'plucked' creation, not the other one.

I just don't like this concept of changing the history of a node though... Have you think more about that.

a) I haven't traced much of monotone's three-way-merge yet.  I'm
guessing noone has thought about this, but does anyone know how
monotone's merge behaves relative to property TP2 of http://
revctrl.org/OperationalTransformation ?  (I was going to have a think
about it, but I thought I'd ask first :)

I've just thought about it for about 10 minutes, so I'm not sure, but
I think

1) TP2 translated into merge terms is just associativity:
    merge(A, merge(B, C)) = merge(merge(A, B), C)
   or in graphical terms:
     A   B   C    A   B   C
      \ /   /      \   \ /
       D   /        \   D'
        \ /          \ /
         F     =      F'

You're merging revisions there. The theory is in terms of patches. But yes, it looks like together TP1 and TP2 are a strong form of commutivity and associativity. The important point here is that Ressel's Transformation Functions (see http://revctrl.org/ OperationalTransformation) look like a form of three-way merge. They have problems with the ordering being consistent when there is a conflict. I think this may be where Darcs' exponential case comes in...?

It seems that all the interesting cases have to do with conflicts...

2) (unique|multi)-*-merge have this property whenever D and D' both
   exist as clean merges, AND if one ordering has a conflict in it,
   then the other does as well.  I haven't written out the dotted-Is
   and crossed-Ts proof of this, so I could be wrong, but I think it
   follows pretty straightforwardly if you look at what the mark sets
   have to be and the lemmas in the original *-merge writeups.

Yep - that looks right.

b) As far as implementing cherry-picking, it seems that the least
common ancestor for merge is found about line 80 of merge.cc.  Is
that the right place to be looking?

The actual common ancestor is found by content_merge_database_adaptor
in its constructor, by calling find_common_ancestor_for_merge.  Note
that this function's contract is just "find a good common ancestor for
3-way textual merging".  ATM this means, "if there is an LCA, return
that, otherwise, recursively call yourself on the set of minimal
CAs".  But we don't know that this is the best thing to do for textual
merging (though it's the best we know of), and you might want to think
whether being a good base for 3-way textual merge and trying to avoid
criss-cross issues is the same as being a good place to move a
daggyfix to.

Heh. Interesting. The darcs solution as the 'base revision' was described by Ganesh Sittampalam recently in the revctrl thread:

Ganesh Sittampalam wrote:
> William Uther wrote:
[snip previous reference to http://hsenag.livejournal.com/9648.html ]
>> I agree with
>> your assessment that Darcs makes that 'intersection of all
>> patches' ancestor.  That seems one reasonable solution.  Why do
>> you think it is the 'correct' thing to do?  Is there a more formal
>> justification for this somewhere?
>
> If you want to merge repositories X and Y, and you need some common
> ancestor A against which to do it, you (a) want X/A and Y/A
> (where / = 'difference') to not contain any changes that are in the
> other, or there would be an unnecessary conflict, but (b) you do
> need any change that is in A to be a subset of the changes in each
> of X and Y, or A cannot be an ancestor in any meaningful sense.
>
> (a) implies that A is a superset of X /\ Y (where /\ =
> 'intersection') and (b) implies that A is a subset of X /\ Y. So A
> must be X /\ Y.

I believe this ends up being the same as Git's recursive-three-way- merge (except that in Darcs the onflict markers are not just textual so they never get confused). Is this what you mean by saying 'otherwise, recursively call yourself on the set of minimal CAs' - call merge on them to merge them together? Or did you mean recursively call the least common ancestor algorithm to head further back into ancestry?

Cheers,

Will         :-}

--
Dr William Uther                           National ICT Australia
Phone: +61 2 8306 0424               Computer Science and Engineering
Email: address@hidden      University of New South Wales
Email: address@hidden                  Sydney, Australia

Web: http://www.cse.unsw.edu.au/~willu/   or  http://www.nicta.com.au/

NICTA email Disclaimer:
http://www.cse.unsw.edu.au/~willu/NICTAEmailDisclaimer.html
UNSW email Disclaimer:
http://www.eng.unsw.edu.au/emaildis.htm






reply via email to

[Prev in Thread] Current Thread [Next in Thread]