Re: [Monotone-devel] cvs import


From: Michael Haggerty
Subject: Re: [Monotone-devel] cvs import
Date: Thu, 14 Sep 2006 12:51:19 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.5) Gecko/20060728 Thunderbird/1.5.0.5 Mnenhy/0.7.4.666

Markus Schiltknecht wrote:
> Michael Haggerty wrote:
>> The main problem with converting CVS repositories is their unreliable
>> timestamps.  Sometimes they are off by a few minutes; that would be no
>> problem for your algorithm.  But they might be off by hours (maybe a
>> timezone was set incorrectly), and it is not unusual to have a server
>> with a bad battery that resets its time to Jan 1 1970 after each reboot
>> for a while before somebody notices it.  Timestamps that are too far in
>> the future are probably rarer, but also occur.  CVS timestamps are
>> simply not to be trusted.
>>
>> The best hope for correcting timestamp problems is pooling information
>> across files.  For example, you might have the following case:
>>
>>   1   2
>>   |   |
>>   A   Z
>>   |
>>   B
>>   :
>>   Y
>>   |
>>   Z
>>
>> where A..Y have correct timestamps but Z has an incorrect timestamp far
>> in the past.  It is clear from the dependency graph that Z was committed
>> after Y, and by implication revision Z of file 2 was committed at the
>> same time.  But your algorithm would grab revision Z of file 2 first,
>> even before revision A of file 1.
> 
> But you could use another method to determine what to commit first. One
> which takes only the dependency graph into account.
> 
> The simplest variant would be:
> 
> 1. randomly choose a commit (or take the one with the lowest timestamp
>    for a mostly good starter)
> 
> 2. collect the other files' commits which seem to belong to the same
>    revision (for me, a revision is a set of files, as in monotone. I
>    don't know what terms you use here, probably we should define a
>    set of terms to discuss such issues and avoid confusion.)
> 
> 3. check if any of those file commits conflict in the dependency graph.
>    I.e. in your example above file 1 would also find a commit Z, but
>    it conflicts with A, B, ... and Y.
> 
>    If there are conflicts, take the first one in your graph (A) and
>    repeat from step 2 with that commit. Otherwise continue.
> 
> 4. You now have the 'next' revision to commit (next in the dependency
>    graph sense).
> 
> 
> With such an algorithm, you won't rely on the timestamps, but only on
> the dependencies. Thus, what other advantages would the blob method have?

Step 2 is essentially the creation of a blob, isn't it?

And steps 2 and 3 could turn into an infinite loop, because of cases like

   1   2
   |   |
   A   B
   |   |
   B   A

This can arise if two (nonatomic, remember) CVS commits are going on at
the same time, even without clock errors.  Of course more complicated
loops can also arise.
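
To make the failure mode concrete, here is a rough Python sketch of the
quoted steps 2 to 4 with a guard against revisiting a candidate.  It is
only an illustration under simplifying assumptions, not code from
cvs2svn or monotone: each file is a plain list of changeset labels
(oldest first), file commits with the same label are assumed to belong
to the same revision, and the names next_revision and DependencyCycle
are made up for this example.

```python
class DependencyCycle(Exception):
    """Raised when the search comes back to a candidate it already tried."""

    def __init__(self, cycle):
        super().__init__(" -> ".join(cycle))
        self.cycle = cycle


def next_revision(files, start):
    """Return the label of a revision that nothing else has to precede.

    files: dict mapping a file name to its changeset labels, oldest first.
    start: label of the commit picked in step 1.
    """
    candidate = start
    visited = []
    while True:
        if candidate in visited:
            # Steps 2 and 3 would repeat forever from here on.
            first = visited.index(candidate)
            raise DependencyCycle(visited[first:] + [candidate])
        visited.append(candidate)

        # Step 3: look for a file in which the candidate has predecessors.
        blocker = None
        for name in sorted(files):
            revs = files[name]
            if candidate in revs:
                earlier = revs[:revs.index(candidate)]
                if earlier:
                    blocker = earlier[0]  # first conflicting commit
                    break

        if blocker is None:
            return candidate  # step 4: nothing has to come before it
        candidate = blocker   # repeat from step 2 with the blocker


# First example from the thread: Z is blocked by A..Y in file 1.
linear = {"1": ["A", "B", "Y", "Z"], "2": ["Z"]}
print(next_revision(linear, "Z"))

# Second example: two interleaved commits, no valid starting point.
crossed = {"1": ["A", "B"], "2": ["B", "A"]}
try:
    next_revision(crossed, "A")
except DependencyCycle as err:
    print("cycle:", err)
```

On the first example the search backs up from Z to A.  On the second it
bounces between A and B, and without the visited list it would never
terminate.  Detecting the cycle is the easy part; deciding how to break
it (for instance by splitting one of the changesets) is a separate
question that this sketch does not address.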

>> Tags and branches do not have any timestamps at all in CVS.  (You can
>> sometimes put bounds on the timestamps: a branch must have been created
>> after the version from which it sprouts, and before the first commit on
>> the branch (if there ever was a commit on the branch).)  And it is not
>> possible to distinguish whether two branches/tags sprouted from the same
>> revision of a file or whether one sprouted from the other.  So a
>> date-based method has to work hard to get tags and branches correct.
> 
> But in the above way, none of it would be timestamp based. You could, as
> you do in your blob method, insert tag and branch 'events', which would
> be dependent on a commit event of a certain file. You would then not get
> a 'revision' in step 4 above, but a branch or tag.
> 
> (Don't get me wrong, I think the blob method is better. Because I
> suspect importing a CVS repository can't be that simple. But I'm missing
> proof of that.)

Yes, but branches and especially tags are very slippery.  They don't
even have to be created (chronologically) before a succeeding commit on
the same file.  So you'll have branch/tag events rising to the top of
the frontier and you need some way to decide when to process them.

Not that this part is much easier in the blob scheme, except that from
early on you have a global picture of the topology of branches/tags so I
think it should be easier to design the heuristics that will be needed.
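
As one example of such a heuristic, the timestamp bounds for a branch
mentioned in the quoted text can be pooled across files.  The sketch
below is hypothetical: the function name and input layout are invented
here, and it assumes the branch was created in all files at about the
same moment and that the timestamps it is given are trustworthy, which,
as discussed, they often are not.

```python
from datetime import datetime


def branch_creation_window(per_file):
    """Pool per-file bounds on when a branch was created.

    per_file: non-empty list of (sprout_time, first_branch_commit_time)
    pairs, one per file; the second item is None if the file never had
    a commit on the branch.

    Returns (lower, upper), where upper is None if no file bounds the
    creation time from above, or None if the bounds contradict each
    other (which hints at a bad timestamp somewhere).
    """
    # The branch cannot predate any revision it sprouts from.
    lower = max(sprout for sprout, _ in per_file)
    # It cannot postdate the first commit made on it in any file.
    uppers = [first for _, first in per_file if first is not None]
    upper = min(uppers) if uppers else None
    if upper is not None and lower > upper:
        return None
    return lower, upper


window = branch_creation_window([
    (datetime(2006, 9, 1, 10, 0), datetime(2006, 9, 3, 12, 0)),
    (datetime(2006, 9, 2, 8, 0), None),
    (datetime(2006, 8, 30, 9, 0), datetime(2006, 9, 2, 18, 0)),
])
print(window)
```

For tags only the lower bound applies, since there is never a commit on
a tag.  A contradictory window is itself useful information: it means
at least one of the contributing timestamps is wrong, or the branch was
not created in one go.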

Michael



