monotone-devel

Re: [Monotone-devel] Re: big repositories inconveniences (partial pull?)


From: Nathaniel Smith
Subject: Re: [Monotone-devel] Re: big repositories inconveniences (partial pull?)
Date: Thu, 24 Aug 2006 05:55:42 -0700
User-agent: Mutt/1.5.12-2006-07-14

On Thu, Aug 24, 2006 at 11:48:14AM +0200, Markus Schiltknecht wrote:
> Markus Schiltknecht wrote:
> >I have read the design-notes.txt of cvs2svn (which is a very good 
> >documentation of their process, BTW) and came to the conclusion that we 
> >cannot rely on the timestamp of the file revisions. Thus I began 
> >implementing a file_id -> RCS revision number mapping. I've currently 
> >just added a table to do this. Having that information stored durably 
> >allows for a faster resync later on, I think.
> 
> I've just synced my changes to venge.net. Please be aware that the code 
> currently is quite messy. The branch point detection code relies on the 
> timestamp, which is surely wrong.
> 
> I'll try to clean it up somewhat as soon as possible.

Cool!  It's very much a wanted feature; my apologies again for it
having gotten dropped on the floor a bit before.

My memory of the discussion before is not that it was rejected for not
being like cvs2svn.  Just, if you're making up your own algorithm,
we'd like to see a description and justification of it so we have a
chance to apply some of the collective brain power here to making sure
it makes sense.  Because, well, I doubt _anyone_ is smart enough to
invent a complete and correct CVS reconstruction algorithm without
some help noticing where they forgot nasty edge cases :-).  (Certainly
I'm not.)

And, to make the process a little easier, cvs2svn is a very good place
to look, because they've done a lot of that work to find all the
approaches that _don't_ work already, so hopefully we could piggyback
on that.

> Regarding RCS revision number: saving all RCS numbers of every file in a 
> revision in a cert is fine, but I need to be able to get the number for 
> a file_id. Is it okay to have a SQL table for that? What else would be 
> feasible?

I don't really know what the right way to write down the RCS numbers
to allow later matching up is -- Christof I know has thought about it
more.

I am a bit curious about this sql table for tracking them, though;
it doesn't make a whole lot of sense to me at first glance.  There's
some question about storing it on disk in the first place --
everything else cvs_import does is in-memory, which might not be
ideal, but it hasn't seemed to cause any problems yet, and fixing it
will take more than moving one single data structure onto disk.  More
than that, though, it seems unlikely that a file_id<->rcs number
mapping is what you're actually looking for?

Recall that a file_id simply identifies a bitstring -- it does not
correspond uniquely to any particular "file" in any particular
revision.  In fact, a given revision may contain many files that all
have the same file_id (because they happen to have the same content).
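
To make that concrete, here is a minimal sketch in Python (not monotone
code).  It assumes a file_id is just the SHA-1 of the file's bytes; the
revision contents and the helper paths_for are made up for
illustration.

```python
import hashlib

def file_id(content: bytes) -> str:
    """Content hash standing in for a file_id (assumed: SHA-1 of the bytes)."""
    return hashlib.sha1(content).hexdigest()

# Hypothetical revision: path -> file content.
revision = {
    "COPYING": b"license text\n",
    "doc/COPYING": b"license text\n",   # same bytes as COPYING
    "README": b"hello\n",
}

ids = {path: file_id(data) for path, data in revision.items()}

def paths_for(fid: str, ids: dict) -> list:
    """All paths in the revision whose content hashes to fid."""
    return sorted(path for path, f in ids.items() if f == fid)

# One file_id, two paths: the id alone cannot say which file was meant.
print(paths_for(ids["COPYING"], ids))
```

So a lookup that starts from the file_id can land on several paths at
once, which is why it does not identify a "file" by itself.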

Similarly, an RCS number is not useful on its own; every RCS file has
some revision numbered 1.1, for instance... unless we're somehow
mashing the RCS filename and the RCS revision number together into a
single string in this table, I don't see how it can be useful?
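
A rough sketch of the difference, again in Python and not the actual
table from the branch: RcsKey, the ",v" filenames and the "id-a"/"id-b"
values are all hypothetical placeholders.

```python
from typing import Dict, NamedTuple

class RcsKey(NamedTuple):
    """Hypothetical composite key: RCS file plus revision number."""
    rcs_file: str   # e.g. "src/foo.c,v"
    revision: str   # e.g. "1.1"

# Keyed on the revision number alone, entries from different RCS files
# collide, because nearly every RCS file has a revision 1.1.
by_number: Dict[str, str] = {}
by_number["1.1"] = "id-a"   # from src/foo.c,v
by_number["1.1"] = "id-b"   # from src/bar.c,v, overwrites the first entry

# Keyed on (RCS file, revision number), both entries survive.
by_key: Dict[RcsKey, str] = {}
by_key[RcsKey("src/foo.c,v", "1.1")] = "id-a"
by_key[RcsKey("src/bar.c,v", "1.1")] = "id-b"

print(len(by_number), len(by_key))
```

Whether the filename and number are stored as two columns or one
mashed-together string, the point is only that the key needs both.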

-- Nathaniel

-- 
"Of course, the entire effort is to put oneself
 Outside the ordinary range
 Of what are called statistics."
  -- Stephen Spender