monotone-devel

Re: [Monotone-devel] Re: big repositories inconveniences (partial pull?)


From: Markus Schiltknecht
Subject: Re: [Monotone-devel] Re: big repositories inconveniences (partial pull?)
Date: Thu, 24 Aug 2006 15:24:18 +0200
User-agent: Thunderbird 1.5.0.5 (X11/20060812)

Nathaniel Smith wrote:
> My memory of the discussion before is not that it was rejected for not
> being like cvs2svn.  Just, if you're making up your own algorithm,
> we'd like to see a description and justification of it so we have a
> chance to apply some of the collective brain power here to making sure
> it makes sense.  Because, well, I doubt _anyone_ is smart enough to
> invent a complete and correct CVS reconstruction algorithm without
> some help noticing where they forgot nasty edge cases :-).  (Certainly
> I'm not.)

Maybe. However, I don't feel like making up my own algorithm for that. I just thought this change of mine might already be sufficient, but I now know it's not. So I will try to do something closer to what cvs2svn does.
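For reference, the basic idea of grouping individual CVS file revisions into changesets by author, log message and commit time can be sketched roughly as below. This is a simplified Python illustration under stated assumptions: the FileRev record, group_changesets and the 300-second window are hypothetical, and it is neither cvs2svn's nor cvs_import's actual code. A real importer also has to deal with dependency ordering, branches and tags, which this ignores.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class FileRev:
    """One CVS file revision (hypothetical record, for illustration only)."""
    path: str
    rcs_version: str
    author: str
    log: str
    timestamp: int  # seconds since the epoch


def group_changesets(revs, window=300):
    """Group revisions sharing author and log message into changesets.

    A new changeset starts when the gap to the previous revision exceeds
    `window` seconds (an assumed value), or when the same file would
    otherwise appear twice in one changeset.
    """
    changesets = []
    ordered = sorted(revs, key=lambda r: (r.author, r.log, r.timestamp))
    for _, group in groupby(ordered, key=lambda r: (r.author, r.log)):
        current = []
        for rev in group:
            too_late = bool(current) and rev.timestamp - current[-1].timestamp > window
            same_file = any(r.path == rev.path for r in current)
            if too_late or same_file:
                changesets.append(current)
                current = []
            current.append(rev)
        changesets.append(current)
    # Order the changesets by the time of their first revision.
    changesets.sort(key=lambda cs: cs[0].timestamp)
    return changesets
```

The time window alone is exactly the kind of heuristic with nasty edge cases (e.g. two unrelated commits with an identical log message like "fix"), which is why the second split condition is there.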

> And, to make the process a little easier, cvs2svn is a very good place
> to look, because they've done a lot of that work to find all the
> approaches that _don't_ work already, so hopefully we could piggyback
> on that.

Yeah, I have already added their design-notes.txt to the repository (uh... is that even license compatible?) and added my own comments on how I did it for mtn cvs_import.

> I am a bit curious about this sql table for tracking them, though;
> it doesn't make a whole lot of sense to me at first glance.  There's
> some question about storing it on disk in the first place --

We need to store some information on disk to speed up later resyncs. I'm not sure whether this RCS version <-> file_id mapping is what helps most. Of course, since it is a separate table (as it stands), a resync could only happen on the database that did the very first import.
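One way such an on-disk resync cache could look is sketched below with Python's sqlite3 module. The table rcs_file_map, its columns and the helper functions are hypothetical, not monotone's schema; the key includes the RCS file path because version numbers alone repeat across RCS files. On a resync, only revisions missing from the table would need to be imported.

```python
import sqlite3


def open_map(path=":memory:"):
    """Open (or create) a hypothetical persistent RCS -> file_id map."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS rcs_file_map ("
        " rcs_path TEXT NOT NULL,"
        " rcs_version TEXT NOT NULL,"
        " file_id TEXT NOT NULL,"
        " PRIMARY KEY (rcs_path, rcs_version))"
    )
    return db


def record_import(db, rcs_path, rcs_version, file_id):
    """Remember that one RCS revision was imported as `file_id`."""
    with db:  # commits the transaction on success
        db.execute(
            "INSERT OR REPLACE INTO rcs_file_map VALUES (?, ?, ?)",
            (rcs_path, rcs_version, file_id),
        )


def pending(db, revisions):
    """Return the (rcs_path, rcs_version) pairs not imported yet."""
    done = set(db.execute("SELECT rcs_path, rcs_version FROM rcs_file_map"))
    return [rev for rev in revisions if rev not in done]
```

As noted above, such a table lives only in the database that did the first import; nothing here would travel with netsync.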

> everything else cvs_import does is in-memory, which might not be
> ideal, but it hasn't seemed to cause any problems yet, and fixing it
> will take more than moving one single data structure onto disk.

Why not? What more does it take? Do you want to have such information netsynced to other repositories?

> More than that, though, it seems unlikely that a file_id<->rcs number
> mapping is what you're actually looking for?

Like I said, I don't know.

> Recall that a file_id simply identifies a bitstring -- it does not
> correspond uniquely to any particular "file" in any particular
> revision.  In fact, a given revision may contain many files, that all
> have the same file_id (because they happen to have the same content).

Aha. So from a file_id you cannot get the filename? Then shouldn't it rather be called 'stream_id'?
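To illustrate the point about file_ids: monotone identifies file content by its SHA-1 hash, so the ID depends only on the bytes and carries no filename. The short Python sketch below uses hashlib; the file_id helper and the example paths are made up for illustration.

```python
import hashlib


def file_id(content: bytes) -> str:
    """Content-derived ID: the same bytes always give the same ID."""
    return hashlib.sha1(content).hexdigest()


# One revision containing two differently named files with identical content.
revision = {
    "src/a/Makefile": b"all:\n\tmake -C ..\n",
    "src/b/Makefile": b"all:\n\tmake -C ..\n",
    "README": b"hello\n",
}
ids = {path: file_id(data) for path, data in revision.items()}

# Inverting the mapping shows why a file_id cannot name a single file:
# one ID may map back to several paths.
by_id = {}
for path, fid in ids.items():
    by_id.setdefault(fid, []).append(path)
```

Going from a file_id back to "the" file therefore always needs extra context, such as a revision and a path.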

> Similarly, a rcs number is not useful on its own; every rcs file has
> some revision numbered 1.1, for instance... unless we're somehow
> mashing the rcs filename and the rcs version number together into a
> single string in this table, I don't see how it can be useful?

Yeah, we probably need the filename, too.
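A tiny Python illustration of why the filename has to be part of the key: keyed on the RCS version alone, the 1.1 revisions of different files collide. The file names and ID strings below are made-up placeholders, not real hashes.

```python
# Each entry: (RCS file, RCS version, file_id); placeholder values only.
imported = [
    ("foo.c,v", "1.1", "id-foo-1.1"),
    ("bar.c,v", "1.1", "id-bar-1.1"),
    ("foo.c,v", "1.2", "id-foo-1.2"),
]

# Keyed on the version number alone: bar.c's 1.1 silently replaces foo.c's 1.1.
by_version = {ver: fid for _, ver, fid in imported}

# Keyed on (filename, version): every RCS revision keeps its own entry.
by_file_and_version = {(path, ver): fid for path, ver, fid in imported}
```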

Like I said, it's just my 'scratch pad'. And it 'works', at least insofar as it writes out the (somewhat useless) RCS -> file_id mapping.

Regards

Markus



