monotone-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Monotone-devel] Roster handling in the code


From: Stephen Leake
Subject: Re: [Monotone-devel] Roster handling in the code
Date: Fri, 24 Jul 2009 22:07:35 -0400
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/22.2 (windows-nt)

Patrick Georgi <address@hidden> writes:

> I'm currently trying to optimize the roster deltification code (as used
> by pull, put_revision, probably commit and others).

Ok.

> The current code takes two rosters, compares them, and registers the
> differences. The problem is that this approach scales by roster size
> (which is directly dependent on the tree size). The problem is, that I'm
> working with trees with 50000 files - comparing all of their entries
> just to find the 5 that actually changed is quite a waste of time.
> As a test, I disabled the roster deltification code, and got quite some
> speedup (pushing 10 revisions with put_revision in approx. half the
> time), so there's definitely room for improvement.

I think the main motivation for roster_deltas is database size. If we
can store a new revision as an old revision plus a small delta, that's
a significant size saving. As with anything that saves size, it costs
computation time. Which doesn't mean the current trade-off is optimal.
Disk space is certainly cheaper than it used to be. But so is CPU time
:).

Perhaps a user tuning option could be provided?

> ...
> My current approach is to try to create the roster_delta while the new
> roster is created out of the old (using
> cset.apply_to(editable_roster_for_{non,}merge)). The most appropriate
> place for this seems to be editable_roster_for_{non,}merge, as all data
> is passed to it as some point of the operation.

I have wondered why there were two different "delta" mechanisms; one
that creates changesets, and one that creates roster deltas. I thought
at one time that one could be eliminated. But then I realized both are
needed. I'm not sure I can remember the details, though :(.

It could be that combining them would lead to a time savings.

> But I'm looking for a more direct approach: Is there a way to get figure
> out the revision that matches a roster?

Rosters are fetched in the first place by revision id, so I'm not sure
what you are asking.

For example:

  db.get_roster(left_rid, left_roster, left_marking_map);
  db.get_roster(right_rid, right_roster, right_marking_map);

This is from the merge code. If you could point to the section of the
sync code you are worried about, it might help.

> From there, it should be possible to find out the changeset between the
> two revisions,

Yes, the changeset from the immediate parent to the revision is stored
in the revision.

When syncing, I'm not sure what order the revisions are sent in. It
would make most sense to send them in parent/child order, so the
roster deltas and markings on the receiving end can be computed as
each revision arrives. Assuming that's true, the sync process never
has to compute a delta between two rosters that are not immediate
parent and child.

So there's no need to chain changesets.

> which contains most of the data a roster_delta contains - basically
> everything but node_ids and markings, if I'm not mistaken - and
> those can be fetched out of the rosters, given the cset data. 

Markings must be computed for new revisions; they can't be fetched,
and they are not sent via netsync. That's a netsync time optimization;
if you have a really fast connection, it could easily be that the
netsync time saved is not worth the CPU time incurred.

Hmm; you can't eliminate the marking computation entirely. Node ids
are _not_ guarranteed to be the same between dbs, and I don't think
there's any way to force them to be the same (in theory, not just in
practice). So translating the node ids in the markings would take
time; a file name match for each.

The current markings for the parent revisions can be fetched as a
starting point. But computing markings is orthogonal to computing
roster_deltas anyway.

> 5 lookups instead of 50000.

Computing a roster delta should not involve retrieving the file text
for every file; doesn't it check if the file id (hash of the contents)
has changed first? Those are given in the roster, so comparing them
doesn't count as a "lookup".

What are you defining as a "lookup"?

> If anyone has ideas on how to improve this area in a better way than
> what I tried to describe here, I'd be grateful to hear them. I'm not
> that used to the monotone code base, and this is the first time I really
> dive into it.

I'm not clear what you are suggesting, but I'm interested in
discussing it further.

--
-- Stephe




reply via email to

[Prev in Thread] Current Thread [Next in Thread]