monotone-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Monotone-devel] Roster handling in the code


From: Patrick Georgi
Subject: Re: [Monotone-devel] Roster handling in the code
Date: Sat, 25 Jul 2009 09:39:26 +0200
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.1) Gecko/20090715 Thunderbird/3.0b3

Am 25.07.2009 04:07, schrieb Stephen Leake:
Patrick Georgi<address@hidden>  writes:
The current code takes two rosters, compares them, and registers the
differences. The problem is that this approach scales by roster size
(which is directly dependent on the tree size). The problem is, that I'm
working with trees with 50000 files - comparing all of their entries
just to find the 5 that actually changed is quite a waste of time.
As a test, I disabled the roster deltification code, and got quite some
speedup (pushing 10 revisions with put_revision in approx. half the
time), so there's definitely room for improvement.
I think the main motivation for roster_deltas is database size. If we
can store a new revision as an old revision plus a small delta, that's
a significant size saving. As with anything that saves size, it costs
computation time. Which doesn't mean the current trade-off is optimal.
Disk space is certainly cheaper than it used to be. But so is CPU time
:).
Perhaps a user tuning option could be provided?
The problem in this case, that large trees affect both of them very negatively. When I did the profiling, I disabled roster_delta generation (#if 0/#endif around a couple of lines in database.cc) to see if I'm really on the right track. But this "optimization", while reasonable for small trees, really hurts large trees, too. Every roster contains information about 50000 files, and that sums up.

So both potential optimizations, the current one, and disabling roster_deltas, are acceptable for small trees. But small trees are not the case I'm looking at.
...
My current approach is to try to create the roster_delta while the new
roster is created out of the old (using
cset.apply_to(editable_roster_for_{non,}merge)). The most appropriate
place for this seems to be editable_roster_for_{non,}merge, as all data
is passed to it as some point of the operation.
I have wondered why there were two different "delta" mechanisms; one
that creates changesets, and one that creates roster deltas. I thought
at one time that one could be eliminated. But then I realized both are
needed. I'm not sure I can remember the details, though :(.
I think at the time where changesets are created and written to the database, some data that's stored in rosters (and thus roster_deltas) isn't around yet, for example node ids.
It could be that combining them would lead to a time savings.
The ability to generate one out of the other, given the right set of source data, might be enough. As changesets are managed first, my attempt is to efficiently generate roster_deltas out of changesets plus the related rosters.
But I'm looking for a more direct approach: Is there a way to get figure
out the revision that matches a roster?
Rosters are fetched in the first place by revision id, so I'm not sure
what you are asking.
I was deep inside the code, and at some point, it was only shuffling rosters, with no revision ids in sight. I just looked a couple of levels further up, and everything needed seems to be there.
For example:

   db.get_roster(left_rid, left_roster, left_marking_map);
   db.get_roster(right_rid, right_roster, right_marking_map);

This is from the merge code. If you could point to the section of the
sync code you are worried about, it might help.
This gave me an idea: database::put_roster handles the deltification of rosters. The revision ids are available there, so I could hand over the changesets to the delta_rosters(...) call there. See below.
5 lookups instead of 50000.
Computing a roster delta should not involve retrieving the file text
for every file; doesn't it check if the file id (hash of the contents)
has changed first? Those are given in the roster, so comparing them
doesn't count as a "lookup".
Yes, but look how it's done. It's in roster_delta.cc, make_roster_delta_t(...): It creates a datastructure containing information about _all_ files in the roster (ie. all 50000 in my case), and then loops over them and notes the differences. The cset already contains a list of all changes between the two revisions, so the code could walk a changeset to figure out which roster entries to handle. As most changesets are rather small, that should give a nice speed up.
What are you defining as a "lookup"?
This iterator is used to look up the changes: parallel::iter<node_map> i(from.all_nodes(), to.all_nodes());


Regards,
Patrick




reply via email to

[Prev in Thread] Current Thread [Next in Thread]