Re: [Monotone-devel] Scalability question


From: Timothy Brownawell
Subject: Re: [Monotone-devel] Scalability question
Date: Fri, 4 Aug 2006 14:47:08 -0500

On 8/4/06, Jonathan S. Shapiro <address@hidden> wrote:
If I understand the documents correctly, there are a whole lot of places
in the monotone schema that are very similar to things we did in OpenCM.
One of these bit us badly on scalability. I want to identify the issue,
explain how it bit us, and ask whether it has been a problem in
monotone. If not, why not?

The Monotone "Manifest" is directly equivalent to the OpenCM "Change"
object. We went through various iterations on our Change objects, and we
hit two scalability issues. The first arises with very large projects.
The second impacts initial checkout (in monotone, it would probably
arise in push/pull rather than checkout).

Like monotone, OpenCM does not store entries for directories; they are
implicit in the file paths. In contrast to Monotone, OpenCM adds a level
of indirection between our Change records and our Content objects. The
intermediate object is called an Entity. It stores the (file-name,
content-sha1) pair and a couple of other things that aren't important
for this question.

Consider a mid-sized project such as EROS, which has ~20,000 source
files. [For calibration, OpenBSD is *much* larger]. This means 20,000
sha-1's in the Manifest/Change. In OpenCM, these are stored in binary
form, so each sha-1 occupies 20 bytes, and the resulting Change object
is about 400 kilobytes.
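
The size estimate above is simple arithmetic, sketched here for reference. The function name and the hex comparison are illustrative additions, not anything taken from OpenCM or monotone; the only inputs from the message are the ~20,000 files and the 20-byte binary sha-1.

```python
# Back-of-the-envelope size of a Change/Manifest object that holds one
# sha-1 per file. Names here are hypothetical, for illustration only.

SHA1_BINARY_BYTES = 20   # a sha-1 digest is 160 bits
SHA1_HEX_BYTES = 40      # the same digest written as hex text


def change_object_size(num_files, digest_bytes=SHA1_BINARY_BYTES):
    """Bytes taken by the digests alone (ignores names and framing)."""
    return num_files * digest_bytes


print(change_object_size(20_000))                  # 400000 bytes, about 400 kB
print(change_object_size(20_000, SHA1_HEX_BYTES))  # 800000 bytes if stored as hex
```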

Internally, we don't really use manifests (much) anymore. Instead we
use "rosters", which are private manifest-plus-merge-metadata objects.
We currently store them as plaintext, but have been considering
storing them as sets of database rows for performance reasons.

This particular object sees a lot of delta computations, and simply
reading and writing it takes a noticeable amount of time. Also, the need
to sync a 400 kbyte object in order to begin a checkout is very
disconcerting to users -- especially when you are doing it over a slow
link at (e.g.) a hotel or (e.g.) a PPP link [Yes, a lot of people really
still use dial-up].

We don't send manifests (or rosters) over the network. Instead we send
revisions, which include a list of changes (add, drop, rename, patch,
etc) against the parent revision(s).
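
As a rough illustration of why this keeps transfers small, here is a minimal sketch of rebuilding a child manifest from a parent plus a list of changes. The tuple layout, the function name, and the short content ids are assumptions made up for the example; this is not monotone's actual revision format.

```python
# A manifest is modelled as {path: content_id}. A revision is modelled as
# a list of change tuples against the parent. All names are hypothetical.

def apply_changes(parent, changes):
    """Return the child manifest produced by applying changes to parent."""
    child = dict(parent)  # leave the parent manifest untouched
    for change in changes:
        kind = change[0]
        if kind == "add":
            _, path, content_id = change
            child[path] = content_id
        elif kind == "drop":
            _, path = change
            del child[path]
        elif kind == "rename":
            _, old_path, new_path = change
            child[new_path] = child.pop(old_path)
        elif kind == "patch":
            _, path, new_content_id = change
            if path not in child:
                raise KeyError(path)
            child[path] = new_content_id
        else:
            raise ValueError("unknown change kind: %r" % (kind,))
    return child


parent = {"src/main.c": "a1", "README": "b2", "old.txt": "c3"}
changes = [
    ("patch", "src/main.c", "a9"),
    ("rename", "README", "README.txt"),
    ("drop", "old.txt"),
    ("add", "docs/intro.txt", "d4"),
]
child = apply_changes(parent, changes)
print(child)
```

Only the four change entries would need to cross the network in this model, however many files the parent manifest lists.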

I am interested to know if this has been a scalability issue in
monotone? What performance result might I expect if I load EROS into
monotone?

It would probably be kinda slow. I sorta recall that it's slow for
OpenEmbedded, but I think they're still using 0.25 (before our change
to using rosters instead of manifests internally), so more recent
versions might be less slow.

If it *has* been a scalability issue, I have some hindsight suggestions
to offer based on the OpenCM experiences, but I don't want to seem
pushy.

We have seen some slowness, yes. Our current thinking is to store our
rosters as table rows. This lets us really store one as only the rows
that are different from its parent(s?), which will speed up
taking/applying deltas. It also saves us from having to parse them
to/from the plaintext format as much. They don't cause large network
transfers, because they're not sent over the network.
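
A minimal sketch of the rows-that-differ idea, assuming a roster can be modelled as a mapping from a node id to a row. The row layout (path, kind, content id) and both function names are hypothetical and only meant to show the shape of the delta, not the schema we would actually use.

```python
# A roster is modelled as {node_id: row}. A stored delta keeps only the
# rows that differ from the parent. Layout and names are hypothetical.

def roster_delta(parent, child):
    """Return (removed_ids, changed_rows) that turn parent into child."""
    removed = sorted(node for node in parent if node not in child)
    changed = {node: row for node, row in child.items()
               if parent.get(node) != row}
    return removed, changed


def apply_delta(parent, delta):
    """Rebuild the child roster from the parent and a stored delta."""
    removed, changed = delta
    child = {node: row for node, row in parent.items()
             if node not in removed}
    child.update(changed)
    return child


parent = {
    1: ("src", "dir", None),
    2: ("src/main.c", "file", "a1"),
    3: ("old.txt", "file", "c3"),
}
child = {
    1: ("src", "dir", None),
    2: ("src/main.c", "file", "a9"),      # content changed
    4: ("docs/intro.txt", "file", "d4"),  # newly added
}
delta = roster_delta(parent, child)
print(delta)
```

In this model the unchanged row for node 1 is never written again, so the stored delta grows with the number of changed rows rather than with the size of the tree.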

Yes, suggestions are always welcome.

Tim