monotone-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Monotone-devel] Re: Improving the performance of annotate


From: Graydon Hoare
Subject: [Monotone-devel] Re: Improving the performance of annotate
Date: Tue, 18 Jul 2006 21:48:10 -0700
User-agent: Thunderbird 1.5.0.4 (Windows/20060516)

Daniel Carosone wrote:

my suggestion and preference here would be storing the roster details
in open sql, or at least caching the relevant details in sql. This is
similar to the 'per-file DAG' information discussed previously.  The
idea of inventing and storing our own additional indexes when we
already have a storage layer with these capabilities just seems
incongruous.

Agreed. I spent some time last time I was traveling making some notes on this and sketching out a picture, but never posted about it. I think a sensible shift in the storage layer would involve:

  - Storing only head rosters in full, no roster deltas.

  - Storing, along with each revision (in fulltext), a symbolic
    encoding of the changes-to-nodes-and-markings that the revision
    implies. This would be done as a set of split-out SQL rows.

  - The xdelta-reconstruction code would need to have its path-finding
    code split out from its applicator-feeding code, so that we could
    reuse the path-finding stuff to reconstruct arbitrary rosters from
    "head roster" + "sequence of symbolic roster actions". But it's
    totally doable.

This would attack most of our worst remaining bottlenecks:

  - Turning off the manifest hash check during netsync would make
    netsync a no-hash, no-xdelta operation; it would still spend a
    bit of time applying revs to an in-memory roster and working out
    markings, but the internal i/o costs could go away. Manifest checks
    could be run offline / periodically for users who don't mind
    relaxing a bit.

  - Annotate, log, etc. could apply a restriction to head rosters
    at first and then walk backwards through database queries
    against the affected nodes alone, skipping the parsing
    and hashing of intermediate rosters *and* skipping between
    revs that actually affect the file(s) in question.

You'd effectively be getting "per file DAGs" out of this, since the changes-to-markings *are* the DAG edges you care about.

I'm not sure how far off this scheme is from what the existing "per file DAGs" work does; I've not had time to investigate it. But I think it's completely compatible with how we do things now, and would not involve changing network protocol or external formats. So I'm quite happy if someone wants to spend time playing with it.

You're absolutely right that rosters are a "local cache"; they were designed that way, and we expressly kept their format private-to-the-db so that we could change it as efficiency demanded. Efficiency is now demanding, and changing the format to suit is quite legitimate.

-graydon





reply via email to

[Prev in Thread] Current Thread [Next in Thread]