Re: [Monotone-devel] Improving the performance of annotate


From: Daniel Carosone
Subject: Re: [Monotone-devel] Improving the performance of annotate
Date: Wed, 19 Jul 2006 12:13:47 +1000
User-agent: Mutt/1.5.11

On Tue, Jul 18, 2006 at 02:41:35PM -0700, Eric Anderson wrote:
> I've been working on improving the performance of annotate.  I have
> found a solution that drops the time for mtn annotate Makefile.am from
> about 175 seconds down to 9 seconds (detailed cpu and memory
> statistics at the bottom).

Great work.  We've had a lot of speculation and little (successful)
measurement as to where this time is really going.  That being said,
it's nice to see your results demonstrating that the speculation
wasn't far off the mark, either :)

Skipping the longer detail of your discussion and responding under the
summary headings:

> 1) only parsing the
> portion of the roster that was relevant to the file being annotated.

My suggestion and preference here would be storing the roster details
in open sql, or at least caching the relevant details in sql. This is
similar to the 'per-file DAG' information discussed previously.  The
idea of inventing and storing our own additional indexes when we
already have a storage layer with these capabilities just seems
incongruous.

The implications of this on storage size certainly need some
examination.
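
A minimal sketch of what "roster details in sql" could look like, using
Python's sqlite3 module purely for illustration. The table name
roster_files, its columns, and the helper functions are all hypothetical
and are not monotone's schema; the point is only that an ordinary index
lets us fetch one file's entries without parsing whole rosters.

```python
import sqlite3


def open_db():
    """Create an in-memory database with a hypothetical per-file roster table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE roster_files (
               rev_id     TEXT    NOT NULL,  -- revision the roster belongs to
               node_id    INTEGER NOT NULL,  -- stable identity of the file
               path       TEXT    NOT NULL,
               content_id TEXT    NOT NULL,  -- hash of the file content
               PRIMARY KEY (rev_id, node_id))"""
    )
    # The storage layer maintains this index; no hand-rolled index is needed.
    db.execute("CREATE INDEX roster_files_by_node ON roster_files (node_id)")
    return db


def add_roster(db, rev_id, files):
    """Store one roster; files is a list of (node_id, path, content_id)."""
    db.executemany(
        "INSERT INTO roster_files VALUES (?, ?, ?, ?)",
        [(rev_id, node_id, path, content_id) for node_id, path, content_id in files],
    )


def entries_for_node(db, node_id):
    """Return (rev_id, content_id) for every roster that contains the file."""
    rows = db.execute(
        "SELECT rev_id, content_id FROM roster_files "
        "WHERE node_id = ? ORDER BY rev_id",
        (node_id,),
    )
    return rows.fetchall()
```

One row per file per revision is exactly where the storage-size question
comes from, so a real design would probably need something more compact
than this.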

> 2) skipping the version hash check in database.cc

At the moment, you're skipping the check of the roster as it's
retrieved before you start parsing it, right? So you save lots of
checking of rosters you don't necessarily end up using.  Furthermore,
most of your remaining time is spent, uh, "express-parsing" rosters to
see if they're relevant. If we could find relevant rosters quickly,
the remaining saving for both changes could be much less significant.

Some thoughts on this to throw into the philosophical debate that may
follow:

 - I'm very supportive of the validate-everything approach taken by
   monotone; the reasons previously stated are and will remain sound.
   It does come at a cost, and some operations may not wish to pay
   that cost or need the assurances it provides.

 - rosters can be reconstructed fully from other information; after
   all, they're constructed locally by each monotone instance during
   netsync.  In a sense, then, they're *already* a private local cache
   to optimise operations on exposed data.

 - we generate and hash and revalidate roster content in a particular
   text format; we can still do this from roster data stored in sql
   tabular form.

 - storing rosters in sql tabular form offers the opportunity to use
   roster data without going through reconstruction and validation
   steps, but care and design are required in deciding when to use
   unvalidated roster data.

 - variations on 'finding relevant revisions' seem like good
   candidates for this, and can be implemented separately without violating
   good layering, especially if the revisions found with the shortcuts
   are then reconstructed and validated normally before actual use.

 - it would be nice to have db commands to check and regenerate this
   cache or roster data from primary sources, in case of corruption or
   other problems.
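
To make the last few points concrete, here is a rough sketch of the
"shortcut to find, then validate before use" pattern, again in Python
for illustration only. Everything here is an assumption: the store is a
plain dict from roster id to roster text, the roster text is simplified
to one path per line, and the id is taken to be the SHA-1 of that text.
None of the function names correspond to monotone code.

```python
import hashlib


class CorruptRoster(Exception):
    """Raised when stored roster text does not hash to its id."""


def roster_id(text):
    """Id of a roster, assumed here to be the SHA-1 of its text form."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def load_validated(store, rid):
    """Fetch roster text and re-check its hash before handing it out."""
    text = store[rid]
    if roster_id(text) != rid:
        raise CorruptRoster(rid)
    return text


def rebuild_index(store):
    """Regenerate the path -> roster ids cache from the primary data."""
    index = {}
    for rid in store:
        # Validate while rebuilding, so the cache is derived from good data.
        for path in load_validated(store, rid).splitlines():
            index.setdefault(path, []).append(rid)
    return index


def rosters_for_file(store, index, path):
    """Use the unvalidated cache only to pick candidates, then validate each."""
    return [load_validated(store, rid) for rid in index.get(path, [])]
```

The index is never trusted for content, only for deciding which rosters
to look at, so a stale or damaged cache costs a rebuild rather than a
wrong answer.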

--
Dan.

