monotone-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Monotone-devel] speed of 0.99


From: Timothy Brownawell
Subject: Re: [Monotone-devel] speed of 0.99
Date: Mon, 29 Nov 2010 07:40:06 -0600
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.8) Gecko/20100821 Iceowl/1.0b2 Icedove/3.1.2

On 11/29/2010 05:10 AM, Markus Wanner wrote:
> On 11/28/2010 07:55 PM, Timothy Brownawell wrote:
>> Do we mind having an index-only schema change between 0.99 and 1.0? It
>> sounds like we'd get a noticeably better user experience...
> 
> I'd certainly appreciate it, but maybe someone who values the 1.0 change
> a bit more than I do should comment on that one.
> 
>> -CREATE INDEX revision_certs__revision_id ON revision_certs (revision_id);
>> +CREATE INDEX revision_certs__revnameval ON revision_certs (revision_id,
>> +       name, value, keypair_id, signature);
> 
> Hm.. with a strong Postgres background, this looks a bit overkill. Do we
> really need to replicate that many attributes in the index? Doesn't
> (rev_id, name) suffice? Do we ever scan for (rev_id, name, value) so
> that would be an advantage?

The most we look up on is (rev_id, name, value), but including the extra
columns does help a bit (actually, indexing on revid/name/value doesn't
look like it gives any improvement over the index on revid):


http://www.sqlite.org/optoverview.html

7.0 Avoidance of table lookups

When doing an indexed lookup of a row, the usual procedure is to do a binary
search on the index to find the index entry, then extract the rowid from the
index and use that rowid to do a binary search on the original table. Thus a
typical indexed lookup involves two binary searches. If, however, all columns
that were to be fetched from the table are already available in the index
itself, SQLite will use the values contained in the index and will never look
up the original table row. This saves one binary search for each row and can
make many queries run twice as fast. 


Each revision typically has 4 certs (author, branch, date, changelog), so the
old index on only rev_id should usually give 5 lookups. Indexing on (rev_id,
name, value) should usually give 2 lookups, and indexing on everything gives
1 lookup.

... strace -c shows 534 calls to read() and 150 to lseek() with everything in
the index, vs 2050 calls to read() and 1666 to lseek() with only (rev_id, name,
value). That doesn't make sense, it shouldn't be more than about twice as much.


sqlite> .schema revision_certs
CREATE TABLE revision_certs
        (
        hash not null unique,   -- hash of remaining fields separated by ":"
        revision_id not null,   -- joins with revisions.id
        name not null,          -- opaque string chosen by user
        value not null,         -- opaque blob
        keypair_id not null,    -- joins with public_keys.id
        signature not null,     -- RSA/SHA1 signature of "address@hidden:val]"
        unique(name, value, revision_id, keypair_id, signature)
        );
CREATE INDEX revision_certs__revnameval ON revision_certs (revision_id,
       name, value, keypair_id, signature);
sqlite> explain query plan select revision_id, name, value, keypair_id, 
signature from revision_certs where revision_id = 'abc' and name = 'branch';
0|0|TABLE revision_certs WITH INDEX revision_certs__revnameval


That looks right.


sqlite> .schema revision_certs
CREATE TABLE revision_certs
        (
        hash not null unique,   -- hash of remaining fields separated by ":"
        revision_id not null,   -- joins with revisions.id
        name not null,          -- opaque string chosen by user
        value not null,         -- opaque blob
        keypair_id not null,    -- joins with public_keys.id
        signature not null,     -- RSA/SHA1 signature of "address@hidden:val]"
        unique(name, value, revision_id, keypair_id, signature)
        );
CREATE INDEX revision_certs__idnameval on revision_certs (revision_id, name, 
value);
sqlite> explain query plan select revision_id, name, value, keypair_id, 
signature from revision_certs where revision_id = 'abc' and name = 'branch';
0|0|TABLE revision_certs WITH INDEX sqlite_autoindex_revision_certs_2


...but *that* is using the index created by the "unique", and scanning all
branch certs everywhere. Maybe because we never call ANALYZE?


sqlite> explain query plan select revision_id, name, value, keypair_id, 
signature from revision_certs INDEXED BY REVISION_CERTS__IDNAMEVAL where 
revision_id = 'abc' and name = 'branch';
0|0|TABLE revision_certs WITH INDEX revision_certs__idnameval


This might work as well, except that http://www.sqlite.org/lang_indexedby.html
says very strongly not to and that our code is written like so:

  query q("SELECT revision_id, name, value, keypair_id, signature FROM " + 
table +
          " WHERE revision_id = ? AND name = ?");

in order to... actually, reading the manifest_certs table (for migrations from
rather ancient versions) always uses different functions since changing to
naming keys by hash. So we could get rid of the variable table name and use a
INDEXED BY, except for that warning not to.

Using the INDEXED BY with an index on (rev_id, name, value) takes 2.5s for 'mtn 
st',
vs... now I'm seeing 2.0s instead of the 1.1s I got yesterday.

...time for 'mtn st' varies drastically depending on exactly which revision the
workspace is at (even with inodeprints on and only a small change in the number 
of
files), but INDEXED BY with the 3-field index seems to be consistently .3s - .5s
slower than with everything in the index and the workspace on the same revision.


-- 
Timothy

Free public monotone hosting: http://mtn-host.prjek.net



reply via email to

[Prev in Thread] Current Thread [Next in Thread]