monotone-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Monotone-devel] database compaction


From: Zack Weinberg
Subject: Re: [Monotone-devel] database compaction
Date: Wed, 17 Oct 2007 13:46:39 -0700

On 10/17/07, Markus Schiltknecht <address@hidden> wrote:
> Hi,
>
> I've recently been trying to implement some of the ideas from the
> database compaction wiki page [1].

Cool!

> == Compact Heights ==
>
> As the wiki says, uleb128 wouldn't work too well, because compressed
> heights couldn't be compared with memcmp().

Paul Crowley had a suggestion for an encoding that would work and
might be simple enough to be worth implementing, even though (as you
say) the space savings are quite small relative to the total database
size.

> == Using rowids or lookaside tables ==
>
> The wiki has several paragraphs about using rowids or lookaside tables.
> I don't particularly like that idea, because a) it increases the number
> of lookups needed, b) savings are dubious and c) it is database specific
> (okay, that probably doesn't count...). I might rethink b) as soon as we
> switch to longer hashes, but 20 bytes hardly justify the extra row
> headers (and indices) necessary.

We really do want to do this, but only in contexts where table A
stores a hash which is used as the sole key in a SELECT on table B.
For example: file_deltas.base, revision_ancestry.(parent,child),
rosters.id, roster_deltas.(id,base) and maybe revision_certs.id.  In
that context, doing it reduces lookups, because what sqlite does if
asked to SELECT * FROM b WHERE id = 'hash' is look up 'hash' in the
autoindex for id, which tells it the rowid, and then do a second
lookup in the actual table.  So you cut the number of B-tree queries
in half.  For example

'EXPLAIN SELECT data FROM files WHERE id = "foo"' -> 25 rows

0 | Goto | 0 | 21 |
1 | Integer | 0 | 0 |
2 | OpenRead | 0 | 2 |
3 | SetNumColumns | 0 | 2 |
4 | Integer | 0 | 0 |
5 | OpenRead | 1 | 3 | keyinfo(1,BINARY)
6 | String8 | 0 | 0 | foo
7 | IsNull | -1 | 18 |
8 | MakeRecord | 1 | 0 | b
9 | MemStore | 0 | 0 |
10 | MoveGe | 1 | 18 |
11 | MemLoad | 0 | 0 |
12 | IdxGE | 1 | 18 | +
13 | IdxRowid | 1 | 0 |
14 | MoveGe | 0 | 0 |
15 | Column | 0 | 1 |
16 | Callback | 1 | 0 |
17 | Next | 1 | 11 |
18 | Close | 0 | 0 |
19 | Close | 1 | 0 |
20 | Halt | 0 | 0 |
21 | Transaction | 0 | 0 |
22 | VerifyCookie | 0 | 29 |
23 | Goto | 0 | 1 |
24 | Noop | 0 | 0 |

versus

'EXPLAIN SELECT data FROM files WHERE rowid = 1234' -> 15 rows

0 | Goto | 0 | 11 |
1 | Integer | 0 | 0 |
2 | OpenRead | 0 | 2 |
3 | SetNumColumns | 0 | 2 |
4 | Integer | 1234 | 0 |
5 | MustBeInt | 1 | 9 |
6 | NotExists | 0 | 9 |
7 | Column | 0 | 1 |
8 | Callback | 1 | 0 |
9 | Close | 0 | 0 |
10 | Halt | 0 | 0 |
11 | Transaction | 0 | 0 |
12 | VerifyCookie | 0 | 29 |
13 | Goto | 0 | 1 |
14 | Noop | 0 | 0 |

(see http://sqlite.org/opcode.html -- particularly IdxGE, IdxRowid,
MoveGe, NotExists)

zw




reply via email to

[Prev in Thread] Current Thread [Next in Thread]