monotone-devel

Re: [Monotone-devel] mtndumb & public cert_id


From: Thomas Keller
Subject: Re: [Monotone-devel] mtndumb & public cert_id
Date: Wed, 17 Dec 2008 15:34:11 +0100
User-agent: Thunderbird 2.0.0.18 (Macintosh/20081105)

Zbigniew Zagórski wrote:
>> I never dived much into the merkle tree thing, ...but if I understand you
>> correctly this needs the signature / sha1 of a single cert in order to
>> determine if it needs to be sent over the wire or not, right?
> 
> Yes.
> 
> Synchronization by merkle trees works in two phases:
>  - find-the-difference
>  - then sync only the missing/extra elements
> 
> All I'm talking about is the first step: finding differences.
> In the first step you must collect all ids (revision ids, cert ids, key ids)
> that exist in the database (let's call this set L).
> 
> The remote set is built from the remote site (let's call it R).
> 
>  - when pushing, you have to push the elements of set "L-R" to the remote
>  - when pulling, you have to retrieve the elements of set "R-L"
> 
> And regarding pushing, actual data is retrieved and serialized only for pushed
> set and not for all certs.
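
[A minimal sketch of the find-the-difference step described above, assuming both id sets are already available in memory. It uses plain set difference rather than an actual merkle tree, and the function name and sample ids are hypothetical, not taken from mtndumb.]

```python
def plan_sync(local_ids, remote_ids):
    """Return which ids to push (L-R) and which to pull (R-L)."""
    local, remote = set(local_ids), set(remote_ids)
    return {
        "push": sorted(local - remote),   # present locally, missing remotely
        "pull": sorted(remote - local),   # present remotely, missing locally
    }

# Hypothetical ids; real ones would be revision, cert and key ids.
plan = plan_sync(["a1", "b2", "c3"], ["b2", "c3", "d4"])
print(plan)
```

[Only the ids in the "push" list would then need their data retrieved and serialized, which matches the point made about pushing.]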

Thanks for the explanation!

>> And you're
>> basically looking for something which is faster than packets_for_certs
>> REVID which only takes revisions, but not selectors, and which outputs
>> more things than you actually need, correct?
> 
> No, in fact I'm looking for "get all certs" and "give me a specific cert" ...

So, you're actually triggering `mtn automate select_cert '*'` then, right?

>> While adding new commands for this sounds reasonable at a first glance
>> (100% backwards compatible with any automation implementation), I wonder
>> if it wouldn't be better to just hack packets_for_certs (in a
>> backwards-compatible way) f.e. by changing its first parameter, the
>> revision ID, to a selector. Now you then still get all cert packets in
>> return, not just the IDs which you need for the merkle tree, but is this
>> really such a huge speed penalty?
> 
> Well, after thinking a little bit, I need to clarify. The biggest
> penalty is the cost of thousands of automate calls. It's rather big when
> it comes to thousands (or tens of thousands) of invocations. And it's
> kind of a waste when you _always_ want all of them, and only the id, not
> the packet.

This should have gotten a bit better in 0.42, since Timothy fixed an
issue with stdio, which could not reuse the opened database instance for
every call.

> [BTW mtndumb is quite stupid, so it synchronizes the whole database, so
> there is no point in restricting the cert set or revision set. ]
> 
> Regarding timing:
> 
> For my private database (~1200 revs):
>    old approach      15s    (~1200 automate calls)
>    new approach      2s     (~15 automate calls)
> 
> For net.venge.monotone db (14348 revs):
>    old approach      180s   (~14400 automate calls)
>    new approach      10s    (~70 automate calls)
> 
> These are _very_ manual tests and take into account the whole process
> run, including spawning python and monotone. BTW, on linux the old
> approach is usually about twice as fast, so win32 is
> also an issue.
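
[A rough back-of-the-envelope check on the figures quoted above. All inputs are the approximate numbers from the message; the per-call value includes interpreter and process startup, so it is an upper bound on the cost of one automate call, not a measurement of it.]

```python
# Figures as reported in the message above (all approximate):
# (total seconds, number of automate calls)
runs = {
    "private db (~1200 revs)": {"old": (15, 1200), "new": (2, 15)},
    "net.venge.monotone (14348 revs)": {"old": (180, 14400), "new": (10, 70)},
}

def summarize(run):
    """Average old-approach cost per call (ms) and overall speedup."""
    old_secs, old_calls = run["old"]
    new_secs, _new_calls = run["new"]
    return {
        "old_ms_per_call": 1000 * old_secs / old_calls,
        "speedup": old_secs / new_secs,
    }

for name, run in runs.items():
    s = summarize(run)
    print(f"{name}: {s['old_ms_per_call']:.1f} ms/call, {s['speedup']:.1f}x faster")
```

[Both databases come out at the same average cost per call for the old approach, which is consistent with the call count being the dominant factor.]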

Wow, ouch, but you're using automate stdio, right?

> All I want is a ~constant number of automate calls that
> return bulk data from which I can safely build the merkle tree.
> 
> With the old approach I need 2+N calls (two to get the revisions and
> toposort them, N to get all certs for each revision).
> 
> With the new one I have 3 calls (all revs, toposort, all certs).
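
[The call counts stated above, written out as a small sketch. The function names are hypothetical; the formulas are the 2+N and 3 given in the message. Note that the measured runs quoted earlier report ~15 and ~70 calls for the new approach, so the real count is evidently not exactly 3, only roughly constant.]

```python
def old_call_count(n_revs):
    # Two calls to list and toposort the revisions,
    # plus one cert query per revision.
    return 2 + n_revs

def new_call_count(n_revs):
    # All revs, toposort, all certs: independent of the revision count.
    return 3

for n in (1200, 14348):
    print(n, old_call_count(n), new_call_count(n))
```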

Understood. I've just asked here a bit more because I think what you
(and maybe others) are really looking for is an sql-like interface to
some of the internal data structures. Sure, one could use the 'unstable'
mtn execute "interface", but I wonder if we should wrap something around
this and provide access to the mtn internals for exactly these
high-performance use cases.

Other than that I'm OK with your patch if you add some documentation and
tests to it. If you'd like to see this in 0.42 you probably have to do
this by the end of this week, otherwise it'll go into 0.43.

Thomas.

-- 
GPG-Key 0x160D1092 | address@hidden | http://thomaskeller.biz
Please note that according to the EU law on data retention, information
on every electronic information exchange might be retained for a period
of six months or longer: http://www.vorratsdatenspeicherung.de/?lang=en
