monotone-devel

[Monotone-devel] Re: RFC: Fake IDs


From: Graydon Hoare
Subject: [Monotone-devel] Re: RFC: Fake IDs
Date: Tue, 18 Jul 2006 18:27:59 -0700
User-agent: Thunderbird 1.5.0.4 (Windows/20060516)

Jack Lloyd wrote:
> On Tue, Jul 18, 2006 at 05:24:08PM -0700, Graydon Hoare wrote:
>
>> It's easy enough to start with the "all zero" SHA1 and count upwards. It's probably even fine if you just run through the first 2^64 of them for now; you can't store anywhere near that much data in an sqlite database.
>
> Sounds to me like IDs need a unique namespace for "this bitstring
> generated by algorithm X" (X = sha1,random,counter,sha256,...)

Eh, I dunno; I think you might be mixing issues here.

The cases where we need a "fake" ID I think have to do with graph algorithms that normally operate on real revisions, but that you want to apply to synthetic revisions that don't have a "real" history. So there's nothing to hash. Currently these algorithms don't know anything about the concept of "fake revisions". If we were going to make those algorithms know about fake revisions as you suggest, we might as well just refer to fake revisions by place-holder numbers: fake-rev-1, fake-rev-2, etc. There's no need to use "content IDs" at all, since there's no content in question to be hashing.
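For concreteness, the "start at all zero and count upwards" scheme quoted earlier in this message could be sketched roughly as follows. This is only an illustration, not monotone code: the names `raw_id`, `fake_id_source`, and `to_hex` are hypothetical, and the choice to keep the counter in the last 8 bytes (big-endian) is an assumption made so that only the first 2^64 values are ever used.

```cpp
#include <array>
#include <cstdint>
#include <string>

// Hypothetical type: a 20-byte (160-bit) ID, the same width as a SHA-1 digest.
using raw_id = std::array<std::uint8_t, 20>;

// Hypothetical generator: hands out 00..00, 00..01, 00..02, and so on.
// Only the low 64 bits ever change, so at most 2^64 distinct IDs are
// produced before the counter wraps around.
class fake_id_source {
  std::uint64_t next_ = 0;

public:
  raw_id next() {
    raw_id id{};                      // value-initialized: all bytes zero
    std::uint64_t n = next_++;
    for (int i = 19; i >= 12; --i) {  // big-endian counter in the last 8 bytes
      id[i] = static_cast<std::uint8_t>(n & 0xff);
      n >>= 8;
    }
    return id;
  }
};

// Render an ID as 40 lowercase hex digits, the usual way SHA-1 values are shown.
std::string to_hex(const raw_id& id) {
  static const char digits[] = "0123456789abcdef";
  std::string out;
  out.reserve(40);
  for (std::uint8_t b : id) {
    out.push_back(digits[b >> 4]);
    out.push_back(digits[b & 0x0f]);
  }
  return out;
}
```

Note that an ID produced this way has the same type and shape as a real content hash, so nothing distinguishes it from one except its value.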

If this were a pretty language with decent support for disjoint unions, I might suggest making a new type and rewriting the algorithms in question to handle a "content-ID OR fake-node-number" type; at least then the decision to have different/identical behavior for fake nodes would be explicit. Since it's C++, we resort to magic values (ugh) and hope we don't have code that accidentally treats magic values as non-magic.
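As a rough illustration of the "content-ID OR fake-node-number" type, here is a sketch using `std::variant`, which only entered the language with C++17, well after this message was written. The names `content_id`, `fake_node`, `node_ref`, `describe`, and `is_fake` are hypothetical and do not come from the monotone source.

```cpp
#include <cstdint>
#include <string>
#include <variant>

// Hypothetical: a real revision, addressed by the hex form of its hash.
struct content_id { std::string hex; };

// Hypothetical: a synthetic revision with no content, just a placeholder number.
struct fake_node { std::uint64_t number; };

// The disjoint union: every reference is exactly one of the two.
using node_ref = std::variant<content_id, fake_node>;

// Code that consumes a node_ref has to handle both cases explicitly;
// omitting one operator() below makes the std::visit call fail to compile.
std::string describe(const node_ref& ref) {
  struct visitor {
    std::string operator()(const content_id& c) const {
      return "rev " + c.hex;
    }
    std::string operator()(const fake_node& f) const {
      return "fake-rev-" + std::to_string(f.number);
    }
  };
  return std::visit(visitor{}, ref);
}

bool is_fake(const node_ref& ref) {
  return std::holds_alternative<fake_node>(ref);
}
```

With this shape, `describe(node_ref{fake_node{2}})` yields `fake-rev-2`, and the decision about how fake nodes are treated sits in the visitor rather than in a reserved range of hash values.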

(We do the same thing in rosters, wrt. "temporary" roster-node numbers, which are differentiated by the high bit. I wish we didn't!)
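The high-bit trick mentioned here can be sketched in a few lines. The 32-bit width and the names `node_id`, `temp_bit`, `is_temp`, and `make_temp` are assumptions for illustration; the actual roster code may differ.

```cpp
#include <cstdint>

// Assumed width for a roster node number; illustrative only.
using node_id = std::uint32_t;

// The top bit is reserved as the "temporary" marker.
constexpr node_id temp_bit = node_id{1} << 31;

// A node number is temporary exactly when its high bit is set.
constexpr bool is_temp(node_id n) { return (n & temp_bit) != 0; }

// Build a temporary node number from a small counter value.
constexpr node_id make_temp(node_id counter) { return counter | temp_bit; }

// The hazard: temporary and permanent numbers share one type, so the
// compiler cannot catch code that forgets to call is_temp().
```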

As for the second issue you allude to (SHA160 vs. SHA256): I don't think we want to get into mixing multiple content-ID algorithms in the same database / networking group. If we move to SHA256 or SHA512 or whatever, any code using SHA160 will be around strictly for the sake of compatibility / migration. The point of using a hash for content-addressing (not just integrity-checking) is that you can take content and *produce* an address; if you have to consider "what algorithm the other guy might have used" then you'd often have to work with sets-of-addresses ("all possible algorithms at once") in order to do meaningful work. That'll get ugly fast.

For certs, I agree, we should start tagging the certs with the signature algorithm. When/if I ever get around to redoing certs (I hope the policy-branch work will include that) I intend to. But this is because we don't use RSA signatures for content-addressing, just authentication. It's easy to let N signature algorithms coexist in that case.

-graydon




