[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Monotone-devel] Re: Support for binary files, scalability and Windo
From: |
Ori Berger |
Subject: |
Re: [Monotone-devel] Re: Support for binary files, scalability and Windows port |
Date: |
Mon, 19 Jan 2004 01:05:44 +0200 |
User-agent: |
Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4; MultiZilla v1.5.0.2g) Gecko/20031007 |
graydon hoare wrote:
I don't know what you mean by "all sources for a file". could you be
more specific about where you mean to use a suffix array, and how it
helps with a particular bit of retrieval or storage? the way I see it, a
persistent suffix array makes the on-disk representation of a string
considerably larger, without improving the xdelta problem (which is:
find all common regions between these two strings)
What I meant by "all sources" is that there is no inherent a-priori
reason to encode a file with respect to one _given_ file. It is
often the case that the best reference is the previous version of
the same file. But when you merge two files into one, for example,
the "natural" delta is from both, rather than from a specific one.
If you have a persistent suffix tree (one tree for the _entire_
repository), when you want to put a new version in the database, you
just start walking the tree, and note what part of it you take from
what repository file. Renames, copies, splits, merges, etc. will all
be efficiently represented (in terms of "delta" storage) even
without any explicit "monotone rename" command.
Your elevator pitch includes "ancestry is just some interesting
metadata", and this is just taking it to the extreme ... A version
is stored as a patchwork of parts of all previous files, with or
without any ancestry knowledge.
I don't think that the theoretic possibilities justify any work at
this time. Nevertheless, the idea is worth documenting, I think.
It's implementation is orthogonal to the block structure below, but
the underlying philosophy is the same.
Xdelta could be more-or-less as useful, if you store the adler32 of
all reference blocks in an accessible location that you can consult
while scanning a new file.
that technique is really nothing more than extending xdelta to work
over block-structured files, which is (as I'll get to) advantageous
since it permits Very Large Files (~2 ** 40 or so) and also makes
network sync work.
What you're describing is, as you said, extended xdelta. What I was
describing is doing away with ancestry as a factor in storage and
identification. If breaking to blocks depends on context (e.g.,
ancestry), then, patches that are applied in a different order (but
result in the same outcome) are likely to generate different block
structures. Personally, I much prefer stateless representations that
depend only on the data and not its history - YMMV.
hmm, no, not changing the manifest format. the file remains identified
by its SHA1. if you don't like SHA1, substitute some other function
which makes an identifier given an input string, but if the filesystem
can maintain the notion of a "file" using inodes, so can manifests.
I didn't understand that one. I was suggesting (or at least, thought
I was suggesting) either (a) to include SHA1 for blocks inside the
same manifest, or alternatively, (b) keep another per-file manifest
that says how to construct it from its SHA1 blocks, in the same way
that a manifest can be used to construct an atomic revision from
files. Whoever receives an (a) or (b) manifest can then go hunting
for either the blocks or the files, in whatever way needed. (b)
would be completely transparent, and (a) would cause sha1sum to
correctly check the complete files, but report the file _parts_ as
missing.
Yep, it's not too bright; But I wasn't suggesting dropping SHA1 in
any way, and I couldn't understand the relation to inodes. Could you
please elaborate?
as far as I'm concerned, I'd keep doing what I'm doing now, reuse the
block + delta storage system for *storing* manifests, too. they're data
too.
Hmm, I wasn't aware manifests were treated specially in storage. I
thought they were only special by their cert status.
hm, for the network I have a broader idea: use hash trees over the
entire space of SHA1 to synchronize my collection + your collection into
the union of both (on both ends). with some special accounting to manage
singletons and tombstones, and a good spread factor, it's very efficient
to synchronize hash trees, and I'd use the exact same scheme to sync the
collection of blocks, the collection of manifests, the collection of
files, the collection of keys, and the collection of certs.
Sounds good. As Nathaniel hinted in a recent post, hash tree
synchronization might not play well with your block-extended xdelta;
It needs to be stateless to play well, I think.
About NNTP, email, "dumb web" distribution - all you have to do is
record, for each block whether or not it was sent to a specific
destination.
no, I think I'd just remove these things altogether, let people sync
between databases directly as the primary mode of operation.
Ouch. One of the things I like so much about monotone is that it is
transport agnostic. Requiring a smart online server would kill that.
I also think it's an increased barrier for many people.
It should still be possible to build an NNTP/email/dumb-web gateway
though, even if you do move to an interactive hashtree-sync. It's
the same logic I described earlier, of "for each block not yet sent
to a gateway destination, send to gateway destination" and "add
received blocks to database".
Ori.
- [Monotone-devel] Support for binary files, scalability and Windows port, Asger Ottar Alstrup, 2004/01/12
- [Monotone-devel] Re: Support for binary files, scalability and Windows port, graydon hoare, 2004/01/12
- [Monotone-devel] Re: Support for binary files, scalability and Windows port, Asger Kunuk Ottar Alstrup, 2004/01/15
- Re: [Monotone-devel] Re: Support for binary files, scalability and Windows port, Zbynek Winkler, 2004/01/15
- [Monotone-devel] Re: Support for binary files, scalability and Windows port, graydon hoare, 2004/01/16
- Re: [Monotone-devel] Re: Support for binary files, scalability and Windows port, Ori Berger, 2004/01/16
- Re: [Monotone-devel] Re: Support for binary files, scalability and Windows port, graydon hoare, 2004/01/17
- Re: [Monotone-devel] Re: Support for binary files, scalability and Windows port, Nathaniel Smith, 2004/01/17
- [Monotone-devel] Re: Support for binary files, scalability and Windows port, graydon hoare, 2004/01/19
- Re: [Monotone-devel] Re: Support for binary files, scalability and Windows port, Zbynek Winkler, 2004/01/19
- Re: [Monotone-devel] Re: Support for binary files, scalability and Windows port,
Ori Berger <=
- Re: [Monotone-devel] Re: Support for binary files, scalability and Windows port, Zack Weinberg, 2004/01/18
- [Monotone-devel] Re: Support for binary files, scalability and Windows port, graydon hoare, 2004/01/19
- [Monotone-devel] RE: Support for binary files, scalability and Windows port, Asger Kunuk Alstrup, 2004/01/18
- [Monotone-devel] Re: Support for binary files, scalability and Windows port, Peter Simons, 2004/01/18
- [Monotone-devel] Re: Support for binary files, scalability and Windows port, graydon hoare, 2004/01/19
- [Monotone-devel] RE: Support for binary files, scalability and Windows port, Asger Kunuk Ottar Alstrup, 2004/01/19
- [Monotone-devel] Re: Support for binary files, scalability and Windows port, graydon hoare, 2004/01/19
- [Monotone-devel] RE: Support for binary files, scalability and Windows port, Asger Kunuk Ottar Alstrup, 2004/01/20
- [Monotone-devel] Re: Support for binary files, scalability and Windows port, graydon hoare, 2004/01/20
- [Monotone-devel] RE: Support for binary files, scalability and Windows port, Asger Kunuk Ottar Alstrup, 2004/01/21