gzz-commits
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Gzz-commits] manuscripts/storm article.rst


From: Benja Fallenstein
Subject: [Gzz-commits] manuscripts/storm article.rst
Date: Sat, 01 Feb 2003 22:45:37 -0500

CVSROOT:        /cvsroot/gzz
Module name:    manuscripts
Changes by:     Benja Fallenstein <address@hidden>      03/02/01 22:45:37

Modified files:
        storm          : article.rst 

Log message:
        Some stuff on diffs. Unfortunately not finished...

CVSWeb URLs:
http://savannah.gnu.org/cgi-bin/viewcvs/gzz/manuscripts/storm/article.rst.diff?tr1=1.70&tr2=1.71&r1=text&r2=text

Patches:
Index: manuscripts/storm/article.rst
diff -u manuscripts/storm/article.rst:1.70 manuscripts/storm/article.rst:1.71
--- manuscripts/storm/article.rst:1.70  Sat Feb  1 20:59:45 2003
+++ manuscripts/storm/article.rst       Sat Feb  1 22:45:37 2003
@@ -466,9 +466,9 @@
                                  | 
                                  |
                                1 | target
-                            +--------+
-                            | Target |
-                            +--------+
+                            +---------+
+                            | Target  |
+                            +---------+
 
 In addition to the pointer and the target, pointer blocks contain
 a list of *obsoleted* pointer blocks. These are pointer blocks
@@ -486,7 +486,97 @@
 6.2. Diffs
 ----------
 
-XXX
+[benja says: Please do not touch this section, but tell me
+how to improve it instead. Reason: this is the meat for my thesis,
+due Feb 9th, so I want all possible improvements on it
+to go there, too ;-) [and I'm of course allowed to solicite feedback,
+but not allowed to use stuff written by someone else...]]
+
+The pointer system suggests that for each version of a document,
+we store an independent block containing this version. This
+obviously doesn't scale well when we keep a lot of versions
+each with only small changes. Instead, we use the well-known
+technique of storing only the differences between versions.
+
+A simplistic scheme to implement this atop block storage would be
+to form a block by storing the difference to the previous version
+together with the id of the block representing that version
+(indeed, this was the way our first implementation worked).
+To load a version, it is then necessary to process 
+all the differences in order. This scheme makes it extremely
+difficult to remove an older version: The chain of versions
+leading up to the current one would be broken if any
+previous version were deleted. 
+
+Additionally, many versioning systems
+[ref-- CVS?] store the current version as well as the differences,
+enabling them to retrieve the current version quickly and compute
+recent versions by applying the differences 'backwards,'
+starting from the current version. This technique would also be
+made harder by the simplistic scheme above because XXX.
+
+The root of these difficulties is that we refer to a version
+by the hash of a data record referencing the previous version,
+represented as a hash of a data record referencing the version
+before that, and so on. 
+
+To solve this problem, in our current
+design we refer to a version by the hash of that version--
+that is, the hash of a block containing that version.
+However, we do not necessarily store this block,
+even though we refer to it. Instead, we may create a *diff block*,
+containing a difference and the ids of the versions this
+is a difference between. When we want to load a version
+and do not have the block, we use Storm indexing to find
+all diff blocks from or to that version, trying to find
+a chain of differences starting at a known version. Then,
+we can apply the differences in order, and arrive at the version
+we seek.
+
+When we have reconstructed this version, we create a Storm block
+from it and check that it matches the id of the version
+we are seeking. This way, we do not need to place any trust
+in the diff blocks we are using. While anybody can create
+a diff block pretending to give us version X even though it really
+gives us version Y, we can still retrieve diff blocks from
+an untrusted network source because we can check whether a block
+has given us version X or Y by checking the cryptographic hash.
+
+In this scheme, we can easily drop a previous version
+by merging differences: If we have stored the differences
+from version ``A`` to ``B``, and ``B`` to ``C``, 
+to drop version ``B``, we compute
+the difference from ``A`` to ``C``, and replace the two
+previous differences by it. If we also store a difference
+between version ``C`` and ``D``, it does not need
+to be altered, because it refers to *version ``C``* and not
+the difference to ``C`` from ``B`` (as in the simplistic scheme).
+
+[XXX Figure: diffs from a->b, b->c, c->d; we replace
+the diffs a->b and b->c by a single diff a->c.]
+
+We can also store the block containing version ``D``
+in addition to storing the versions above. Then, we can reconstruct
+version ``C`` in two ways: By using the diffs from ``A`` to ``B``
+and ``B`` to ``C``, or, more efficiently, by applying the inverse
+of the diff from ``C`` to ``D`` to version ``D``.
+
+[XXX fig?]
+
+Our current implementation is a layer above Storm block storage
+and indexing. This layer implements a ``load(version-id) -> version``
+interface through the following simplified algorithm:
+
+1. If the block for ``version-id`` is in the pool, return it.
+2. Else, search for diff blocks storing the difference
+   between ``version-id`` and any other version.
+3. For each of these blocks, attempt to ``load()`` the *other* version.
+4. If successful, apply the difference.
+5. Check the hash of the resulting version. If correct, return it;
+   if incorrect, go back to step 3.
+
+As computing differences is file-format dependent, so is our system
+for storing versions. We 
 
 
 7. Peer-to-peer implementations




reply via email to

[Prev in Thread] Current Thread [Next in Thread]