gzz-commits
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Gzz-commits] manuscripts/storm article.rst


From: Benja Fallenstein
Subject: [Gzz-commits] manuscripts/storm article.rst
Date: Sat, 01 Feb 2003 17:24:52 -0500

CVSROOT:        /cvsroot/gzz
Module name:    manuscripts
Changes by:     Benja Fallenstein <address@hidden>      03/02/01 17:24:52

Modified files:
        storm          : article.rst 

Log message:
        Section 5, 'Indexing'

CVSWeb URLs:
http://savannah.gnu.org/cgi-bin/viewcvs/gzz/manuscripts/storm/article.rst.diff?tr1=1.68&tr2=1.69&r1=text&r2=text

Patches:
Index: manuscripts/storm/article.rst
diff -u manuscripts/storm/article.rst:1.68 manuscripts/storm/article.rst:1.69
--- manuscripts/storm/article.rst:1.68  Fri Jan 31 09:46:46 2003
+++ manuscripts/storm/article.rst       Sat Feb  1 17:24:52 2003
@@ -350,7 +350,73 @@
 5. Indexing
 ===========
 
-XXX
+Finding links to and transclusions of a piece of content in
+Xanalogical storage is but one example of *reverse indexing*
+of Storm blocks: finding a block based on its contents.
+(For other examples, see section 6, below.)
+Storm provides a general API for indexing blocks in
+application-specific ways. We have implemented indexing
+on a local machine, but the interface is designed so that
+implementation on top of a distributed hashtable
+will be trivial.
+
+In Storm, applications are not allowed to put arbitrary
+mappings into the index. Instead, applications that want 
+to index blocks provide the following callback
+to a Storm pool::
+
+    getMappings(block) -> set of (key, value) pairs
+
+This callback processes a block and returns a set of mappings
+to be placed into the index. The Storm pool, in turn, provides 
+the following interface to the application::
+
+    get(key) -> set of (block, value) pairs
+
+This function finds all mappings created by this application
+with a given key, indicating both the application-provided
+value and the block for which the mapping was created.
+
+We use the ``getMappings()`` approach instead of
+allowing applications to put arbitrary mappings into the database
+because binding mappings to blocks makes it easy for pools
+to e.g. remove associated mappings when deleting a block.
+
+As an example, the ``getMappings()`` method of our Xanalogical 
+storage implementation will, for a block containing a document,
+collect all the spans in a document, and return mappings
+from their scroll blocks' ids to the spans and their positions
+in the document. When we want to find the transclusions of
+a span, we ``get()`` the mappings for the id of that span's
+scroll block, and load the document blocks referenced
+by the mappings.
+
+In a networked implementation, each peer is responsible
+for indexing the blocks it stores. Since no peer can
+feasibly know all applications that may want indexing,
+there may be blocks available on the network that have
+not been indexed by a particular application.
+We do not see this as a problem-- it's just like a block
+being available only if there's a peer which wants it to be--
+but applications must be prepared to deal with it.
+
+Locally, on the other hand, it is guaranteed that
+all blocks in a pool are indexed by all applications
+known by the pool, even if the block was added 
+to the pool in a previous session in which the pool
+did not know about the indexing application. To ensure this,
+we maintain for each application a table of blocks
+that have already been indexed. When the Storm pool
+implementation is initialized, it compares the list
+of indexed blocks with the list of all available blocks,
+and asks the application for unindexed blocks' mappings.
+
+One indexing application that may seem obvious is keyword-based
+full-text search. However, no research has been done
+in this direction; it is not clear whether the current
+interface is well suited to this, or whether current implementations
+are able to scale to the load to store a mapping for each word
+occuring in a document.
 
 
 6. Versioning
@@ -384,7 +450,29 @@
 A pointer is a globally unique identifier that can refer to
 different blocks over time.
 
-To assign targets to pointers, a special type of block
+To assign targets to pointers, a special type of block is used.
+If we have a block B and a pointer P (P is simply a random id),
+then we create an additional block whose mere presence tells Storm
+that pointer P points to block B. This is one application of Storm
+indexing (Section 5): Keeping an index of all pointer blocks
+about pointer ``P``, we can quickly find out the current one.
+
+UML to be drawn::
+
+    +---------+ 1      * +---------------+
+    | Pointer |----------| Pointer block |
+    +---------+          +---------------+
+                               * |
+                                 | 
+                                 |
+                               1 | target
+                            +--------+
+                            | Target |
+                            +--------+
+
+In addition to the pointer and the target, pointer blocks contain
+a list of *obsoleted* pointer blocks. These are pointer blocks
+associated with the same pointer which are 
 
 ...
 




reply via email to

[Prev in Thread] Current Thread [Next in Thread]