[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Gzz-commits] manuscripts/storm article.rst
From: |
Benja Fallenstein |
Subject: |
[Gzz-commits] manuscripts/storm article.rst |
Date: |
Sat, 01 Feb 2003 17:24:52 -0500 |
CVSROOT: /cvsroot/gzz
Module name: manuscripts
Changes by: Benja Fallenstein <address@hidden> 03/02/01 17:24:52
Modified files:
storm : article.rst
Log message:
Section 5, 'Indexing'
CVSWeb URLs:
http://savannah.gnu.org/cgi-bin/viewcvs/gzz/manuscripts/storm/article.rst.diff?tr1=1.68&tr2=1.69&r1=text&r2=text
Patches:
Index: manuscripts/storm/article.rst
diff -u manuscripts/storm/article.rst:1.68 manuscripts/storm/article.rst:1.69
--- manuscripts/storm/article.rst:1.68 Fri Jan 31 09:46:46 2003
+++ manuscripts/storm/article.rst Sat Feb 1 17:24:52 2003
@@ -350,7 +350,73 @@
5. Indexing
===========
-XXX
+Finding links to and transclusions of a piece of content in
+Xanalogical storage is but one example of *reverse indexing*
+of Storm blocks: finding a block based on its contents.
+(For other examples, see section 6, below.)
+Storm provides a general API for indexing blocks in
+application-specific ways. We have implemented indexing
+on a local machine, but the interface is designed so that
+implementation on top of a distributed hashtable
+will be trivial.
+
+In Storm, applications are not allowed to put arbitrary
+mappings into the index. Instead, applications that want
+to index blocks provide the following callback
+to a Storm pool::
+
+ getMappings(block) -> set of (key, value) pairs
+
+This callback processes a block and returns a set of mappings
+to be placed into the index. The Storm pool, in turn, provides
+the following interface to the application::
+
+ get(key) -> set of (block, value) pairs
+
+This function finds all mappings created by this application
+with a given key, indicating both the application-provided
+value and the block for which the mapping was created.
+
+We use the ``getMappings()`` approach instead of
+allowing applications to put arbitrary mappings into the database
+because binding mappings to blocks makes it easy for pools
+to e.g. remove associated mappings when deleting a block.
+
+As an example, the ``getMappings()`` method of our Xanalogical
+storage implementation will, for a block containing a document,
+collect all the spans in a document, and return mappings
+from their scroll blocks' ids to the spans and their positions
+in the document. When we want to find the transclusions of
+a span, we ``get()`` the mappings for the id of that span's
+scroll block, and load the document blocks referenced
+by the mappings.
+
+In a networked implementation, each peer is responsible
+for indexing the blocks it stores. Since no peer can
+feasibly know all applications that may want indexing,
+there may be blocks available on the network that have
+not been indexed by a particular application.
+We do not see this as a problem-- it's just like a block
+being available only if there's a peer which wants it to be--
+but applications must be prepared to deal with it.
+
+Locally, on the other hand, it is guaranteed that
+all blocks in a pool are indexed by all applications
+known by the pool, even if the block was added
+to the pool in a previous session in which the pool
+did not know about the indexing application. To ensure this,
+we maintain for each application a table of blocks
+that have already been indexed. When the Storm pool
+implementation is initialized, it compares the list
+of indexed blocks with the list of all available blocks,
+and asks the application for unindexed blocks' mappings.
+
+One indexing application that may seem obvious is keyword-based
+full-text search. However, no research has been done
+in this direction; it is not clear whether the current
+interface is well suited to this, or whether current implementations
+are able to scale to the load to store a mapping for each word
+occuring in a document.
6. Versioning
@@ -384,7 +450,29 @@
A pointer is a globally unique identifier that can refer to
different blocks over time.
-To assign targets to pointers, a special type of block
+To assign targets to pointers, a special type of block is used.
+If we have a block B and a pointer P (P is simply a random id),
+then we create an additional block whose mere presence tells Storm
+that pointer P points to block B. This is one application of Storm
+indexing (Section 5): Keeping an index of all pointer blocks
+about pointer ``P``, we can quickly find out the current one.
+
+UML to be drawn::
+
+ +---------+ 1 * +---------------+
+ | Pointer |----------| Pointer block |
+ +---------+ +---------------+
+ * |
+ |
+ |
+ 1 | target
+ +--------+
+ | Target |
+ +--------+
+
+In addition to the pointer and the target, pointer blocks contain
+a list of *obsoleted* pointer blocks. These are pointer blocks
+associated with the same pointer which are
...
[Gzz-commits] manuscripts/storm article.rst,
Benja Fallenstein <=
[Gzz-commits] manuscripts/storm article.rst, Toni Alatalo, 2003/02/01
[Gzz-commits] manuscripts/storm article.rst, Benja Fallenstein, 2003/02/01
[Gzz-commits] manuscripts/storm article.rst, Benja Fallenstein, 2003/02/02
[Gzz-commits] manuscripts/storm article.rst, Benja Fallenstein, 2003/02/02
[Gzz-commits] manuscripts/storm article.rst, Hermanni Hyytiälä, 2003/02/03
[Gzz-commits] manuscripts/storm article.rst, Hermanni Hyytiälä, 2003/02/03
[Gzz-commits] manuscripts/storm article.rst, Hermanni Hyytiälä, 2003/02/03
[Gzz-commits] manuscripts/storm article.rst, Hermanni Hyytiälä, 2003/02/03
[Gzz-commits] manuscripts/storm article.rst, Toni Alatalo, 2003/02/03