
[Gzz-commits] manuscripts/storm article.rst


From: Benja Fallenstein
Subject: [Gzz-commits] manuscripts/storm article.rst
Date: Sat, 15 Feb 2003 08:29:04 -0500

CVSROOT:        /cvsroot/gzz
Module name:    manuscripts
Changes by:     Benja Fallenstein <address@hidden>      03/02/15 08:29:04

Modified files:
        storm          : article.rst 

Log message:
        start making the section 3 <-> section 4 swap work text-wise

CVSWeb URLs:
http://savannah.gnu.org/cgi-bin/viewcvs/gzz/manuscripts/storm/article.rst.diff?tr1=1.154&tr2=1.155&r1=text&r2=text

Patches:
Index: manuscripts/storm/article.rst
diff -u manuscripts/storm/article.rst:1.154 manuscripts/storm/article.rst:1.155
--- manuscripts/storm/article.rst:1.154 Sat Feb 15 06:54:00 2003
+++ manuscripts/storm/article.rst       Sat Feb 15 08:29:03 2003
@@ -127,7 +127,7 @@
 We address the mobility of documents by block storage
 and versioning, while we use Xanalogical storage
 to address the movement of content between documents (copy&paste);
-see Fig. [ref-storm_layers]_.
+Fig. [ref-storm_layers]_ provides an overview of Storm's components.
 
 .. uml:: storm_layers
     :caption: Components of the Storm model
@@ -172,7 +172,7 @@
 usable yet.
 
 This paper is structured as follows. In the next section, we describe 
-related work. In section 3, we give an overview of xanalogical model.
+related work. In section 3, we give an overview of the xanalogical storage model.
 In section 4, we introduce the basic storage unit of our 
 system, i.e. file-like blocks identified by cryptographic hashes. In section 5, 
 we discuss application-specific reverse indexing of blocks by their 
@@ -326,34 +326,6 @@
 at the cost of each peer maintaining one node in the overlay network
 for each (key,value) pair it publishes.
 
-.. hemppah's original text before benja's changes:
-   In a DHT, both hashtable items and the addresses of peers 
-   are mapped into a single virtual key space. The form of the key space 
-   depends on implementation (for example, Chord uses a circle).
-   A distance metric (e.g. numerical, XOR) is used to find the peer
-   whose position in the key space is 'closest' to the key of a given item.
-   This peer is responsible to store the item (so both queries and insertions
-   relating to the key are routed to it.) Thus, 
-   DHT's overlay connectivity graph is structured. On the other hand, the overlay 
-   connectivity graph of broadcasting approach is formed more or less (depends on 
-   implementation) in a random manner. 
-
-   When performing queries, in broadcasting approach, peer sends a query request to a 
-   subset of its neighbors and these peers to their subsequent neighbors. The 
-   process will continue as long as query's time-to-live (TTL) value hasn't been reached. 
-   In DHT approach, query request is deterministically routed towards the peer 
-   which hosts a specific data item. Routing is based on 'hints' (based on 
-   differences between data item's key and peer's key), which each peer provides 
-   along the routing path.
-
-   Obviously, there are major differences within approaches. For the DHT approach, 
-   perhaps the main difference is *what* is self-organized into a
-   virtual key space. For instance, in SWAN [ref] and Skip Graph [ref], *data 
-   items* self-organise into a virtual address space, while in other DHT 
-   implementations *peers* self-organise in structured form in a virtual space. 
-   In the broadcasting approach, implementations' differences mostly lie in the 
-   *structural level* of the overlay network, i.e. super peers and peer clusters.
-
 The basic definition of a distributed hashtable does not indicate
 how large the keys and values used may be. Intuitively, we expect keys
 to be small, maybe a few hundred bytes at most; however, there are different
@@ -364,12 +336,6 @@
 a *home-store* and the latter a *directory* scheme (they call the peer
 responsible for a hashtable item its 'home node,' thus 'home-store').
 
-.. Should we discuss applications of p2p systems (CFS, OceanStore, Squirrel, ...)
-   here? If so, which ones?
-   [CFS/PAST(DHT, GUID, block vs. files), perhaps Freenet too (GUID) ? -Hermanni]
-
-.. thesis-benja: remove paragraph below
-
 CFS [ref] is a global peer-to-peer storage system. CFS is built upon Chord DHT peer-to-peer routing layer [ref]. CFS stores data as blocks. However, CFS *splits* data 
 (files) into several miniblocks and spreads blocks over the available CFS servers.
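The home-store scheme mentioned above (each hashtable item lives on the peer closest to its key) could be sketched roughly as follows; the class and method names are illustrative only, not taken from any of the systems the article cites, and a numerical distance metric stands in for whatever metric a given DHT actually uses:

```python
import hashlib

def key_of(data: bytes) -> int:
    # Map content into the virtual key space with a cryptographic hash.
    return int.from_bytes(hashlib.sha1(data).digest(), "big")

class HomeStoreDHT:
    """Toy home-store DHT: each (key, value) pair is stored on the peer
    whose position in the key space is numerically closest to the key."""

    def __init__(self, peer_positions):
        # One local store per peer, keyed by the peer's position.
        self.peers = {pos: {} for pos in peer_positions}

    def _home_node(self, key: int) -> int:
        # The 'home node' is the peer closest to the key.
        return min(self.peers, key=lambda pos: abs(pos - key))

    def put(self, key: int, value: bytes):
        # Insertions are routed to the home node.
        self.peers[self._home_node(key)][key] = value

    def get(self, key: int):
        # Queries are routed to the same home node.
        return self.peers[self._home_node(key)].get(key)
```

In the directory scheme, by contrast, the home node would store only a pointer to the peer that actually holds the value.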
@@ -427,63 +393,37 @@
 Through this mechanism, the system can show to the user all documents
 that share text with the current document.
 
-To keep track of links and transclusions, the system keeps a global index
-of documents by the characters they contain, and of links by the characters
-they refer to. Thus, for each character in the document, the system
-queries the index for other documents containing this character,
-and shows them as transclusions. Resolving links is a multi-step process.
-Each link is modeled as two collections of characters: the two
-endpoints of the link. To show links to a specific document,
-the system firstly uses the link index to find links
-to each character in the document. Secondly, for each link,
-it looks at the *other* set of characters in the link-- the target
-of the link, if the original character was the source, and vice versa.
-Thirdly, it looks for documents containing these target characters.
-This way, even if both the source and target characters of the link 
-are moved to a different document, the link stays connected to them.
+To track links and transclusions, the system indexes documents by
+the characters they contain, and links by the characters they refer to.
+To find transclusions of a document, we search the index for other
+documents containing any character from this document. To show links,
+we first search for links referring to any character in this document.
+However, when we have a link, we don't yet have the document it links to.
+Therefore, in a second step, we search for documents containing
+the characters the link targets.
 
-Of course, doing any expensive operation for *every* character 
+Of course, doing any expensive operation 
+(like an index lookup) for *every* character 
 in a document does not scale very well. In practice,
 characters typed in consecutively are given consecutive ids,
 such as ``...:4``, ``...:5``, ``...:6`` and so on, and
-operations are on *spans*, i.e. consecutive ranges of characters
-(``...:4-6``) in a document. In Storm, in each editor session we 
-create a block with all characters entered in this session (the content type
-being ``text/plain``). To designate a span of characters
-from that session, we use the block's id, the offset of the first
-character, and the number of characters in the span.
-This technique was first introduced in [lukka02guids]_.
-
-In Xanadu, characters are stored to append-only *scrolls*
-when they are typed [ref]. Because of this, in Storm, we call the 
-blocks containing the actual characters *scroll blocks*. The documents
-do not actually contain the characters; instead, they are
-*virtual files* containing span references as described above.
-To show a document, the scroll blocks it references are loaded
-and the characters retrieved from there [#]_.
+operations are on *spans*, i.e. ranges of consecutive characters
+(``...:4-6``).
 
-.. [#] It is unclear whether this approach is efficient for text
-   in the Storm framework; in the future, we may try storing
-   the characters in the documents themselves, along with their
-   permanent identifiers. For images or video, on the other hand,
-   it is clearly beneficial if content appearing in different
-   documents-- or different versions of a document-- is only
-   stored once, in a block only referred to wherever
-   the data is transcluded.
-   
 Our current implementation shows only links between documents
 that are in memory at the same time [screenshot of xupdf, perhaps ref too
 (submitted) antont: was thinking the same, it would illustrate this well].
-In the future, we will implement a global distributed index at top of
-a distributed hashtable, with the scroll blocks' ids as the keys.
+In the future, we will implement a global distributed index on top of
+a distributed hashtable (Section 5).
 To find the transclusions of a span, the system will retrieve
-all transclusions of any span in the scroll block, then
+all transclusions of any span with the same prefix (``...:``), then
 sort out those that do not overlap the span in question.
 
 Since the problem is to search for overlapping ranges,
-the spans cannot be used as hashtable keys. However, as the blocks
-will be relatively small (limited by the amount of text
-the user enters between two saves of a document), we hope
+the spans themselves cannot be used as hashtable keys. 
+However, we keep the number of spans with the same prefix
+relatively small (limited by the amount of text
+the user enters between two saves of a document). Therefore, we hope
 that this will not be a major scalability problem. Otherwise,
 systems that allow range queries, such as skip graphs [AspnesS2003]_, 
 or SkipNet [ref], may prove useful.
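The transclusion lookup described in this hunk (fetch everything indexed under a span's prefix, then sort out non-overlapping spans) could be sketched like this; the index layout and function names are hypothetical, chosen only to illustrate the two-phase query:

```python
def overlaps(a_start, a_end, b_start, b_end):
    # Two spans in the same scroll block overlap iff their ranges intersect.
    return a_start <= b_end and b_start <= a_end

def find_transclusions(index, span):
    """index maps a scroll-block id (the spans' shared prefix) to a list of
    (document, start, end) entries; span is (block_id, start, end).
    Phase 1: retrieve all entries under the prefix.
    Phase 2: filter out entries that do not overlap the span in question."""
    block_id, start, end = span
    return [doc for (doc, s, e) in index.get(block_id, [])
            if overlaps(start, end, s, e)]
```

Because only the prefix is hashable, the overlap filtering has to happen on the querying side; this is why the number of spans per prefix matters for scalability.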
@@ -502,7 +442,7 @@
 Figure [ref-figdocmovement]_ illustrates how xanalogical storage addresses the issue of
 movement of data between documents. Initially, there are documents D1 and
 D2, with two links (directed arrows in the figure) from D1 to two different
-elements in D2, A and B. The links actually are to the /spans/ A and B that
+elements in D2, A and B. The links actually are to the *spans* A and B that
 are stored in the scroll, but shown as parts of D2, as illustrated with the
 dashed lines. Then, when in the next step the document D2 is split in two --
 becoming documents D2.1 and D2.2 -- with link target A in the first and B
@@ -750,6 +690,33 @@
 version of a document whose identifier is hard-wired into
 the software (mutable documents are described in section 6.1).
 
+4.2. Xanalogical storage on top of blocks
+-----------------------------------------
+
+In Storm, in each editor session we 
+create a block with all characters entered in this session (the content type
+being ``text/plain``). To designate a span of characters
+from that session, we use the block's id, the offset of the first
+character, and the number of characters in the span.
+This technique was first introduced in [lukka02guids]_.
+
+In Xanadu, characters are stored to append-only *scrolls*
+when they are typed [ref?]. Because of this, in Storm, we call the 
+blocks containing the actual characters *scroll blocks*. The documents
+do not actually contain the characters; instead, they are
+*virtual files* containing span references as described above.
+To show a document, the scroll blocks it references are loaded
+and the characters retrieved from there [#]_.
+
+.. [#] It is unclear whether this approach is efficient for text
+   in the Storm framework; in the future, we may try storing
+   the characters in the documents themselves, along with their
+   permanent identifiers. For images or video, on the other hand,
+   it is clearly beneficial if content appearing in different
+   documents-- or different versions of a document-- is only
+   stored once, in a block only referred to wherever
+   the data is transcluded.
+
 
 5. Application-specific reverse indexing
 ========================================
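The new section 4.2 above describes scroll blocks and documents as virtual files of span references. A minimal sketch of that relationship, with class names invented for illustration (the real Storm block format is not shown here):

```python
class ScrollBlock:
    """Append-only block holding the characters entered in one editor
    session (content type ``text/plain``)."""
    def __init__(self, block_id):
        self.id = block_id
        self.chars = ""

    def append(self, text):
        # Returns a span reference: (block id, offset of first char, length).
        offset = len(self.chars)
        self.chars += text
        return (self.id, offset, len(text))

def render(spans, blocks):
    # A document is a 'virtual file' of span references: to show it,
    # load the referenced scroll blocks and retrieve the characters.
    return "".join(blocks[bid].chars[off:off + n] for bid, off, n in spans)
```

Note that the same span reference can appear in any number of documents, which is what makes transclusion tracking possible.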
@@ -839,25 +806,10 @@
 Clearly, for block storage to be useful, there has to be a way to
 efficiently update documents/maintain different versions of documents. 
 We achieve this by a combination of two mechanisms. Firstly, a 
-*pointer* is an updatable reference to a block;
-pointers can be updated by creating a specific kind of Storm block
-representing an assertion of the form, "pointer ``P`` now points
-to block ``B``." Pointers are resolved with the help of a Storm index 
-mapping pointer identifiers to blocks providing targets for that pointer.
-Through this mechanism, we can keep old versions of documents
-along with the current versions.
-
-.. [Figure ? -Hermanni]
-
-Secondly, in the spirit of version control systems like CVS,
-we do not store *each version*, but only the differences between versions.
-However, we still refer to each full version by the id of a block
-containing that version, even though we do not store this block.
-When we want to access a particular version, we reconstruct it
-using the differences, and then check the result using
-the cryptographic hash in the full version's block id.
+*pointer* is an updatable reference to a block.
+Secondly, similar to version control systems like CVS,
+we do not store each version, but only the differences between versions.
 
-.. [Figure ? -Hermanni]
 
 6.1. Pointers: implementing mutable resources
 ---------------------------------------------
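The versioning mechanism summarized in this hunk (store only diffs, but identify each full version by the hash in its block id and check reconstructions against it) could be sketched as follows; patches are modeled as plain callables and SHA-1 stands in for whatever hash the system uses, so this is an illustration of the verification idea, not Storm's actual diff format:

```python
import hashlib

def block_id(data: bytes) -> str:
    # Storm-style content addressing: a block's id is a hash of its content.
    return hashlib.sha1(data).hexdigest()

def reconstruct(base: bytes, patches, expected_id: str) -> bytes:
    """Apply a chain of patches to a base version, then verify the result
    against the cryptographic hash in the full version's block id."""
    version = base
    for patch in patches:
        version = patch(version)
    if block_id(version) != expected_id:
        raise ValueError("reconstructed version does not match its block id")
    return version
```

The check means the full-version block never needs to be stored: its id alone guarantees the reconstruction is byte-for-byte correct.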
@@ -865,7 +817,7 @@
 In Storm, *pointers* are used to implement mutable resources.
 A pointer is a globally unique identifier (usually created randomly)
 that can refer to different blocks over time. A block a pointer
-points to is called the pointer's *target*.
+points to is called the pointer's *target* (Fig. [ref-storm_pointers]_).
 
 To assign a target to a pointer, we create a special kind of block,
 a *pointer block*, representing an assertion like *pointer P targets
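The pointer-block mechanism in this hunk (assertions of the form *pointer P targets block B*, resolved through an index) might be sketched like this; the index structure and the use of a sequence number to pick the current target are simplifying assumptions, not the article's actual semantics for obsoleting old targets:

```python
class PointerIndex:
    """Toy index mapping pointer ids to the pointer blocks that assert
    their targets. Here the highest sequence number simply wins."""

    def __init__(self):
        self.assertions = {}  # pointer id -> list of (seq, target block id)

    def add_pointer_block(self, pointer, seq, target):
        # A pointer block is an assertion: "pointer P now targets block B".
        self.assertions.setdefault(pointer, []).append((seq, target))

    def resolve(self, pointer):
        # The pointer's current *target* is the most recent assertion;
        # earlier assertions remain available as old versions.
        return max(self.assertions[pointer])[1]
```

Because old pointer blocks are kept, resolving against an earlier assertion recovers an earlier version of the document.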
@@ -931,7 +883,7 @@
 for off-line as well as on-line work.
 For long-term publishing, one-time signatures have been
 found useful [anderson98erl]_. For the time being, the pointer mechanism
-works only in trusted Storm zones (Section 3), e.g.
+works only in trusted Storm zones (Section 4), e.g.
 in a workgroup collaborating on a set of documents.
 
 .. [#] Without timestamps, digital signatures are only valid
@@ -967,13 +919,8 @@
 6.2. Diffs: storing alternative versions efficiently
 ----------------------------------------------------
 
-[benja says: Please do not touch this section, but tell me
-how to improve it instead. Reason: this is the meat for my thesis,
-due Feb 9th, so I want all possible improvements on it
-to go there, too ;-) [and I'm of course allowed to solicite feedback,
-but not allowed to use stuff written by someone else...]]
-[Hm, should we move/remove 'Additionally, many versioning'
-paragraph into related work ? -Hermanni]
+.. [Hm, should we move/remove 'Additionally, many versioning'
+   paragraph into related work ? -Hermanni]
 
 The pointer system suggests that for each version of a document,
 we store an independent block containing this version. This



