gzz-commits
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Gzz-commits] manuscripts/storm article.rst


From: Hermanni Hyytiälä
Subject: [Gzz-commits] manuscripts/storm article.rst
Date: Wed, 12 Feb 2003 08:24:41 -0500

CVSROOT:        /cvsroot/gzz
Module name:    manuscripts
Changes by:     Hermanni Hyytiälä <address@hidden>      03/02/12 08:24:41

Modified files:
        storm          : article.rst 

Log message:
        Few radical reorgs for more logical roadmap

CVSWeb URLs:
http://savannah.gnu.org/cgi-bin/viewcvs/gzz/manuscripts/storm/article.rst.diff?tr1=1.133&tr2=1.134&r1=text&r2=text

Patches:
Index: manuscripts/storm/article.rst
diff -u manuscripts/storm/article.rst:1.133 manuscripts/storm/article.rst:1.134
--- manuscripts/storm/article.rst:1.133 Wed Feb 12 07:39:34 2003
+++ manuscripts/storm/article.rst       Wed Feb 12 08:24:41 2003
@@ -148,14 +148,13 @@
 usable yet.
 
 This paper is structured as follows. In the next section, we describe 
-related work. In section 3, we introduce the basic storage unit of our 
-system, file-like blocks identified by cryptographic hashes. 
-In section 4, we discuss our implementation of Xanalogical storage 
-on top of the block system. In section 5, we discuss application-specific 
-reverse indexing of blocks by their content, essential for many applications.
-In section 6, we present techiques for efficient versioned storage
-of mutable data on top of blocks. In section 7, 
-we report on implementation experience and future directions. 
+related work. In section 3, we give an overview of xanalogical model.
+In section 4, we introduce the basic storage unit of our 
+system, i.e. file-like blocks identified by cryptographic hashes. In section 
5, 
+we discuss application-specific reverse indexing of blocks by their 
+content, essential for many applications. In section 6, we present 
+techiques for efficient versioned storage of mutable data on top of blocks. 
+In section 7, we report on implementation experience and future directions. 
 Section 8 concludes the paper.
 
 .. uml:: storm_layers
@@ -419,9 +418,77 @@
 which are permanently on-line, but act as ordinary peers
 in the indexing overlay network.
    
+   
+3. Overview of Xanalogical storage
+==================================
+
+In the xanalogical storage model [ref], 
+pioneered by the unfinished Project Xanadu [ref],
+links are not between documents, but individual characters.
+When a character is first typed in, it acquires a permanent id
+("the character 'D' typed by Janne Kujala on 10/8/97 8:37:18"),
+which it retains when copied to a different document, distinguishing
+it from all similar characters typed in independently [#]_.
+A link is shown between any two documents containing the characters
+that the link connects. Xanalogical links are external and bidirectional.
+
+.. [#] Xanalogical storage is not limited to text. We speak about
+   *characters* because it simplifies the explanation; picture's pixels
+   or frames of video could be substituted.
 
-3. Block storage
-================
+In addition to content links, xanalogical storage keeps an index of
+transclusions: identical characters copied into different documents.
+Through this mechanism, the system can show to the user all documents
+that share text with the current document.
+
+To keep track of links and transclusions, the system keeps a global index
+of documents by the characters they contain, and of links by the characters
+they refer to. Thus, for each character in the document, the system
+queries the index for other documents containing this character,
+and shows them as transclusions. Resolving links is a multi-step process.
+Each link is modeled as two collections of characters: the two
+endpoints of the link. To show links to a specific document,
+the system firstly uses the link index to find links
+to each character in the document. Secondly, for each link,
+it looks at the *other* set of characters in the link-- the target
+of the link, if the original character was the source, and vice versa.
+Thirdly, it looks for documents containing these target characters.
+This way, even if both the source and target characters of the link 
+are moved to a different document, the link stays connected to them.
+
+Of course, doing any expensive operation for *every* character 
+in a document does not scale very well. In practice,
+characters typed in consecutively are given consecutive ids,
+such as ``...:4``, ``...:5``, ``...:6`` and so on, and
+operations are on *spans*, i.e. consecutive ranges of characters
+(``...:4-6``) in a document. In Storm, in each editor session we 
+create a block with all characters entered in this session (the content type
+being ``text/plain``). To designate a span of characters
+from that session, we use the block's id, the offset of the first
+character, and the number of characters in the span.
+This technique was first introduced in [ref ht02 paper].
+
+In Xanadu, characters are stored to append-only *scrolls*
+when they are typed [ref]. Because of this, in Storm, we call the 
+blocks containing the actual characters *scroll blocks*. The documents
+do not actually contain the characters; instead, they are
+*virtual files* containing span references as described above.
+To show a document, the scroll blocks it references are loaded
+and the characters retrieved from there [#]_.
+
+.. [#] It is unclear whether this approach is efficient for text
+   in the Storm framework; in the future, we may try storing
+   the characters in the documents themselves, along with their
+   permanent identifiers. For images or video, on the other hand,
+   it is clearly beneficial if content appearing in different
+   documents-- or different versions of a document-- is only
+   stored once, in a block only referred to wherever
+   the data is transcluded.
+
+
+
+4. Storm block storage
+======================
 
 In Storm, all data is stored
 as *blocks*, byte sequences identified by a SHA-1 
@@ -571,7 +638,7 @@
 to overcome the limitations of traditional file-based applications.
 
 
-3.1. Implementation
+4.1. Implementation
 -------------------
 
 Storm blocks are MIME messages [ref MIME], i.e., objects with
@@ -641,114 +708,6 @@
 the software (mutable documents are described in section 6.1).
 
 
-4. Xanalogical storage
-======================
-
-In the xanalogical storage model [ref], 
-pioneered by the unfinished Project Xanadu [ref],
-links are not between documents, but individual characters.
-When a character is first typed in, it acquires a permanent id
-("the character 'D' typed by Janne Kujala on 10/8/97 8:37:18"),
-which it retains when copied to a different document, distinguishing
-it from all similar characters typed in independently [#]_.
-A link is shown between any two documents containing the characters
-that the link connects. Xanalogical links are external and bidirectional.
-
-.. [#] Xanalogical storage is not limited to text. We speak about
-   *characters* because it simplifies the explanation; picture's pixels
-   or frames of video could be substituted.
-
-In addition to content links, xanalogical storage keeps an index of
-transclusions: identical characters copied into different documents.
-Through this mechanism, the system can show to the user all documents
-that share text with the current document.
-
-To keep track of links and transclusions, the system keeps a global index
-of documents by the characters they contain, and of links by the characters
-they refer to. Thus, for each character in the document, the system
-queries the index for other documents containing this character,
-and shows them as transclusions. Resolving links is a multi-step process.
-Each link is modeled as two collections of characters: the two
-endpoints of the link. To show links to a specific document,
-the system firstly uses the link index to find links
-to each character in the document. Secondly, for each link,
-it looks at the *other* set of characters in the link-- the target
-of the link, if the original character was the source, and vice versa.
-Thirdly, it looks for documents containing these target characters.
-This way, even if both the source and target characters of the link 
-are moved to a different document, the link stays connected to them.
-
-Of course, doing any expensive operation for *every* character 
-in a document does not scale very well. In practice,
-characters typed in consecutively are given consecutive ids,
-such as ``...:4``, ``...:5``, ``...:6`` and so on, and
-operations are on *spans*, i.e. consecutive ranges of characters
-(``...:4-6``) in a document. In Storm, in each editor session we 
-create a block with all characters entered in this session (the content type
-being ``text/plain``). To designate a span of characters
-from that session, we use the block's id, the offset of the first
-character, and the number of characters in the span.
-This technique was first introduced in [ref ht02 paper].
-
-In Xanadu, characters are stored to append-only *scrolls*
-when they are typed [ref]. Because of this, in Storm, we call the 
-blocks containing the actual characters *scroll blocks*. The documents
-do not actually contain the characters; instead, they are
-*virtual files* containing span references as described above.
-To show a document, the scroll blocks it references are loaded
-and the characters retrieved from there [#]_.
-
-.. [#] It is unclear whether this approach is efficient for text
-   in the Storm framework; in the future, we may try storing
-   the characters in the documents themselves, along with their
-   permanent identifiers. For images or video, on the other hand,
-   it is clearly beneficial if content appearing in different
-   documents-- or different versions of a document-- is only
-   stored once, in a block only referred to wherever
-   the data is transcluded.
-
-Our current implementation shows only links between documents
-that are in memory at the same time [screenshot of xupdf, perhaps ref too
-(submitted) antont: was thinking the same, it would illustrate this well].
-In the future, we will implement a global distributed index at top of
-a distributed hashtable, with the scroll blocks' ids as the keys.
-To find the transclusions of a span, the system will retrieve
-all transclusions of any span in the scroll block, then
-sort out those that do not overlap the span in question.
-
-Since the problem is to search for overlapping ranges,
-the spans cannot be used as hashtable keys. However, as the blocks
-will be relatively small (limited by the amount of text
-the user enters between two saves of a document), we hope
-that this will not be a major scalability problem. Otherwise,
-systems that allow range queries, such as skip graphs [ref] 
-and skipnet [ref], may prove useful.
-
-.. [how does video work here, i.e. are they huge blocks or collections of many
-   small, frames? or a sequence between two keyframes?]
-
-   Benja says: Probably large, but the number of links to them 
-   still won't be that big, I guess. Not sure how to discuss this
-   in the text; feel free to propose something :-)
-
-   Toni: will try.. wondering also about who to consult :o
-
-.. [This might be relevant also:
-   http://www.hpl.hp.com/techreports/2002/HPL-2002-209.pdf
-   -Hermanni]
-
-One question raised by xanalogical storage is which links to show
-for a popular document that has been linked to by many users.
-We hope to address this problem by collaborative filtering
-of links [explain, ref (grouplens.org pubs?)]. There has been research on
-collaborative filtering in peer-to-peer systems
-without compromising participants' privacy [ref John Canny].
-For some purposes simple rules based on e.g. belonging to a group may be
-applicable as well: e.g. when working on a project with a project group, it
-may be beneficial for the members of the group to see other members'
-comments of articles etc.
-
-
 5. Application-specific reverse indexing
 ========================================
 
@@ -1166,6 +1125,49 @@
    Comments may be new entities(?) linking to it   
 
 
+
+Our current implementation shows only links between documents
+that are in memory at the same time [screenshot of xupdf, perhaps ref too
+(submitted) antont: was thinking the same, it would illustrate this well].
+In the future, we will implement a global distributed index at top of
+a distributed hashtable, with the scroll blocks' ids as the keys.
+To find the transclusions of a span, the system will retrieve
+all transclusions of any span in the scroll block, then
+sort out those that do not overlap the span in question.
+
+Since the problem is to search for overlapping ranges,
+the spans cannot be used as hashtable keys. However, as the blocks
+will be relatively small (limited by the amount of text
+the user enters between two saves of a document), we hope
+that this will not be a major scalability problem. Otherwise,
+systems that allow range queries, such as skip graphs [ref] 
+and skipnet [ref], may prove useful.
+
+.. [how does video work here, i.e. are they huge blocks or collections of many
+   small, frames? or a sequence between two keyframes?]
+
+   Benja says: Probably large, but the number of links to them 
+   still won't be that big, I guess. Not sure how to discuss this
+   in the text; feel free to propose something :-)
+
+   Toni: will try.. wondering also about who to consult :o
+
+.. [This might be relevant also:
+   http://www.hpl.hp.com/techreports/2002/HPL-2002-209.pdf
+   -Hermanni]
+
+One question raised by xanalogical storage is which links to show
+for a popular document that has been linked to by many users.
+We hope to address this problem by collaborative filtering
+of links [explain, ref (grouplens.org pubs?)]. There has been research on
+collaborative filtering in peer-to-peer systems
+without compromising participants' privacy [ref John Canny].
+For some purposes simple rules based on e.g. belonging to a group may be
+applicable as well: e.g. when working on a project with a project group, it
+may be beneficial for the members of the group to see other members'
+comments of articles etc.
+   
+   
 Open issue: Tree hashing, also possibly needed for (OceanStore-like?)
 distribution of shares
 




reply via email to

[Prev in Thread] Current Thread [Next in Thread]