[Gzz-commits] manuscripts/storm article.rst


From: Benja Fallenstein
Subject: [Gzz-commits] manuscripts/storm article.rst
Date: Sun, 02 Feb 2003 22:33:49 -0500

CVSROOT:        /cvsroot/gzz
Module name:    manuscripts
Changes by:     Benja Fallenstein <address@hidden>      03/02/02 22:33:49

Modified files:
        storm          : article.rst 

Log message:
        Xanalogical storage explained ;-)

CVSWeb URLs:
http://savannah.gnu.org/cgi-bin/viewcvs/gzz/manuscripts/storm/article.rst.diff?tr1=1.71&tr2=1.72&r1=text&r2=text

Patches:
Index: manuscripts/storm/article.rst
diff -u manuscripts/storm/article.rst:1.71 manuscripts/storm/article.rst:1.72
--- manuscripts/storm/article.rst:1.71  Sat Feb  1 22:45:37 2003
+++ manuscripts/storm/article.rst       Sun Feb  2 22:33:49 2003
@@ -1,16 +1,6 @@
-============================================================================
-Gzz Storm: Supporting data mobility through location independent identifiers
-============================================================================
-
-(an-other way (too buzzwordy? but generalized!): enabling distributed
-mobile hypermedia with location independent unique document identifiers)
-['distributed mobile hypermedia' is too limited --b.]
-[Perhaps 'Storm: Supporting data mobility through location independent 
-identifiers' is enough ? We could mention in the text (as we do mention ;), 
-that Storm is used also in our Gzz project. Do we really bind name 'Storm' to name 
-'Gzz' in the main title ? This may have psychological effects: reader 
-might first think that Storm can only be used with Gzz. And this is not 
-true. -Hermanni]
+========================================================================
+Storm: Supporting data mobility through location independent identifiers
+========================================================================
 
 1. Introduction
 ===============
@@ -63,6 +53,16 @@
 systems (location independent identifiers, immutable block storage, *working* links etc.)
 -use of p2p architecture in hypermedia domain 
 
+Gzz provides a platform to build hypermedia applications upon.
+So far, we have only used Storm in our experimental
+hypermedia system, Gzz. No work on integrating Storm
+with current programs (in the spirit of Open Hypermedia)
+has been done yet. It is not clear how far this is possible
+without changing applications substantially if they are to take
+advantage of our implementation of Xanalogical storage.
+(Vitali [ref] notes that Xanalogical storage necessitates
+strong discipline in version tracking, which current systems lack.)
+
 This paper is structured as follows. In next section, we describe 
 related work. In section 3, we introduce the basic storage unit of our 
 system, file-like blocks of data identified by cryptographic hashes. 
@@ -344,7 +344,91 @@
 4. Xanalogical storage
 ======================
 
-Xanalogical storage, pioneered by Project Xanadu [ref],
+In the xanalogical storage model [ref], 
+pioneered by the unfinished Project Xanadu [ref],
+links are not between documents, but between individual characters.
+When a character is first typed in, it acquires a permanent id
+("the character 'D' typed by Janne Kujala on 10/8/97 8:37:18"),
+which it retains when copied to a different document, distinguishing
+it from all similar characters typed in independently [#]_.
+A link is shown between any two documents containing the characters
+that the link connects. Xanalogical links are external and bidirectional.
+
+.. [#] Xanalogical storage is not limited to text. We speak about
+   *characters* because it simplifies the explanation; pixels
+   or frames of video could be substituted.
+
+In addition to content links, xanalogical storage keeps an index of
+transclusions: identical characters copied into different documents.
+Through this mechanism, the system can show to the user all documents
+that share text with the current document.
+
+To keep track of links and transclusions, the system keeps a global index
+of documents by the characters they contain, and of links by the characters
+they refer to. Thus, for each character in the document, the system
+queries the index for other documents containing this character,
+and shows them as transclusions. Resolving links is a multi-step process.
+Each link is modeled as two collections of characters: the two
+endpoints of the link. To show links to a document,
+the system first uses the link index to find links
+to each character in the document. Second, for each link,
+it looks at the *other* set of characters in the link: the target
+of the link, if the original character was the source, and vice versa.
+Third, it looks for documents containing these target characters.
+This way, even if both the source and target characters of the link
+are moved to a different document, the link stays connected to them.
+
+Of course, doing any expensive operation for *every* character 
+in a document does not scale very well. In practice,
+characters typed in consecutively are given consecutive ids,
+such as ``...:4``, ``...:5``, ``...:6`` and so on, and
+operations work on *spans*, consecutive ranges of characters
+(``...:4-6``). In Storm, each editor session creates a
+block with all characters entered in this session (the content type
+being ``text/plain``). To designate a span of characters
+from that session, we use the block's id, the offset of the first
+character, and the number of characters in the span.
+This technique was first introduced in [ref ht02 paper].
+
+In Xanadu, characters are written to append-only *scrolls*
+when they are typed [ref]. Because of this, we call the blocks
+containing the actual characters *scroll blocks*. The documents
+do not actually contain the characters; instead, they are
+*virtual files* containing span references as described above.
+To show a document, the scroll blocks it references are loaded
+and the characters retrieved from there [#]_.
+
+.. [#] It is unclear whether this approach is efficient for text
+   in the Storm framework; in the future, we may try storing
+   the characters in the documents themselves, along with their
+   permanent identifiers. For images or video, on the other hand,
+   it is clearly beneficial if content appearing in different
+   documents-- or different versions of a document-- is only
+   stored once, in a block only referred to wherever
+   the data is transcluded.
+
+Our current implementation shows only links between documents
+that are in memory at the same time [screenshot of xupdf].
+In the future, we will implement a global index on top of
+a distributed hashtable, with the scroll blocks' ids as the keys.
+To find the transclusions of a span, the system will retrieve
+all transclusions of any span in the scroll block, then
+discard those that do not overlap the span in question.
+
+Since the problem is to search for overlapping ranges,
+the spans cannot be used as hashtable keys. However, as the blocks
+will be relatively small (limited by the amount of text
+the user enters between two saves of a document), we hope
+that this will not be a major scalability problem. Otherwise,
+systems that allow range queries, such as skip graphs [ref],
+may prove useful.
+
+One question raised by xanalogical storage is which links to show
+for a popular document that has been linked to by many users.
+We hope to address this problem by collaborative filtering
+of links [explain, ref]. There has been research on
+collaborative filtering in peer-to-peer systems
+without compromising participants' privacy [ref John Canny].
 
 
 5. Indexing
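
Note on the span lookup described in section 4 of the patch: a span is
designated by a block id, the offset of its first character, and a
character count, and transclusions are found by fetching every span
recorded under the scroll block's id and keeping only those that overlap
the queried span. The following is a minimal sketch of that idea, not the
Storm implementation. The names ``Span``, ``transclusions_of`` and the
plain ``dict`` standing in for the distributed hashtable are all
hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A run of characters inside one scroll block (hypothetical type)."""
    block_id: str  # id of the scroll block; a plain string is assumed here
    offset: int    # offset of the first character in the block
    length: int    # number of characters in the span

    @property
    def end(self) -> int:
        # Exclusive end offset of the span.
        return self.offset + self.length

    def overlaps(self, other: "Span") -> bool:
        # Two spans overlap only if they are in the same block and
        # their half-open ranges [offset, end) intersect.
        return (self.block_id == other.block_id
                and self.offset < other.end
                and other.offset < self.end)


def transclusions_of(span, index):
    """Return ids of documents that contain characters of `span`.

    `index` maps a scroll block id to a list of (document_id, Span)
    pairs; a dict is assumed in place of the distributed hashtable.
    """
    # The key is the block id, so every span in the block is retrieved...
    candidates = index.get(span.block_id, [])
    # ...and spans that do not overlap the query are discarded locally.
    return sorted({doc for doc, s in candidates if s.overlaps(span)})
```

Because the hashtable key is the block id rather than the span, the
overlap test runs on the client over all entries for that block, which is
why the text hopes that blocks stay small.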


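Note on the three-step link resolution described in section 4 of the
patch: the sketch below keeps the two indexes the text mentions
(documents by character, links by character) and walks from a document's
characters to the links touching them, then to the other endpoint, then
to the documents holding that endpoint. It works on individual character
ids for clarity rather than on spans. The class ``XanalogicalIndex``, its
method names, and the choice to leave the querying document out of the
result are assumptions made for illustration.

```python
from collections import defaultdict


class XanalogicalIndex:
    """In-memory stand-in for the global index (hypothetical)."""

    def __init__(self):
        self.docs_by_char = defaultdict(set)   # char id -> document ids
        self.links_by_char = defaultdict(set)  # char id -> links

    def add_document(self, doc_id, chars):
        for c in chars:
            self.docs_by_char[c].add(doc_id)

    def add_link(self, end_a, end_b):
        # A link is modeled as two collections of characters.
        link = (frozenset(end_a), frozenset(end_b))
        for c in link[0] | link[1]:
            self.links_by_char[c].add(link)

    def linked_documents(self, doc_id, chars):
        """Documents connected by a link to any character in `chars`."""
        result = set()
        for c in chars:
            # Step 1: find links that refer to this character.
            for link in self.links_by_char.get(c, ()):
                for i, endpoint in enumerate(link):
                    if c not in endpoint:
                        continue
                    # Step 2: take the *other* endpoint of the link.
                    other = link[1 - i]
                    # Step 3: find documents containing those characters.
                    for target in other:
                        result |= self.docs_by_char.get(target, set())
        # Assumption: the querying document itself is not reported.
        result.discard(doc_id)
        return result
```

Since links are attached to character ids and not to documents, copying
the linked characters into another document makes that document show up
in the result without any change to the link itself.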

