[Gzz-commits] manuscripts/storm article.rst


From: Benja Fallenstein
Subject: [Gzz-commits] manuscripts/storm article.rst
Date: Fri, 07 Feb 2003 18:27:22 -0500

CVSROOT:        /cvsroot/gzz
Module name:    manuscripts
Changes by:     Benja Fallenstein <address@hidden>      03/02/07 18:27:22

Modified files:
        storm          : article.rst 

Log message:
        More article work

CVSWeb URLs:
http://savannah.gnu.org/cgi-bin/viewcvs/gzz/manuscripts/storm/article.rst.diff?tr1=1.108&tr2=1.109&r1=text&r2=text

Patches:
Index: manuscripts/storm/article.rst
diff -u manuscripts/storm/article.rst:1.108 manuscripts/storm/article.rst:1.109
--- manuscripts/storm/article.rst:1.108 Fri Feb  7 10:09:27 2003
+++ manuscripts/storm/article.rst       Fri Feb  7 18:27:22 2003
@@ -60,7 +60,12 @@
 Dangling links and keeping track of alternative versions. 
 Resolvable location independent identifiers
 make these issues much easier to deal with, since data
-can be recognized whereever it is moved. 
+can be recognized wherever it is moved [#]_. 
+
+.. [#] It might be more appropriate to speak about *resources*
+   and *references* instead of *documents* and *links*, but
+   in the spirit of [ref kappe95scalable], we stick with
+   the latter terms for explanation purposes.
 
 *Dangling links* are an issue when documents are moved
 between servers; when no network connection is available,
@@ -76,7 +81,7 @@
 to documents stored on either one's computer [ref Thompson et al].
 
 Advanced hypermedia systems such as Microcosm and Hyper-G
-address dangling links through a notification system:
+address dangling links through a notification system [ref]:
 When a document is moved, servers storing links to it are notified.
 Hyper-G uses an efficient protocol for delivering such notifications
 on the public Internet. 
@@ -107,14 +112,17 @@
 or fork to a different branch.
 
 In this paper, we present Storm (for *storage module*), a design 
-dealing with these issues of link intactness and version management. Storm is a library
+dealing with versioning and dangling links. Storm is a library
 for storing and retrieving data as *blocks*, immutable
 byte sequences identified by cryptographic content hashes
 [ref ht'02 paper]. Additionally, Storm provides services
 for versioned data and Xanalogical storage [ref].
-The Storm design, a hypermedia system built to make use
-of the emerging peer-to-peer search technologies,
-is the main contribution of this paper.
+
+The main contribution of this paper is the Storm design, 
+a hypermedia system built to make use of the emerging 
+peer-to-peer search technologies. Additionally, we hope to 
+provide an input to the ongoing discussion about peer-to-peer
+hypermedia systems [ref ht01, ht02].
 
 Currently, Storm is partially implemented as a part of the Gzz 
 project [ref], which uses Storm exclusively for all disk storage.
@@ -159,19 +167,63 @@
 2. Related Work
 ===============
 
-In advanced hypermedia systems such as Microcosm[] and Hyper-G[],
-several approaches has been proposed to deal with the dangling/other link
-management problems. 
+The dangling link problem has received a lot of attention
+in hypermedia research [refs]. As examples, we examine the ways
+in which HTTP, Microcosm [ref], Chimera [ref] and Hyper-G [ref] 
+deal with the problem.
+
+In HTTP, servers are able to notify a client that a document
+has been moved, and redirect it accordingly [ref spec?]. However,
+this is not required, and there are no facilities for
+updating a link automatically when its target is moved.
+Consequently, broken links are a common experience for Web users.
+
+In Microcosm, hypermedia functionality is implemented
+through *filters*, which react to arbitrary messages
+(such as 'find links to this anchor') generated by
+a client application. Filters are processes on the local system
+or on a remote host [ref distributed microcosm]. When
+a document is moved or deleted, a message is sent
+to the filters. Linkbases implemented as filters can
+update their links accordingly. A client selects a set
+of remote filters to use. Only links stored by one
+of these filters can be found.
+[HymEbook?]
+
+.. Microcosm systems can independently choose 
+   whether to import filters from other systems, and whether
+   to host and export own filters; thus, a system can act
+   as both a client and server at the same time,
+   for example in a workgroup.
+
+In Hyper-G, documents are bound to servers, and a link
+is stored on the servers of the two documents it connects
+[kappe95scalable]. This ensures that all links to a document
+can always be found, but requires the cooperation 
+of both servers. Hyper-G employs a scalable protocol
+for notifying servers when a document has been moved or removed.
+A server hosting links to this document can then ask
+the link's author to change the link, or at least the link
+can be removed automatically. The *p-flood* algorithm
+employed by Hyper-G for this purpose guarantees that a message
+is delivered to all interested servers, but requires that each
+interested server keeps a list of all the others.
+
+[XXX Chimera -- or any other distributed hypermedia system?]
 
+All of these systems are built around the fundamental assumption
+that it is impossible to resolve a random [XXX] identifier.
 The use of location-independent identifiers
 for documents, resolved through a peer-to-peer lookup system, 
-makes such a notification unnecessary; when a document is moved, 
+makes notification of the servers storing links unnecessary; 
+when a document is moved, 
 but retains its identifier, it can be found by the same mechanism as
 before the move. It is possible to retrieve the document
 from any system storing a copy; this means that documents may be
 accessible even after the original publisher has taken them off-line [#]_.
 
-[Relocate this, e.g. at the end of this section ? -Hermanni]
+Conversely, an external link published by any host can be found
+when the endpoint of the link is known... XXX
 
 .. [#] Intentionally or unintentionally. We believe that it is 
    a good thing if published documents remain available even when
@@ -181,21 +233,11 @@
    [Possible refs: http://www.openp2p.com/topics/p2p/p2p_law/.
   However, they are not necessarily directly related to this :( -Hermanni]
 
-Microcosm addressed the linking problems of large archives 
-by separating the links from the documents and storing them on dedicated 
-linkbases, with the the following requirement:  when a document (where a 
-position or document dependant link anchor occurs) is moved or deleted, 
-the hypermedia document management system (or hyperbase), ought to be informed 
-[HymEbook?]. 
-
-In Hyper-G, when there are similar changes, all other Hyper-G servers that 
-reference to specific document can be informed, and an efficient protocol has been
-proposed for that purpose [kappe95scalable].
-(All that does not change the basic assumption, may even be seen as
-workarounds from the p2p solution's point of view?)
-[yet Chimera and other OHS?]
--- agreed: we're talking server-centric vs documents-not-bound-to-server,
-here; I believe most distributed hymedia sys are server-centric in this sense.]
+Even Xanadu [ref], which went a long way to ensure that links do not break
+when their targets are copied from one document to another,
+required permanent connection to a network of servers to function. 
+Moreover, Xanadu's 1988 incarnation [ref Green] addressed data 
+based on the address of a server holding a 'master copy.'
 
 Likewise, version control systems like CVS or RCS [ref] usually assume
 a central server hosting a repository. The WebDAV/DeltaV protocols,
@@ -205,12 +247,6 @@
 to branch and merge overlapping repositories without any central control
 [is there a specific ref for this?].
 
-Even Xanadu [ref], which went a long way to ensure that links do not break
-when their targets are copied from one document to another,
-required permanent connection to a network of servers to function. 
-Moreover, Xananu's 1988 incarnation [ref Green] addressed data 
-based on the address of a server holding a 'master copy.'
-
 Lotus Notes [ref], a popular database sharing and collaboration tool, has some 
 similarities to Storm. In both systems, for instance, data is identified 
 using GUIDs. However, partly because of the long age of the system, Lotus Notes 
@@ -247,31 +283,51 @@
 3. Block storage
 ================
 
-[Do we need a figure, which shows the overall structure of block storage
-with pointers, diffs etc ? -Hermanni]
+.. [Do we need a figure, which shows the overall structure of block storage
+   with pointers, diffs etc ? -Hermanni]
 
-In our system, Storm (for *storage module*), all data is stored
-as *blocks*, byte sequences identified by a SHA-1 cryptographic content-hash 
+In Storm, all data is stored
+as *blocks*, byte sequences identified by a SHA-1 cryptographic content hash 
 [ref SHA-1 and our ht'02 paper]. Blocks have a similar granularity
 as regular files, but they are immutable, since any change to the
 byte sequence would change the hash (and thus create a different block).
 Mutable data structures are built on top of the immutable blocks
 (see Section 6). 
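
The content-hash scheme described above could be sketched as follows. This is a toy in-memory store, not Storm's actual API; the class and method names are our own invention:

```python
import hashlib

# Hypothetical sketch of Storm-style block storage: a block's id is
# the SHA-1 hash of its byte sequence, so blocks are immutable --
# any change to the bytes yields a different id, i.e. a new block.
class BlockStore:
    def __init__(self):
        self._blocks = {}  # hex block id -> bytes

    def add(self, data: bytes) -> str:
        block_id = hashlib.sha1(data).hexdigest()
        self._blocks[block_id] = data
        return block_id

    def get(self, block_id: str) -> bytes:
        data = self._blocks[block_id]
        # Self-verifying: the id doubles as a checksum of the content.
        assert hashlib.sha1(data).hexdigest() == block_id
        return data

store = BlockStore()
bid = store.add(b"hello, storm")
```

Note that adding the same byte sequence twice yields the same id, which is what makes different versions of a document coexist without naming conflicts.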
 
-Immutable blocks has several benefits over existing data storing 
-techiques:
+Storing data in immutable blocks may seem strange at first, but
+has a number of advantages. Firstly, it makes it easy
+to replicate data between systems: A replica of a block never
+needs to be updated; cached copies can be kept as long as desired.
+When a document is replicated, different versions of it can
+coexist on the same system without naming conflicts, since
+each version will be stored in its own block with its own id.
 
-Storm's block storage makes it easy to replicate data between systems.
-Different versions of the same document can easily coexist at this level,
-stored in different blocks. 
-[Previous sentence doesn't parse to me (what level ?) :( -Hermanni]
 To replicate all data from computer A
 on computer B, it suffices to copy all blocks from A to B that B
-does not already store. On the other hand for instance, several popular 
-database management systems (e.g. Lotus Notes [ref]) have complex 
-replication schemes, which may led awkward replication conflicts, 
-because of they lack the immutable properties of data. 
-[Or does this belong to diff section ? -Hermanni]
+does not already store. This can be done through a simple 'copy'
+command. In contrast, a system based on mutable resources
+has to use more advanced schemes, for example merging the changes
+done to a document at A or B. (Merging may still be necessary
+when a user wants to incorporate a set of changes, but it is not
+required at replication time.)
+
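
The replication step described above reduces to a set difference over block ids. A minimal sketch, assuming pools are represented as plain id-to-bytes dictionaries (our simplification, not Storm's interface):

```python
# Hypothetical sketch of Storm replication: because blocks are
# immutable and content-addressed, syncing pool A onto pool B is
# just copying the blocks B does not already hold. No merge step
# or conflict resolution is needed at replication time.
def replicate(source: dict, target: dict) -> int:
    missing = set(source) - set(target)
    for block_id in missing:
        target[block_id] = source[block_id]
    return len(missing)  # number of blocks copied

pool_a = {"id1": b"version 1", "id2": b"version 2"}
pool_b = {"id1": b"version 1"}
copied = replicate(pool_a, pool_b)
```

Running `replicate` a second time copies nothing, since the operation is idempotent by construction.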
+Secondly, immutable blocks increase *reliability*. 
+When saving a document, an application will only *add* blocks,
+never overwrite existing data. When a bug causes an application
+to write malformed data, only the changes from one session
+will be lost; the previous version of the data will still
+be accessible. This makes Storm well suited as a basis
+for implementing experimental projects (such as ours).
+Even production systems occasionally corrupt existing data
+when an overwriting save operation goes awry; for example,
+one of the authors has had this problem with
+Microsoft Word many times.
+
+.. On the other hand, several popular database management systems
+   (e.g. Lotus Notes [ref]) have complex replication schemes, which
+   may lead to awkward replication conflicts because their data
+   lacks immutability.
+   [Or does this belong to diff section ? -Hermanni]
 
 Storm blocks are MIME messages [ref MIME], i.e., objects with
 a header and body as used in Internet mail or HTTP.
@@ -291,16 +347,6 @@
 We have implemented the first three (using hexadecimal
 representations of the block ids for file names).
 
-Storing all data in Storm blocks provides *reliability*:
-When saving a document, an application will only *add* blocks,
-never overwrite existing data. When a bug causes an application
-to write malformed data, only the changes from one session
-will be lost; the previous version of the data will still
-be accessible. This makes Storm well suited as a basis
-for implementing experimental projects (such as ours).
-
-[Example of common Microsoft Word saving issue ? -Hermanni]
-
 When used in a network environment, Storm IDs do not provide
 a hint as to where a specific block is stored in the network.
 However, many existing peer-to-peer systems could be used to
@@ -465,14 +511,13 @@
 to index blocks provide the following callback
 to a Storm pool::
 
-    getMappings(block) -> set of (key, value) pairs
-
-.. [What is mapping ? We should explain this :) -Hermanni
-       isn't it explained above? it's a set of key-value pairs]
+    getMappings(block) -> 
+        set of (key, value) pairs
 
 This callback processes a block and returns a set of mappings
-to be placed into the index. The Storm pool, in turn, provides 
-the following interface to the application:
+(key/value pairs) to be placed into the index. 
+The Storm pool, in turn, provides 
+the following interface to the application::
 
     get(key) -> set of (block, value) pairs
 
@@ -791,6 +836,27 @@
 In the broadcasting approach, implementations' differences mostly lie in the 
 *structural level* of the overlay network, i.e. super peers and peer clusters.
 
+.. (Probabilistic access to documents may be ok in e.g. workgroups,
+   but does not really seem desirable. (At the ht'02 panel, Bouvin
+   said they might be ok, which others found very... bold.) 
+   One example may be a user's public comments on documents; 
+   these might be only available when that user is online.
+
+.. cf half-life of peers (Mojo Nation): Is it desirable that 'weak' peers
+   participate in a DHT? -- In Circle, peers must have been online
+   for at least an hour... In which ways, then, can 'weak' peers contribute
+   to the network in a p2p fashion? Caching is certainly one central
+   way, esp. when combined with multisource downloading (this can
+   potentially boost download speeds to the full available bandwidth).
+   This is a performance/reliability issue rather than something
+   changing the fundamental qualities of the network, but still important.
+
+   The important point about p2p publishing is that no account and setup
+   is necessary to start publishing.
+
+   One possibility: Use IBP for limited-time publishing, referring to
+   the location through the DHT? This might be related to p2p publishing.
+
 
 Review of the use cases: what does storm in each?
 -------------------------------------------------
@@ -833,6 +899,7 @@
 7. When B reconnects, can check comments to the Document etc? How does that
 happen? Index?
  
+
 8. Experience and future directions
 ===================================
 
@@ -860,6 +927,7 @@
 interestingly zope by default stores diffs in the zodb (similarly:cvs&storm?)
 Benja also noted about pointing to external data? (couldn't find the post
 from the archives)
+
 
 9. Conclusions
 ==============



