From: Hermanni Hyytiälä
Subject: [Gzz-commits] manuscripts/storm article.rst
Date: Fri, 31 Jan 2003 05:58:56 -0500

CVSROOT:        /cvsroot/gzz
Module name:    manuscripts
Changes by:     Hermanni Hyytiälä <address@hidden>      03/01/31 05:58:56

Modified files:
        storm          : article.rst 

Log message:
        Comments, suggestions etc.

CVSWeb URLs:
http://savannah.gnu.org/cgi-bin/viewcvs/gzz/manuscripts/storm/article.rst.diff?tr1=1.60&tr2=1.61&r1=text&r2=text

Patches:
Index: manuscripts/storm/article.rst
diff -u manuscripts/storm/article.rst:1.60 manuscripts/storm/article.rst:1.61
--- manuscripts/storm/article.rst:1.60  Fri Jan 31 01:57:16 2003
+++ manuscripts/storm/article.rst       Fri Jan 31 05:58:56 2003
@@ -28,8 +28,8 @@
 computers, being sent as e-mail attachments, carried around on disks,
 published on the web, moved between desktop and laptop systems,
 downloaded for off-line reading or copied between computers in a LAN. 
-Often, the same document will be independently modified 
-on two unconnected systems. In this paper, we address two issues
+Often, the same document is independently modified 
+on two unconnected, separate systems. We address two issues
 raised by this *data mobility*: Dangling links, and keeping track
 of alternative versions. Resolvable location-independent identifiers
 make these issues much easier to deal with, since data
@@ -43,7 +43,7 @@
 private data and documents published on the Internet by
 using the same identifiers for both.
 Storm has been partially implemented as a part of the Gzz project [ref], 
-which uses it exclusively for all disk storage. On top of Storm,
+which uses Storm exclusively for all disk storage. On top of Storm,
 we have built a system for storing mutable, versioned data
 and an implementation of Xanalogical storage [ref].
 
@@ -213,6 +213,9 @@
 3. Block storage
 ================
 
+[Do we need a figure showing the overall structure of block storage
+with pointers and diffs ? -Hermanni]
+
 In our system, Storm (for *storage module*), all data is stored
 as *blocks*, byte sequences identified by a SHA-1 cryptographic content-hash 
 [ref SHA-1 and our ht'02 paper]. Blocks often have a similar granularity
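[Not part of the patch: the content-hash naming above could be sketched like this, assuming a plain hex SHA-1 digest as the id; `block_id` is a hypothetical helper, and the article does not specify Storm's actual id format. -ed]

```python
import hashlib

def block_id(data: bytes) -> str:
    # A Storm block is identified by a SHA-1 cryptographic
    # content-hash of its bytes; here we use the hex digest
    # directly as the id (hypothetical format).
    return hashlib.sha1(data).hexdigest()
```

Because the id is derived from the content, the same bytes always get the same id, and changed bytes get a new id instead of overwriting the old block.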
@@ -221,8 +224,8 @@
 Mutable data structures are built on top of the immutable blocks
 (see Section 6).
 
-hemppah: Or should these lines be inserted to some other section and tell more about these
-systems, e.g. 5.2 ?
+[Or should these lines be inserted to some other section and tell more about these
+systems, e.g. 5.2 ? -Hermanni]
 
 CFS [ref], which is built upon Chord routing layer[ref], store data as blocks. 
 However, CFS *splits* files into several miniblocks and spreads blocks over the 
@@ -230,15 +233,18 @@
 files into blocks, since they store data as whole files. All previously mentioned 
 systems lack the immutable property which is used in Storm blocks.
 
-Immutable blocks has several benefits...
+Immutable blocks have several benefits over existing systems...
 
-Block storage makes it easy to replicate data between systems.
+1) Storm's block storage makes it easy to replicate data between systems.
 Different versions of the same document can easily coexist at this level,
-stored in different blocks. To replicate all data from computer A
+stored in different blocks. 
+[Previous sentence doesn't parse for me (what level ?) :( -Hermanni]
+To replicate all data from computer A
 on computer B, it suffices to copy all blocks from A to B that B
 does not already store.
+[Example of Lotus Notes' replication conflicts ? -Hermanni]
 
-Storm blocks are MIME messages [ref MIME], i.e., objects with
+2) Storm blocks are MIME messages [ref MIME], i.e., objects with
 a header and body as used in Internet mail or HTTP.
 This allows them to carry any metadata that can be carried
 in a MIME header, most importantly a content type.
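[Not part of the patch: the replication rule in point 1) above — copy from A to B only the blocks B does not already store — can be sketched as follows, with dicts standing in for block stores; `replicate` is a hypothetical helper name. -ed]

```python
def replicate(source: dict, target: dict) -> None:
    # Copy every block from computer A (source) that computer B
    # (target) does not already store. Ids are content hashes, so
    # an id present on both sides already names identical bytes.
    for bid, data in source.items():
        if bid not in target:
            target[bid] = data
```

Different versions of a document simply coexist as different blocks, so replication never has to merge or overwrite anything.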
@@ -250,34 +256,37 @@
     get(id) -> block
     add(block)
     delete(block)
+    
+[analogy to regular Hash Table/DHT ? -Hermanni]
 
-Implementations may store blocks in RAM, in individual files,
+3) Implementations may store blocks in RAM, in individual files,
 in a Zip archive, in a database or through other means.
 We have implemented the first three (using hexadecimal
 representations of the block ids for file names).
 
-Storing all data in Storm blocks provides *reliability*:
+4) Storing all data in Storm blocks provides *reliability*:
 When saving a document, an application will only *add* blocks,
 never overwrite existing data. When a bug causes an application
 to write malformed data, only the changes from one session
 will be lost; the previous version of the data will still
-be accessible. This makes Storm well suited as a basis
-for implementing experimental projects (such as ours).
+be accessible. (Footnote: This makes Storm well suited as a basis
+for implementing experimental projects (such as ours).)
 
-When used in a network environment, Storm ids do not provide
+5) When used in a network environment, Storm ids do not provide
 a hint as to where in the network the matching block can be found.
 However, current peer-to-peer systems could be used to
-find blocks in a distributed fashion; for example, Freenet [ref], 
-a few recent Gnutella clients [e.g. ref: shareaza] , Overnet/eDonkey2000 [ref] 
-also use SHA-1-based identifiers [e.g. ref: magnet uri].
-However, we have not put a network implementation into regular use
+find blocks efficiently in a distributed fashion; for example, 
+Freenet [ref], a few recent Gnutella clients [e.g. ref: shareaza], 
+Overnet/eDonkey2000 [ref] also use SHA-1-based identifiers 
+[e.g. ref: magnet uri].
+(Footnote: However, we have not put a network implementation into regular use
 yet and thus can only describe our design, not report on
-implementation experience.
+implementation experience.)
 We discuss peer-to-peer implementations in Section 7, below.
 
-The immutability of blocks should make caching trivial, since it is
+6) The immutability of blocks should make caching trivial, since it is
 never necessary to check for new versions of blocks.
-Since the same namespace is used for local data and data
+Since the same namespace [mention urn-5 ? -Hermanni] is used for local data and data
 retrieved from the network, online documents that have been
 permanently downloaded to the local harddisk can also be found
 by the caching mechanism. This is convenient for offline browsing,
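[Not part of the patch: the `get`/`add`/`delete` block storage interface shown earlier in this hunk could look like this as the RAM implementation the article mentions; `RamBlockStorage` is a hypothetical class name. -ed]

```python
import hashlib

class RamBlockStorage:
    # In-memory implementation of the block storage interface:
    #     get(id) -> block
    #     add(block)
    #     delete(block)
    def __init__(self):
        self._blocks = {}

    def add(self, block: bytes) -> str:
        # The id is the SHA-1 content hash, so adding the same
        # bytes twice is harmless: it maps to the same entry.
        bid = hashlib.sha1(block).hexdigest()
        self._blocks[bid] = block
        return bid

    def get(self, bid: str) -> bytes:
        return self._blocks[bid]

    def delete(self, bid: str) -> None:
        del self._blocks[bid]
```

File- and Zip-based implementations would follow the same interface, e.g. using the hexadecimal id as the file name, as the article describes.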
@@ -285,25 +294,28 @@
 while they are online, store them locally, and be sure that
 their software will be able to access them as if downloaded
 from the net, without broken links.
+[Previous sentence doesn't parse for me: make it simpler :( -Hermanni]
 
 Given a peer-to-peer distribution mechanism, it would be possible
 to retrieve blocks from any peer online that has a copy
 in its cache or permanent storage. This is similar to the Squirrel
-web cache [ref], but does not require trust between the peers,
-since it is possible to check the blocks' cryptographic hashes.
-Since much-requested blocks would be cached on many systems,
-such a network could deal with XXX much more easily.
-On the other hand, there are privacy concerns with exposing
-one's browser cache to the outside world.
-
+web cache [ref] [more refs? -Hermanni], but does not require trust 
+between the peers, since it is possible to check the blocks' integrity by using 
+cryptographic hashes. Since much-requested blocks would be 
+cached on many systems, such a network could deal with XXX 
+much more easily. On the other hand, there are privacy 
+concerns with exposing one's browser cache to the outside world.
 
 
+[Merge this paragraph with 5) ? -Hermanni]
 That all data is stored in blocks means that links to it
 are completely independent of location; when data is moved
-between servers, references to it do not break. (Of course, this
-requires that the blocks can be found no matter what server
+between servers, references to it do not break. (Footnote: Of course, 
+this requires that the blocks can be found no matter what server
 they are on. Again, see Section 7.)
 
+[Are there disadvantages/issues which we are aware of ? -Hermanni]
+
 
 4. Xanalogical storage
 ======================
@@ -321,23 +333,25 @@
 =============
 
 Clearly, for block storage to be useful, there has to be a way to
-efficiently update documents. We archieve this by a combination of 
-two mechanisms. Firstly, a *pointer* is an updatable reference to a block;
+efficiently update documents and maintain different versions of them. 
+We achieve this by a combination of two mechanisms. Firstly, a 
+*pointer* is an updatable reference to a block;
 pointers can be updated by creating a specific kind of Storm block
 representing an assertion of the form, "pointer ``P`` now points
 to block ``B``." Pointers are resolved with the help of a Storm index 
 mapping pointer identifiers to blocks providing targets for that pointer.
 Through this mechanism, we can keep old versions of documents
 along with the current versions.
+[Figure ? -Hermanni]
 
 Secondly, in the spirit of version control systems like CVS,
-we do not store each version, but only the differences between versions.
+we do not store *each version*, but only the differences between versions.
 However, we still refer to each full version by the id of a block
 containing that version, even though we do not store this block.
 When we want to access a particular version, we reconstruct it
 using the differences, and then check the result using
 the cryptographic hash in the full version's block id.
-
+[Figure ? -Hermanni]
 
 6.1. Pointers
 -------------
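[Not part of the patch: the diff-based versioning in the hunk above — store only differences, refer to each full version by the id of its (unstored) block, and verify the reconstruction against that hash — could be sketched like this. Patch callables stand in for stored delta blocks; all names are hypothetical. -ed]

```python
import hashlib

def version_id(data: bytes) -> str:
    # A full version is referred to by the id of the block that
    # *would* contain it, even though that block is never stored.
    return hashlib.sha1(data).hexdigest()

def reconstruct(base: bytes, patches, expected_id: str) -> bytes:
    # Apply the stored differences in order, then check the result
    # against the cryptographic hash in the full version's block id.
    data = base
    for patch in patches:
        data = patch(data)
    if version_id(data) != expected_id:
        raise ValueError("reconstructed version does not match its id")
    return data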
@@ -388,9 +402,9 @@
 is structured. On the other hand, the overlay connectivity graph of broadcasting 
 approach is formed more or less (depends on implementation) in a random manner. 
 
-When performing queries, in broadcasting approach peer sends a query request to a 
+When performing queries, in the broadcasting approach a peer sends a query request to a 
 subset of its neighbors and these peers to their subsequent neighbors. The 
-process will continue as long as query's time-to-live (TTL) hasn't been reached. 
+process will continue as long as the query's time-to-live (TTL) value hasn't been reached. 
 In DHT approach, query request is deterministically routed towards the peer 
 which hosts a specific data item. Routing is based on 'hints' (based on 
 differences between data item's key and peer's key), which each peer provides 
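[Not part of the patch: the TTL-limited broadcasting described in the hunk above could be sketched as follows; the toy network layout and the `flood_query` name are hypothetical, and real clients would also forward to only a subset of neighbours and track already-seen queries. -ed]

```python
def flood_query(peers: dict, start: str, key: str, ttl: int) -> bool:
    # Broadcasting approach: answer locally if possible, otherwise
    # forward the query to neighbours with a decremented TTL; the
    # query dies once the TTL is exhausted.
    if key in peers[start]["data"]:
        return True
    if ttl <= 0:
        return False
    return any(flood_query(peers, n, key, ttl - 1)
               for n in peers[start]["neighbours"])
```

In contrast, the DHT approach routes the query deterministically toward the single peer whose key is closest to the data item's key, instead of flooding the overlay.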
@@ -451,7 +465,9 @@
 
 Future directions: of course, we should implement a prototype
 
-Open issue/Future directions: implement multisource downloading 
+Open issue/Future directions: implement multisource downloading
+
+Future directions: Implement home node model or directory model ? 
 
 9. Conclusions
 ==============



