[Gzz-commits] manuscripts/storm article.rst


From: Benja Fallenstein
Subject: [Gzz-commits] manuscripts/storm article.rst
Date: Sat, 08 Feb 2003 22:53:10 -0500

CVSROOT:        /cvsroot/gzz
Module name:    manuscripts
Changes by:     Benja Fallenstein <address@hidden>      03/02/08 22:53:10

Modified files:
        storm          : article.rst 

Log message:
        more

CVSWeb URLs:
http://savannah.gnu.org/cgi-bin/viewcvs/gzz/manuscripts/storm/article.rst.diff?tr1=1.117&tr2=1.118&r1=text&r2=text

Patches:
Index: manuscripts/storm/article.rst
diff -u manuscripts/storm/article.rst:1.117 manuscripts/storm/article.rst:1.118
--- manuscripts/storm/article.rst:1.117 Sat Feb  8 17:08:01 2003
+++ manuscripts/storm/article.rst       Sat Feb  8 22:53:09 2003
@@ -258,49 +258,82 @@
 2.3. Peer-to-peer systems
 -------------------------
 
-During the last few years, there have been a lot of research efforts related 
-to Peer-to-Peer (p2p) resource discovery, both in industry and academic world.
-Intensive work in p2p field has yielded two main approaches: broadcasting 
-[ref: gnutella1, kazaa, limewire, shareaza] and Distributed Hash Tables (DHT) 
-[refs: chord, can, tapestry, pastry, kademlia, symphony, viceroy, skip graphs, 
-swan]. Both of these approaches use an application level overlay network.
-However, there are significant differences between broadcasting 
-and DHT approach in scalability and efficiency properties. A DHT
-usually provides log-like bounds to *all* internal 
-operations [#]_ (footnote about 'stable state' ?), while broadcasting can't achieve 
-either of these.
+During the last few years, there has been a lot of research
+related to peer-to-peer resource discovery, both in academia and in industry.
+There are two main approaches: broadcasting [gnutella1, kazaa, limewire,
+shareaza], and distributed hashtables (DHTs) [chord, can, tapestry, pastry,
+kademlia, symphony, viceroy]. Broadcasting systems
+forward queries to all peers reachable within a given number of hops
+(the time-to-live). DHTs store (key,value) pairs which can be looked up
+by key; a DHT assigns each peer a subset of all possible keys, and
+routes queries for a given key to the peer responsible for it.
+Before a pair can be found, it must be *inserted* into the DHT
+by sending it to the peer responsible for its key. Both approaches
+use an application-level overlay network for routing.
+
+While the cost of a broadcast query grows linearly with the number of
+peers reached, DHTs usually provide log-like bounds in the number of peers
+for *all* internal operations [#]_. This scalability is
+what makes global searches feasible in DHTs. In broadcasting approaches,
+on the other hand, scalability is achieved by forwarding queries 
+only to a limited subset of the peers (bounded by the time-to-live),
+which means that searches in these systems are not truly global.
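+
+To make the contrast concrete, here is a rough Java sketch of TTL-limited
+query flooding. It is purely illustrative: the graph representation and all
+names are assumptions of this sketch, not the protocol of any particular
+system::
+
+    import java.util.*;
+
+    /** Sketch of TTL-limited flooding; purely illustrative. */
+    public class FloodingSketch {
+        // Unstructured overlay: peer name -> neighbours.
+        static Map<String, List<String>> neighbours = new HashMap<>();
+
+        /** Returns the peers reached when 'origin' floods a query with the given TTL. */
+        static Set<String> flood(String origin, int ttl) {
+            Set<String> seen = new HashSet<>(Collections.singleton(origin));
+            Set<String> frontier = new HashSet<>(seen);
+            for (int hop = 0; hop < ttl; hop++) {      // one more ring of peers per hop
+                Set<String> next = new HashSet<>();
+                for (String peer : frontier)
+                    for (String n : neighbours.getOrDefault(peer, Collections.emptyList()))
+                        if (seen.add(n))               // each peer handles the query once
+                            next.add(n);
+                frontier = next;
+            }
+            return seen;   // peers outside this set never see the query
+        }
+
+        public static void main(String[] args) {
+            neighbours.put("a", Arrays.asList("b", "c"));
+            neighbours.put("b", Arrays.asList("d"));
+            System.out.println(flood("a", 1));   // d is out of reach with TTL 1
+        }
+    }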
 
 .. [#] It's not clear whether *all* proposed DHT designs can preserve
    log-like properties when participants are heterogeneous and they 
    join and leave the system in a dynamic manner. 
 
-A distributed hashtable stores key/value pairs.
-In a DHT, both hashtable items and the addresses of peers 
-are mapped into a single virtual key space. The form of the key space 
-depends on implementation (for example, Chord uses a circle).
-A distance metric (e.g. numerical, XOR) is used to find the peer
-whose position in the key space is 'closest' to the key of a given item.
-This peer is responsible to store the item (so both queries and insertions
-relating to the key are routed to it.) Thus, 
-DHT's overlay connectivity graph is structured. On the other hand, the overlay 
-connectivity graph of broadcasting approach is formed more or less (depends on 
-implementation) in a random manner. 
-
-When performing queries, in broadcasting approach, peer sends a query request to a 
-subset of its neighbors and these peers to their subsequent neighbors. The 
-process will continue as long as query's time-to-live (TTL) value hasn't been reached. 
-In DHT approach, query request is deterministically routed towards the peer 
-which hosts a specific data item. Routing is based on 'hints' (based on 
-differences between data item's key and peer's key), which each peer provides 
-along the routing path.
-
-Obviously, there are major differences within approaches. For the DHT approach, 
-perhaps the main difference is *what* is self-organized into a
-virtual key space. For instance, in SWAN [ref] and Skip Graph [ref], *data 
-items* self-organise into a virtual address space, while in other DHT 
-implementations *peers* self-organise in structured form in a virtual space. 
-In the broadcasting approach, implementations' differences mostly lie in the 
-*structural level* of the overlay network, i.e. super peers and peer clusters.
+A DHT has a *key space*, for example the points on a circle.
+The keys in (key,value) pairs are mapped to points in the key space
+through a hash function. Independently, each peer is assigned
+a point in the space. The DHT defines a distance metric
+between points in the key space (e.g. numeric, XOR); the peer
+responsible for a hashtable key, then, is the one that is *closest*
+to it in the key space, according to the distance metric.
+A peer is thus analogous to a hashtable bucket.
+Queries are routed through the overlay network, each hop bringing
+them closer to their destination in key space, until they reach
+the peer responsible for them.
+
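+To make the closest-peer rule concrete, here is a rough Java sketch. It is
+not Storm code: the circular 160-bit key space, the use of SHA-1 as the
+hash function, and all names are simplifying assumptions of this sketch::
+
+    import java.math.BigInteger;
+    import java.security.MessageDigest;
+    import java.util.*;
+
+    /** Sketch: map keys and peers onto a circular key space and pick the
+        peer 'closest' to a key. Simplified; it enumerates all peers. */
+    public class ClosestPeerSketch {
+        // Key space: integers modulo 2^160 (SHA-1 outputs on a circle).
+        static final BigInteger RING = BigInteger.ONE.shiftLeft(160);
+
+        static BigInteger point(String s) throws Exception {
+            byte[] d = MessageDigest.getInstance("SHA-1").digest(s.getBytes("UTF-8"));
+            return new BigInteger(1, d);               // non-negative point on the circle
+        }
+
+        // Distance metric: clockwise distance from 'from' to 'to' on the circle.
+        static BigInteger distance(BigInteger from, BigInteger to) {
+            return to.subtract(from).mod(RING);
+        }
+
+        /** The peer responsible for a key is the one closest to it, i.e.
+            the 'hashtable bucket' for that key. */
+        static String responsiblePeer(String key, List<String> peers) throws Exception {
+            String best = null;
+            BigInteger bestDist = null;
+            for (String peer : peers) {
+                BigInteger d = distance(point(key), point(peer));
+                if (bestDist == null || d.compareTo(bestDist) < 0) {
+                    best = peer;
+                    bestDist = d;
+                }
+            }
+            return best;
+        }
+
+        public static void main(String[] args) throws Exception {
+            List<String> peers = Arrays.asList("peer-a:7000", "peer-b:7000", "peer-c:7000");
+            System.out.println(responsiblePeer("some key", peers));
+        }
+    }
+
+A real DHT does not enumerate all peers, of course: each peer knows only a
+few others and forwards a query to the neighbour closest to the key, which
+is what yields the log-like hop counts mentioned above.
+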
+.. http://sahara.cs.berkeley.edu/jan2003-retreat/ravenben_api_talk.pdf
+   Full paper will appear in IPTPS 2003 -Hermanni
+
+Recently, a few DHT-like systems have been developed which employ
+a key space similar to a DHT's, but in which queries are routed
+to (key,value) pairs [SWAN, skip graph]: a peer 
+occupies several positions in the key space, one for each 
+(key,value) pair. In such a system, the indirection of placing
+close keys in the custody of a 'hashtable bucket' peer is removed,
+at the cost of each peer maintaining one node in the overlay network
+for each (key,value) pair it publishes.
+
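+As a very rough illustration of the bookkeeping difference (this is not the
+SWAN or skip graph data structure; all names are invented for this sketch),
+a peer in such a system maintains one overlay node per published pair::
+
+    import java.util.*;
+
+    /** Sketch: one overlay node per published (key,value) pair, so queries
+        are routed to pairs rather than to 'bucket' peers. Illustrative only. */
+    public class PairNodeSketch {
+        static class OverlayNode {
+            final String key;         // also the node's position in the key space here
+            final String value;
+            final String owningPeer;  // the peer that must maintain this node
+            OverlayNode(String key, String value, String owningPeer) {
+                this.key = key; this.value = value; this.owningPeer = owningPeer;
+            }
+        }
+
+        /** Publishing pairs adds overlay nodes for the peer to maintain. */
+        static List<OverlayNode> publish(String peer, Map<String, String> pairs) {
+            List<OverlayNode> nodes = new ArrayList<>();
+            for (Map.Entry<String, String> pair : pairs.entrySet())
+                nodes.add(new OverlayNode(pair.getKey(), pair.getValue(), peer));
+            return nodes;             // one overlay position per pair, not one per peer
+        }
+    }
+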
+.. hemppah's original text before benja's changes:
+   In a DHT, both hashtable items and the addresses of peers 
+   are mapped into a single virtual key space. The form of the key space 
+   depends on implementation (for example, Chord uses a circle).
+   A distance metric (e.g. numerical, XOR) is used to find the peer
+   whose position in the key space is 'closest' to the key of a given item.
+   This peer is responsible to store the item (so both queries and insertions
+   relating to the key are routed to it.) Thus, 
+   DHT's overlay connectivity graph is structured. On the other hand, the overlay 
+   connectivity graph of broadcasting approach is formed more or less (depends on 
+   implementation) in a random manner. 
+
+   When performing queries, in broadcasting approach, peer sends a query request to a 
+   subset of its neighbors and these peers to their subsequent neighbors. The 
+   process will continue as long as query's time-to-live (TTL) value hasn't been reached. 
+   In DHT approach, query request is deterministically routed towards the peer 
+   which hosts a specific data item. Routing is based on 'hints' (based on 
+   differences between data item's key and peer's key), which each peer provides 
+   along the routing path.
+
+   Obviously, there are major differences within approaches. For the DHT approach, 
+   perhaps the main difference is *what* is self-organized into a
+   virtual key space. For instance, in SWAN [ref] and Skip Graph [ref], *data 
+   items* self-organise into a virtual address space, while in other DHT 
+   implementations *peers* self-organise in structured form in a virtual space. 
+   In the broadcasting approach, implementations' differences mostly lie in the 
+   *structural level* of the overlay network, i.e. super peers and peer clusters.
 
 CFS [ref], which is built upon the Chord DHT peer-to-peer routing layer [ref], stores 
 data as blocks. However, CFS *splits* data (files) into several miniblocks and 
@@ -337,18 +370,6 @@
 Mutable data structures are built on top of the immutable blocks
 (see Section 6).
 
-When used in a network environment, such ids do not provide
-a hint as to where a specific block is stored.
-However, many existing peer-to-peer systems could be used to
-find arbitrary blocks in a location-independent fashion; 
-for example, Freenet [ref], recent Gnutella-based clients 
-(e.g. Shareaza [ref]), and Overnet/eDonkey2000 [ref] 
-also use SHA-1-based identifiers [e.g. ref: magnet uri]. 
-(However, we have not put a network 
-implementation into regular use yet and thus can only describe our 
-design, not report on implementation experience.
-We discuss peer-to-peer implementations in Section 7, below.)
-
 Storing data in immutable blocks may seem strange at first, but
 has a number of advantages. First of all, it makes identifiers
 self-certifying: no matter where we have downloaded a block from,
@@ -481,6 +502,25 @@
 We have implemented the first three (using hexadecimal
 representations of the block ids for file names).
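+
+For illustration only (this is not the Storm implementation; the helper
+name is invented), deriving such a hexadecimal file name from a block's
+bytes could look like this in Java::
+
+    import java.security.MessageDigest;
+
+    /** Sketch: a hexadecimal, SHA-1-based name for a block. Illustrative only. */
+    public class BlockIdSketch {
+        static String hexId(byte[] blockBytes) throws Exception {
+            byte[] digest = MessageDigest.getInstance("SHA-1").digest(blockBytes);
+            StringBuilder hex = new StringBuilder();
+            for (byte b : digest)
+                hex.append(String.format("%02x", b));
+            return hex.toString();                    // usable as a file name
+        }
+
+        public static void main(String[] args) throws Exception {
+            byte[] block = "example block content".getBytes("UTF-8");
+            // Self-certifying: re-hashing downloaded bytes must reproduce the id.
+            System.out.println(hexId(block));
+        }
+    }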
 
+Many existing peer-to-peer systems could be used to
+find blocks on the network.
+For example, Freenet [ref], recent Gnutella-based clients 
+(e.g. Shareaza [ref]), and Overnet/eDonkey2000 [ref] 
+also use SHA-1-based identifiers [e.g. ref: magnet uri]. 
+Implementations on top of a DHT could use either the
+directory or the home store approach, as defined in [ref Squirrel].
+
+Unfortunately, we have not put a p2p-based implementation
+into use yet and can therefore only report on our design.
+Currently, we are working on a prototype implementation
+based on the GISP distributed hashtable [ref]
+and the directory approach (using the DHT to find a peer
+with a copy of the block, then using HTTP to download the block).
+Many practical problems have to be overcome before this
+implementation will be usable (for example seeding the
+table of known peers, and issues with UDP and network
+address translation [ref]).
+
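+To sketch the directory approach in code (the ``Directory`` interface and
+the URL layout below are assumptions of this illustration, *not* the GISP
+API), a lookup first asks the distributed hashtable for peers that hold a
+copy of the block and then downloads the block over HTTP::
+
+    import java.io.InputStream;
+    import java.net.HttpURLConnection;
+    import java.net.URL;
+    import java.util.List;
+
+    /** Sketch of the directory approach; illustrative assumptions only. */
+    public class DirectoryFetchSketch {
+
+        /** Placeholder for a DHT-backed directory of block locations. */
+        interface Directory {
+            /** Peers (host:port) believed to hold a copy of the block. */
+            List<String> locate(String blockId);
+        }
+
+        static byte[] fetch(Directory directory, String blockId) throws Exception {
+            for (String peer : directory.locate(blockId)) {       // candidates from the DHT
+                try {
+                    URL url = new URL("http://" + peer + "/" + blockId);
+                    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
+                    if (conn.getResponseCode() != 200)
+                        continue;                                 // try the next peer
+                    try (InputStream in = conn.getInputStream()) {
+                        // A real implementation would re-hash the data and check
+                        // that it matches blockId (the ids are self-certifying).
+                        return in.readAllBytes();
+                    }
+                } catch (Exception e) {
+                    // Peers come and go; simply try the next candidate.
+                }
+            }
+            return null;                                          // no advertised copy reachable
+        }
+    }
+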
 Sometimes it is useful to think about *zones* blocks are in,
 related to distribution policy: for example, a *public*
 zone for blocks served to others in the network, a *private*
@@ -628,47 +668,42 @@
 Storm provides a general API for indexing blocks in
 application-specific ways. We have implemented indexing
 on a local machine, but the interface is designed so that
-implementation on top of networking overlay (e.g. distributed hashtable)
-will be trivial.
-
-.. [Benja, this might be useful for defining Storm APIs for DHTs etc: 
-   http://sahara.cs.berkeley.edu/jan2003-retreat/ravenben_api_talk.pdf
-   Full paper will appear in IPTPS 2003 -Hermanni]
-
-   Benja says: Hm, does that belong in the p2p section?
+implementation on top of a distributed hashtable
+will be straightforward. (Again, our GISP-based implementation
+is in a very early stage.)
 
 In Storm, applications are not allowed to put arbitrary
-mappings into the index. Instead, applications that want 
+items into the index. Instead, applications that want 
 to index blocks provide the following callback
 to a Storm pool::
 
-    getMappings(block) -> 
+    getItems(block) -> 
         set of (key, value) pairs
 
-This callback processes a block and returns a set of mappings
-(key/value pairs) to be placed into the index. 
+This callback processes a block and returns a set of 
+hashtable items (key/value pairs) to be placed into the index. 
 The Storm pool, in turn, provides 
 the following interface to the application::
 
     get(key) -> set of (block, value) pairs
 
-This function finds all mappings created by this application
+This function finds all items created by this application
 with a given key, indicating both the application-provided
-value and the block for which the mapping was created.
+value and the block for which the item was created.
 
-We use the ``getMappings()`` approach instead of
-allowing applications to put arbitrary mappings into the database
-because binding mappings to blocks makes it easy for pools
-to e.g. remove associated mappings when deleting a block.
+We use the ``getItems()`` approach instead of
+allowing applications to put arbitrary items into the database
+because binding items to blocks makes it easy for pools
+to e.g. remove associated items when deleting a block.
 
-As an example, the ``getMappings()`` method of our Xanalogical 
+As an example, the ``getItems()`` method of our Xanalogical 
 storage implementation will, for a block containing a document,
-collect all the spans in a document, and return mappings
+collect all the spans in a document, and return items
 from their scroll blocks' IDs to the spans and their positions
 in the document. When we want to find the transclusions of
-a span, we use ``get()`` to get the mappings for the ID of 
+a span, we use ``get()`` to get the items for the ID of 
 that span's scroll block, and load the document blocks referenced
-by the mappings.
+by the items.
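+
+A rough sketch of how a purely local pool might implement the
+``getItems()``/``get()`` interface above follows (only those two signatures
+come from the design; all other names and types are illustrative
+assumptions)::
+
+    import java.util.*;
+
+    /** Sketch of a local index fed by getItems() callbacks. Illustrative only. */
+    public class IndexSketch {
+
+        /** Application callback: extract (key, value) items from a block. */
+        interface Indexer {
+            Set<Map.Entry<String, String>> getItems(byte[] block);
+        }
+
+        // key -> set of (blockId, value) pairs, as returned by get(key).
+        private final Map<String, Set<Map.Entry<String, String>>> index = new HashMap<>();
+
+        /** Called by the pool when a block is stored. */
+        void indexBlock(String blockId, byte[] block, Indexer indexer) {
+            for (Map.Entry<String, String> item : indexer.getItems(block))
+                index.computeIfAbsent(item.getKey(), k -> new HashSet<>())
+                     .add(new AbstractMap.SimpleEntry<>(blockId, item.getValue()));
+        }
+
+        /** get(key) -> set of (block, value) pairs. */
+        Set<Map.Entry<String, String>> get(String key) {
+            return index.getOrDefault(key, Collections.emptySet());
+        }
+
+        /** Because items are bound to blocks, deleting a block only requires
+            re-running getItems() on it and dropping the resulting entries. */
+        void removeBlock(String blockId, byte[] block, Indexer indexer) {
+            for (Map.Entry<String, String> item : indexer.getItems(block)) {
+                Set<Map.Entry<String, String>> entries = index.get(item.getKey());
+                if (entries != null)
+                    entries.remove(new AbstractMap.SimpleEntry<>(blockId, item.getValue()));
+            }
+        }
+    }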
 
 In a networked implementation, each peer is responsible
 for indexing the blocks it stores. Since no peer can
@@ -688,13 +723,13 @@
 that have already been indexed. When the Storm pool
 implementation is initialized, it compares the list
 of indexed blocks with the list of all available blocks,
-and asks the application for unindexed blocks' mappings.
+and asks the application for unindexed blocks' items.
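+
+A small sketch of that start-up step (the names and the ``indexBlock``
+callback are illustrative assumptions, not the Storm API)::
+
+    import java.util.*;
+    import java.util.function.BiConsumer;
+
+    /** Sketch: on initialization, index blocks that are available but not
+        yet in the list of indexed blocks. Illustrative assumptions only. */
+    public class ReindexSketch {
+        static void reindexOnInit(Set<String> availableBlocks,
+                                  Set<String> indexedBlocks,
+                                  Map<String, byte[]> blockData,
+                                  BiConsumer<String, byte[]> indexBlock) {
+            Set<String> missing = new HashSet<>(availableBlocks);
+            missing.removeAll(indexedBlocks);     // blocks stored while indexing was off
+            for (String blockId : missing) {
+                indexBlock.accept(blockId, blockData.get(blockId));   // asks the app for items
+                indexedBlocks.add(blockId);       // record that this block is indexed now
+            }
+        }
+    }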
 
 One indexing application that may seem obvious is keyword-based
 full-text search. However, no research has been done
 in this direction; it is not clear whether the current
 interface is well suited to this, or whether current implementations
-are able to scale to the load to store a mapping for each word
+are able to scale to the load of storing an item for each word
 occurring in a document.
 
 .. [There are two refs about keywords in DHTs-- should we ref these ? -Hermanni]



