[Gzz-commits] manuscripts/storm article.rst
From: Benja Fallenstein
Subject: [Gzz-commits] manuscripts/storm article.rst
Date: Sat, 08 Feb 2003 22:53:10 -0500
CVSROOT: /cvsroot/gzz
Module name: manuscripts
Changes by: Benja Fallenstein <address@hidden> 03/02/08 22:53:10
Modified files:
storm : article.rst
Log message:
more
CVSWeb URLs:
http://savannah.gnu.org/cgi-bin/viewcvs/gzz/manuscripts/storm/article.rst.diff?tr1=1.117&tr2=1.118&r1=text&r2=text
Patches:
Index: manuscripts/storm/article.rst
diff -u manuscripts/storm/article.rst:1.117 manuscripts/storm/article.rst:1.118
--- manuscripts/storm/article.rst:1.117 Sat Feb 8 17:08:01 2003
+++ manuscripts/storm/article.rst Sat Feb 8 22:53:09 2003
@@ -258,49 +258,82 @@
2.3. Peer-to-peer systems
-------------------------
-During the last few years, there have been a lot of research efforts related
-to Peer-to-Peer (p2p) resource discovery, both in industry and academic world.
-Intensive work in p2p field has yielded two main approaches: broadcasting
-[ref: gnutella1, kazaa, limewire, shareaza] and Distributed Hash Tables (DHT)
-[refs: chord, can, tapestry, pastry, kademlia, symphony, viceroy, skip graphs,
-swan]. Both of these approaches use an application level overlay network.
-However, there are significant differences between broadcasting
-and DHT approach in scalability and efficiency properties. A DHT
-usually provides log-like bounds to *all* internal
-operations [#]_ (footnote about 'stable state' ?), while broadcasting can't
-achieve either of these.
+During the last few years, there has been a lot of research
+related to peer-to-peer resource discovery, both in academia and in
+industry.
+There are two main approaches: broadcasting [gnutella1, kazaa, limewire,
+shareaza], and distributed hashtables (DHTs) [chord, can, tapestry, pastry,
+kademlia, symphony, viceroy]. Broadcasting systems
+forward queries to all systems reachable in a given number of hops
+(time-to-live). DHTs store (key,value) pairs which can be found given
+the key; a DHT assigns each peer a subset of all possible keys, and
+routes queries for a given key to the peer responsible for it.
+Before a pair can be found, it must be *inserted* in the DHT
+by sending it to the peer responsible for the key. Both approaches
+use an application-level overlay network for routing.
+
+While broadcasting systems' performance is linear in the number of
+peers, DHTs' performance usually has log-like bounds in the number
+of peers for *all* internal operations [#]_. This scalability is
+what makes global searches feasible in DHTs. In broadcasting approaches,
+on the other hand, scalability is achieved by forwarding queries
+only to a limited subset of the peers (bounded by the time-to-live),
+which means that searches in these systems are not truly global.
.. [#] It's not clear whether *all* proposed DHT designs can preserve
log-like properties when participants are heterogeneous and they
join and leave the system in a dynamic manner.
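The TTL-bounded flooding behaviour described above can be sketched as follows (a toy model; the peer ids, topology and query representation are invented for illustration, not any particular system's protocol):

```python
def broadcast_query(neighbors, stores, start, key, ttl):
    """Flood a query from `start` to peers at most `ttl` hops away."""
    visited = {start}
    frontier = [start]
    hits = [start] if key in stores[start] else []
    for _ in range(ttl):
        next_frontier = []
        for peer in frontier:
            for n in neighbors[peer]:
                if n not in visited:
                    visited.add(n)
                    next_frontier.append(n)
                    if key in stores[n]:
                        hits.append(n)
        frontier = next_frontier
    return hits

# A chain of five peers, 0 - 1 - 2 - 3 - 4; only peer 4 stores the key.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
stores = {0: set(), 1: set(), 2: set(), 3: set(), 4: {"k"}}
print(broadcast_query(neighbors, stores, 0, "k", ttl=2))  # [] -- key out of reach
print(broadcast_query(neighbors, stores, 0, "k", ttl=4))  # [4]
```

The second call illustrates the point made in the text: a query only succeeds if the time-to-live is large enough to reach a peer holding the key, so searches are not truly global.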
-A distributed hashtable stores key/value pairs.
-In a DHT, both hashtable items and the addresses of peers
-are mapped into a single virtual key space. The form of the key space
-depends on implementation (for example, Chord uses a circle).
-A distance metric (e.g. numerical, XOR) is used to find the peer
-whose position in the key space is 'closest' to the key of a given item.
-This peer is responsible to store the item (so both queries and insertions
-relating to the key are routed to it.) Thus,
-DHT's overlay connectivity graph is structured. On the other hand, the overlay
-connectivity graph of broadcasting approach is formed more or less (depends on
-implementation) in a random manner.
-
-When performing queries, in broadcasting approach, peer sends a query request to a
-subset of its neighbors and these peers to their subsequent neighbors. The
-process will continue as long as query's time-to-live (TTL) value hasn't been reached.
-In DHT approach, query request is deterministically routed towards the peer
-which hosts a specific data item. Routing is based on 'hints' (based on
-differences between data item's key and peer's key), which each peer provides
-along the routing path.
-
-Obviously, there are major differences within approaches. For the DHT approach,
-perhaps the main difference is *what* is self-organized into a
-virtual key space. For instance, in SWAN [ref] and Skip Graph [ref], *data
-items* self-organise into a virtual address space, while in other DHT
-implementations *peers* self-organise in structured form in a virtual space.
-In the broadcasting approach, implementations' differences mostly lie in the
-*structural level* of the overlay network, i.e. super peers and peer clusters.
+A DHT has a *key space*, for example the points on a circle.
+The keys in (key,value) pairs are mapped to points in the key space
+through a hash function. Independently, each peer is assigned
+a point in the space. The DHT defines a distance metric
+between points in the key space (e.g. numeric, XOR); the peer
+responsible for a hashtable key, then, is the one that is *closest*
+to it in the key space, according to the distance metric.
+A peer, then, is analogous to a hashtable bucket.
+Queries are routed through the overlay network, each hop bringing
+them closer to their destination in key space, until they reach
+the peer responsible for them.
+
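The key-space machinery just described can be sketched in a few lines (a toy model: the ring size, the peer names, and the use of a plain SHA-1 hash and clockwise distance are illustrative assumptions, not any particular DHT's design):

```python
import hashlib

RING = 2 ** 16  # toy circular key space with 2^16 points

def point(name):
    """Hash a string onto a point of the circle."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big") % RING

def clockwise(a, b):
    """Distance metric: clockwise distance from point a to point b."""
    return (b - a) % RING

def responsible_peer(peers, key):
    """The peer 'closest' to the key under the metric is responsible for it."""
    return min(peers, key=lambda p: clockwise(point(key), point(p)))

peers = ["peer-a", "peer-b", "peer-c", "peer-d"]
# Insertion and lookup of a (key,value) pair route to the same peer:
owner = responsible_peer(peers, "some-key")
print(owner, "is responsible for 'some-key'")
```

Because both insertion and lookup compute the same closest peer, a pair inserted under a key can later be found by anyone who knows the key.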
+.. http://sahara.cs.berkeley.edu/jan2003-retreat/ravenben_api_talk.pdf
+ Full paper will appear in IPTPS 2003 -Hermanni
+
+Recently, a few DHT-like systems have been developed which employ
+a key space similarly to a DHT, but in which queries are routed
+to (key,value) pairs [SWAN, skip graph]: A peer
+occupies several positions in the key space, one for each
+(key,value) pair. In such a system, the indirection of placing
+close keys in the custody of a 'hashtable bucket' peer is removed
+at the cost of each peer maintaining one node in the overlay network
+for each (key,value) pair it publishes.
+
+.. hemppah's original text before benja's changes:
+ In a DHT, both hashtable items and the addresses of peers
+ are mapped into a single virtual key space. The form of the key space
+ depends on implementation (for example, Chord uses a circle).
+ A distance metric (e.g. numerical, XOR) is used to find the peer
+ whose position in the key space is 'closest' to the key of a given item.
+ This peer is responsible to store the item (so both queries and insertions
+ relating to the key are routed to it.) Thus,
+ DHT's overlay connectivity graph is structured. On the other hand, the overlay
+ connectivity graph of broadcasting approach is formed more or less (depends on
+ implementation) in a random manner.
+
+ When performing queries, in broadcasting approach, peer sends a query request to a
+ subset of its neighbors and these peers to their subsequent neighbors. The
+ process will continue as long as query's time-to-live (TTL) value hasn't been reached.
+ In DHT approach, query request is deterministically routed towards the peer
+ which hosts a specific data item. Routing is based on 'hints' (based on
+ differences between data item's key and peer's key), which each peer provides
+ along the routing path.
+
+ Obviously, there are major differences within approaches. For the DHT approach,
+ perhaps the main difference is *what* is self-organized into a
+ virtual key space. For instance, in SWAN [ref] and Skip Graph [ref], *data
+ items* self-organise into a virtual address space, while in other DHT
+ implementations *peers* self-organise in structured form in a virtual space.
+ In the broadcasting approach, implementations' differences mostly lie in the
+ *structural level* of the overlay network, i.e. super peers and peer clusters.
CFS [ref], which is built upon the Chord DHT peer-to-peer routing layer [ref],
stores data as blocks. However, CFS *splits* data (files) into several miniblocks and
@@ -337,18 +370,6 @@
Mutable data structures are built on top of the immutable blocks
(see Section 6).
-When used in a network environment, such ids do not provide
-a hint as to where a specific block is stored.
-However, many existing peer-to-peer systems could be used to
-find arbitrary blocks in a location-independent fashion;
-for example, Freenet [ref], recent Gnutella-based clients
-(e.g. Shareaza [ref]), and Overnet/eDonkey2000 [ref]
-also use SHA-1-based identifiers [e.g. ref: magnet uri].
-(However, we have not put a network
-implementation into regular use yet and thus can only describe our
-design, not report on implementation experience.
-We discuss peer-to-peer implementations in Section 7, below.)
-
Storing data in immutable blocks may seem strange at first, but
has a number of advantages. First of all, it makes identifiers
self-certifying: no matter where we have downloaded a block from,
@@ -481,6 +502,25 @@
We have implemented the first three (using hexadecimal
representations of the block ids for file names).
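The hexadecimal naming scheme mentioned above can be sketched as follows (assuming, for illustration, that a block id is a plain SHA-1 content hash; the actual Storm id format may differ in detail):

```python
import hashlib

def block_id(block_bytes):
    """Hexadecimal content hash of a block, usable directly as a file name."""
    return hashlib.sha1(block_bytes).hexdigest()

block = b"Content-Type: text/plain\r\n\r\nHello, Storm!"
name = block_id(block)
print(name)  # 40 hexadecimal characters

# Self-certification: after downloading a block from *anywhere*,
# re-hashing it must reproduce the id it was requested under.
assert block_id(block) == name
```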
+Many existing peer-to-peer systems could be used to
+find blocks on the network.
+For example, Freenet [ref], recent Gnutella-based clients
+(e.g. Shareaza [ref]), and Overnet/eDonkey2000 [ref]
+also use SHA-1-based identifiers [e.g. ref: magnet uri].
+Implementations on top of a DHT could use both the
+directory and the home store approach as defined by [ref Squirrel].
+
+Unfortunately, we have not put a p2p-based implementation
+into use yet and can therefore only report on our design.
+Currently, we are working on a prototype implementation
+based on the GISP distributed hashtable [ref]
+and the directory approach (using the DHT to find a peer
+with a copy of the block, then using HTTP to download the block).
+Many practical problems have to be overcome before this
+implementation will be usable (for example seeding the
+table of known peers, and issues with UDP and network
+address translation [ref]).
+
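The directory approach described above can be sketched like this (a plain dictionary stands in for the DHT, and the peer URL is invented; a real implementation would do a DHT lookup and then fetch the block itself over HTTP):

```python
import hashlib

# Toy stand-in for the DHT: block id -> addresses of peers holding a copy.
directory = {}

def advertise(peer_url, block_bytes):
    """A peer publishes that it can serve this block (a DHT insert)."""
    bid = hashlib.sha1(block_bytes).hexdigest()
    directory.setdefault(bid, set()).add(peer_url)
    return bid

def locate(block_id):
    """A DHT lookup; the block would then be downloaded from one
    of the returned peers, e.g. over HTTP."""
    return directory.get(block_id, set())

bid = advertise("http://peer.example:8080/", b"some block data")
print(locate(bid))
```

The indirection keeps large block bodies out of the DHT itself: the hashtable only stores small (id, address) items.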
Sometimes it is useful to think about *zones* blocks are in,
related to distribution policy: for example, a *public*
zone for blocks served to others in the network, a *private*
@@ -628,47 +668,42 @@
Storm provides a general API for indexing blocks in
application-specific ways. We have implemented indexing
on a local machine, but the interface is designed so that
-implementation on top of networking overlay (e.g. distributed hashtable)
-will be trivial.
-
-.. [Benja, this might be useful for defining Storm APIs for DHTs etc:
- http://sahara.cs.berkeley.edu/jan2003-retreat/ravenben_api_talk.pdf
- Full paper will appear in IPTPS 2003 -Hermanni]
-
- Benja says: Hm, does that belong in the p2p section?
+implementation on top of a distributed hashtable
+will be straightforward. (Again, our GISP-based implementation
+is in a very early stage.)
In Storm, applications are not allowed to put arbitrary
-mappings into the index. Instead, applications that want
+items into the index. Instead, applications that want
to index blocks provide the following callback
to a Storm pool::
- getMappings(block) ->
+ getItems(block) ->
set of (key, value) pairs
-This callback processes a block and returns a set of mappings
-(key/value pairs) to be placed into the index.
+This callback processes a block and returns a set of
+hashtable items (key/value pairs) to be placed into the index.
The Storm pool, in turn, provides
the following interface to the application::
get(key) -> set of (block, value) pairs
-This function finds all mappings created by this application
+This function finds all items created by this application
with a given key, indicating both the application-provided
-value and the block for which the mapping was created.
+value and the block for which the item was created.
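A minimal local version of this interface might look as follows (the class and the word-position callback are illustrative sketches, not the actual Storm API):

```python
from collections import defaultdict

class Index:
    def __init__(self, get_items):
        self.get_items = get_items   # the application's callback
        self.by_key = defaultdict(set)

    def index_block(self, block):
        # Ask the application which (key, value) items this block yields.
        for key, value in self.get_items(block):
            self.by_key[key].add((block, value))

    def get(self, key):
        """-> set of (block, value) pairs created with this key."""
        return self.by_key[key]

# Illustrative callback: index each word of a text block by its position.
def get_items(block):
    return {(word, pos) for pos, word in enumerate(block.split())}

idx = Index(get_items)
idx.index_block("hello storm world")
idx.index_block("hello again")
print(idx.get("hello"))  # word "hello" appears at position 0 in both blocks
```

Because every item is derived from a block by the callback, the pool can regenerate or discard a block's items whenever the block is added or deleted.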
-We use the ``getMappings()`` approach instead of
-allowing applications to put arbitrary mappings into the database
-because binding mappings to blocks makes it easy for pools
-to e.g. remove associated mappings when deleting a block.
+We use the ``getItems()`` approach instead of
+allowing applications to put arbitrary items into the database
+because binding items to blocks makes it easy for pools
+to e.g. remove associated items when deleting a block.
-As an example, the ``getMappings()`` method of our Xanalogical
+As an example, the ``getItems()`` method of our Xanalogical
storage implementation will, for a block containing a document,
-collect all the spans in a document, and return mappings
+collect all the spans in a document, and return items
from their scroll blocks' IDs to the spans and their positions
in the document. When we want to find the transclusions of
-a span, we use ``get()`` to get the mappings for the ID of
+a span, we use ``get()`` to get the items for the ID of
that span's scroll block, and load the document blocks referenced
-by the mappings.
+by the items.
In a networked implementation, each peer is responsible
for indexing the blocks it stores. Since no peer can
@@ -688,13 +723,13 @@
that have already been indexed. When the Storm pool
implementation is initialized, it compares the list
of indexed blocks with the list of all available blocks,
-and asks the application for unindexed blocks' mappings.
+and asks the application for unindexed blocks' items.
One indexing application that may seem obvious is keyword-based
full-text search. However, no research has been done
in this direction; it is not clear whether the current
interface is well suited to this, or whether current implementations
-are able to scale to the load to store a mapping for each word
+are able to scale to the load of storing an item for each word
occurring in a document.
.. [There are two refs about keywords in DHTs-- should we ref these ?
-Hermanni]