
[Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert...


From: Hermanni Hyytiälä
Subject: [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert...
Date: Fri, 28 Feb 2003 04:09:15 -0500

CVSROOT:        /cvsroot/gzz
Module name:    gzz
Changes by:     Hermanni Hyytiälä <address@hidden>      03/02/28 04:09:15

Modified files:
        Documentation/misc/hemppah-progradu: masterthesis.tex 
                                             progradu.bib 

Log message:
        DHT, DOLR and CAST

CVSWeb URLs:
http://savannah.gnu.org/cgi-bin/viewcvs/gzz/gzz/Documentation/misc/hemppah-progradu/masterthesis.tex.diff?tr1=1.94&tr2=1.95&r1=text&r2=text
http://savannah.gnu.org/cgi-bin/viewcvs/gzz/gzz/Documentation/misc/hemppah-progradu/progradu.bib.diff?tr1=1.86&tr2=1.87&r1=text&r2=text

Patches:
Index: gzz/Documentation/misc/hemppah-progradu/masterthesis.tex
diff -u gzz/Documentation/misc/hemppah-progradu/masterthesis.tex:1.94 
gzz/Documentation/misc/hemppah-progradu/masterthesis.tex:1.95
--- gzz/Documentation/misc/hemppah-progradu/masterthesis.tex:1.94       Thu Feb 
27 09:09:19 2003
+++ gzz/Documentation/misc/hemppah-progradu/masterthesis.tex    Fri Feb 28 
04:09:15 2003
@@ -362,8 +362,6 @@
 where $ps$ = $summaryindex(provider(s)) \bigwedge (p = provider(s))$\}
 
 
-
-
 \section{Tightly structured}
 
 In recent months, several tightly structured overlays have been proposed.
@@ -386,12 +384,12 @@
 to implement identifier space.
 
 To store data into a tightly structured overlay, each application-specific
-key is \emph{mapped} by the overlay to a existing peer in the overlay. Thus, 
tightly
-structured overlay assigns a subset of all possible keys to every 
participating peer.
-Furtermore, each peer in the structured overlay maintains a \emph{routing 
table}, 
-which consists of identifiers and IP addresses of other peers in the overlay. 
-These are peer's neighbors in the overlay network. Figure 
\ref{fig:structured_hashing} 
-illustrates the process of data to key mapping in tightly strucuted overlays.
+unique key (e.g., a SHA-1 hash \cite{fips-sha-1}) is \emph{mapped} by the overlay to an
+existing peer in the overlay. Thus, the tightly structured overlay assigns a subset of all
+possible keys to every participating peer. Furthermore, each peer in the structured
+overlay maintains a \emph{routing table}, which consists of the identifiers and IP
+addresses of other peers in the overlay. These are the peer's neighbors in the overlay
+network. Figure \ref{fig:structured_hashing} illustrates the process of data-to-key
+mapping in tightly structured overlays.
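The data-to-key mapping described above can be sketched in a few lines. This is an illustrative sketch only, not any cited system's code: the peer identifiers, the 32-bit truncation of SHA-1, and the helper names `key_id` and `responsible_peer` are all invented for the example.

```python
import hashlib

def key_id(data: bytes, bits: int = 32) -> int:
    # Hash application data to a numeric key in the identifier space
    # (SHA-1, truncated to `bits` bits for the sketch).
    return int.from_bytes(hashlib.sha1(data).digest(), "big") % (2 ** bits)

def responsible_peer(key: int, peer_ids: list[int], bits: int = 32) -> int:
    # The overlay maps each key to an existing peer: here, the peer whose
    # identifier follows the key on the ring, so every peer is assigned a
    # contiguous subset of all possible keys.
    space = 2 ** bits
    return min(peer_ids, key=lambda p: (p - key) % space)

peers = [120, 4_000, 900_000, 3_000_000_000]
k = key_id(b"masterthesis.tex")
print(responsible_peer(k, peers))  # prints one of the peer identifiers
```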
 
 \begin{figure}
 \centering
@@ -410,6 +408,14 @@
 between current peer working with query and the key which was
 looked up.
 
+Skip Graphs and SWAN employ a key space very similar to that of a tightly structured
+overlay, but queries are routed to \emph{identifiers}. In these systems, each
+peer occupies several positions in the identifier space, one for each
+application-specific key. The indirection of placing close keys in the
+custody of a storing peer\footnote{The storing peer is the peer in the overlay which
+stores the assigned keys.} is removed, at the cost of each peer maintaining one
+``resource node'' in the overlay network for each resource item it publishes.
+
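The key property of this scheme is that application keys themselves act as identifiers, with no hashing, so ordered keys and range queries come naturally. A toy, single-process sketch of that idea (the class and method names are invented for illustration; a real skip graph distributes this ordered key list across peers, each resource node keeping skip-list-style links):

```python
from bisect import bisect_left, insort

class OrderedKeyIndex:
    # Toy model: keys are kept in sorted order; publishing adds one
    # "resource node" per item, and order makes range queries natural.
    def __init__(self):
        self.keys: list[str] = []

    def publish(self, key: str) -> None:
        insort(self.keys, key)  # one resource node per published item

    def range_query(self, low: str, high: str) -> list[str]:
        # All published keys k with low <= k < high.
        return self.keys[bisect_left(self.keys, low):bisect_left(self.keys, high)]
```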
 Stoica et al. \cite{balakrishanarticle03lookupp2p} have listed 
 four requirements for tightly structured overlays, which have to be 
 addressed in order to perform data lookups in tightly structured overlays. 
@@ -435,7 +441,7 @@
 only a constant number of neighbor peers while providing $O(\log{n})$ data lookup
 efficiency. Koorde, a recent modification of Chord, uses de Bruijn graphs to maintain
 local routing tables. Koorde requires each peer to have only about two links to other
-peers to to provide $O(\log{n})$ performance. Peernet 
+peers to provide $O(\log{n})$ performance.
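The $O(\log{n})$ behaviour comes from routing-table entries that cover exponentially growing distances. A minimal sketch of a Chord-style finger table, with made-up peer identifiers and an 8-bit identifier space (the function name and parameters are invented for this example):

```python
def finger_table(n: int, peer_ids: list[int], m: int = 8) -> list[int]:
    # Entry i points to the first peer at distance >= 2^i from n on the
    # 2^m identifier ring; greedy routing over such entries roughly halves
    # the remaining distance per hop, giving O(log n) lookups.
    space = 2 ** m
    def successor(k: int) -> int:
        return min(peer_ids, key=lambda p: (p - k) % space)
    return [successor((n + 2 ** i) % space) for i in range(m)]

peers = [3, 28, 65, 120, 200]
print(finger_table(28, peers))
```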
 
 \begin{figure}
 \centering
@@ -453,72 +459,45 @@
 \end{figure}
 
 
-Chord example ?
-
-abstraction: DHT, DOLR, multicast/anycast
-
-
--service is data block, node/peer is a physical computer
--*servers* self-organize towards a lookup network
--DHTs can be thought as a 'structured overlay random graphs'
--There is a explicit metric space in every DHT. Term 'closeness' differs 
between existing DHTs: it can be XOR/numerical/eucklidean difference between 
identifiers
--Resource can *be* a resource, or a *pointer* to a resource
--a service request is routed towards service based on node's local knowledge
--identifiers are expected to be distributed uniformly
--DHTs require a knowledge of identifier space's size initially
--resembles a balanced tree structure
-SWAN and Skip Graphs
--in this scheme, node = service
--*key-value pairs* self-organise towards a lookup network
--There is a explicit metric space. Term 'closeness' differs between existing 
DHTs: it can be XOR/numerical/eucklidean difference between identifiers
--Resource can *be* a resource, or a *pointer* to a resource
--a service request is routed towards service based on node's local knowledge
--services can be hosted locally (opposite to DHTs)
--identifiers are expected to be distributed uniformly
--system does find the service, if it exists
-
+There are three higher level abstractions which tightly structured overlays provide
+\cite{zhao03api}. Each of these abstractions fulfils the role of a storage layer in an
+overlay, but they have semantic differences in the \emph{usage} of the overlay. First,
+the Distributed Hash Table (DHT) (see e.g., \cite{dabek01widearea}, \cite{rowstron01storage})
+implements three operations: \texttt{lookup(key)}, \texttt{remove(key)} and
+\texttt{insert(key, value)}. As the name suggests, a DHT implements the same
+functionality as a regular hashtable, by storing the mapping between a key and a
+value. The DHT's \emph{interface} is generic; values can be of any size and type.
+Figure \ref{fig:Structured_lookup_using_DHT_model} shows the DHT abstraction of a
+tightly structured overlay. Second, Decentralized Object Location and Routing (DOLR)
+(see e.g., \cite{kubiatowicz00oceanstore}, \cite{iyer02squirrel}) is a distributed
+directory service. A DOLR stores \emph{pointers} to where data items are stored
+throughout the overlay. DOLR's main operations are \texttt{publish(key)},
+\texttt{removePublished(key)} and \texttt{sendToObject(key)}. The key difference
+between the DHT and DOLR abstractions is that a DOLR routes the overlay's messages
+to the nearest available peer hosting a specific data item. This form of locality
+is not supported by DHTs. Finally, a tightly structured overlay can be used for
+scalable group multicast/anycast operations (CAST) (see e.g., \cite{zhuang01bayeux}).
+The basic operations are \texttt{join(groupIdentifier)}, \texttt{leave(groupIdentifier)},
+\texttt{multicast(message, groupIdentifier)} and \texttt{anycast(message, groupIdentifier)}.
+Participating peers may join and leave a group, send multicast messages to
+the group, or anycast a message to a specific member of the group. DOLR and CAST
+have much in common. For instance, they both use network proximity techniques
+to optimize their operation in the overlay. Figure
+\ref{fig:Strucutred_lookup_using_DOLR_model} presents the basic operation of the
+DOLR abstraction.
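The DHT interface sketched above can be modelled in a few lines. This is a single-process toy (the class name, peer identifiers and the 16-bit space are invented for illustration, not any cited system's API); note that a value must accompany the key at insertion, exactly as in a regular hashtable:

```python
import hashlib

class ToyDHT:
    # Minimal local model of the DHT abstraction: keys are hashed onto an
    # identifier ring and each (key, value) pair is stored at the peer
    # whose identifier follows the key.
    def __init__(self, peer_ids, bits=16):
        self.space = 2 ** bits
        self.peers = {p: {} for p in peer_ids}  # per-peer local storage

    def _owner(self, key: str) -> int:
        kid = int.from_bytes(hashlib.sha1(key.encode()).digest(), "big") % self.space
        return min(self.peers, key=lambda p: (p - kid) % self.space)

    def insert(self, key: str, value) -> None:
        self.peers[self._owner(key)][key] = value

    def lookup(self, key: str):
        return self.peers[self._owner(key)].get(key)

    def remove(self, key: str) -> None:
        self.peers[self._owner(key)].pop(key, None)

dht = ToyDHT(peer_ids=[100, 20000, 40000, 60000])
dht.insert("progradu.bib", b"bibtex data")
```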
 
-+fast routing (aka searching)
-%+scalable (10^9 users, 10^14 data items)
-+robust
-+little network traffic
--own resources are mapped into the network (not necessary!!)
--keyword/fuzzy search not possible yet
--routing/query hotspots
--ASSUME THAT ALL NODES HAVE IDENTICAL CABABILITIES! However, in real life, p2p 
enviroment is extremely heterogeneous!
-
--the basic idea behind many DHTs is the fact that they perform operations in a 
binary-like tree
--more space/node --> the search arity is higher --> k is higher in k-ary trees
--DHTs' performance efficiency is derived from these tree based operations 
(e.g. split to half the previous scope)
-
-Skip graphs
-+fast routing (aka searching)
-+scalable
-+little network traffic (however, more than DHTs)
-+support *data* locality, opposite to DHTs where data is spread out evenly, 
destroying locality. In skip graphs, hashing is not requires, only *keys* 
matters (which we already have)
-+support for partial repair mechanism/self-stabilization (DHTs lack of repair 
mechanism/self-stabilization)
-+the most robust algorithm up to date, tolerance of adversial faults (DHTs 
support only of random faults)
-+adaptive: doesn't need to know keyspace size before hand, instead of many 
DHTs, which require a priori knowledge about the size of the system or its 
keyspace
-+support for range queries, e.g. natural for version control: 'find latest 
news from yesterday', 'find largest key < news:12/12/2002', 'find all related 
objects nearby'
-(in DHTs this is a open problem)
-+There are not hotspots in skip graphs (query/routing hotsports)
--not network/geographical locality
--for our purposes, Skip Graps could be used (sensible) only for block IDs, not 
for URNs: in Skip Graps, there have to be total order for data elements --> for 
URNs, there cannot be a total order
--for range queries, we would have to use local sensitive hashing, *if* we want 
fetch blocks like 'next block = i + 1'. SHA-1 is a (secure) random hash of 
content (it won't help for us in this case)
+\begin{figure}
+\centering
+\includegraphics[width=10cm, height=8cm]{DHT_lookup.eps}
+\caption{DHT abstraction of a tightly structured overlay}
+\label{fig:Structured_lookup_using_DHT_model}
+\end{figure}
 
-Peernet
--Peernet is a p2p based *network layer* for large networks
--Peernet makes an explicit distinction between node identity and address
--requires no manual configuration (plug-and-play)
--can support wireless or wireline infrastructure
--log(n) routing table, log(n) search  
 
-\cite{aspnes02faultrouting}
-\cite{ratnasamy02ght}
-\cite{236713}
-\cite{258660}
-
-\cite{fips-sha-1}
+\begin{figure}
+\centering
+\includegraphics[width=10cm, height=8cm]{DOLR_lookup.eps}
+\caption{DOLR abstraction of a tightly structured overlay}
+\label{fig:Strucutred_lookup_using_DOLR_model}
+\end{figure}
 
 
 \subsection{Formal definition}
@@ -537,106 +516,123 @@
 Every $p$ has neighbor(s), named as $neighbor$, which are $P$ = \{$p \in P: \exists neighbor$,
 where $difference(p,p_{neighbor}) = close$, and $close$ is the minimal difference $d$ in $(IS,d)$\}.
 
-\subsection{Protocols}
-
-
-
-
-
-
-
-
-\cite{freedman02trie}
-
-\cite{plaxton97accessingnearby}
-
-
-
-
-\cite{78977}
-
-
-
-
-
-
-\cite{garciamolina03sil}
-
-\cite{rowston03controlloingreliability}
-
-
-
-\cite{byers03dhtbalancing}
-
-\cite{pias03lighthouse}
-
-
-\cite{debruijn46graph}
-
-
-\subsection{Distributed Hash Tables}
-
-\subsection{Decentralized Object Location and Routing Networks}
 
-\subsection{Group multicast any anycast}
 
+\section{Summary}
 
-\begin{figure}
-\centering
-\includegraphics[width=10cm, height=8cm]{DHT_lookup.eps}
-\caption{Kademlia's lookup process}
-\label{fig:Structured lookup using DHT model}
-\end{figure}
+Measures:
 
+degree:
 
-\begin{figure}
-\centering
-\includegraphics[width=10cm, height=8cm]{DOLR_lookup.eps}
-\caption{Kademlia's lookup process}
-\label{fig:Strucutred lookup using DOLR model}
-\end{figure}
+hop count: 
 
-\cite{Gribble:2000:SDD}
+fault-tolerance
 
-\cite{dabek01widearea}
+maintenance overhead
 
-\cite{iyer02squirrel}
+load balance
 
-\cite{harrisoncircle}
 
-\cite{rowstron01storage}
+\subsection{Differences}
 
--CFS splits files into blocks (<50Kb), PAST distributed whole files
 
-\cite{kubiatowicz00oceanstore}
 
+\scriptsize
+\begin{longtable}{|l|l|l|}
 
 
+\hline 
+\multicolumn{1}{|c|}{\textbf{Property}} &
+\multicolumn{1}{c|}{\textbf{Loosely structured}} & 
+\multicolumn{1}{c|}{\textbf{Tightly structured}}  
+ 
+\\ \hline 
+\endfirsthead
 
+\multicolumn{3}{c}%
+{{\tablename\ \thetable{} -- continued from previous page}} \\
+\hline 
+\multicolumn{1}{|c|}{\textbf{Property}} &
+\multicolumn{1}{c|}{\textbf{Loosely structured}} &
+\multicolumn{1}{c|}{\textbf{Tightly structured}}
+\\ \hline 
+\endhead
 
+\endfoot
 
-\section{Summary}
 
-Measures:
 
-degree:
+\parbox{90pt}{Queries} &
+\parbox{100pt}{Uncontrolled} &
+\parbox{100pt}{Controlled}  
+\\ \hline
 
-hop count: 
+\parbox{90pt}{A way for performing queries} &
+\parbox{100pt}{Keywords} &
+\parbox{100pt}{Exact keys} 
+\\ \hline
+         
+\parbox{90pt}{Query traffic} &
+\parbox{100pt}{$O(n)/O(n^{2})$}  &
+\parbox{100pt}{$O(1)/O(\log{n})$} 
+\\ \hline
 
-fault-tolerance
+\parbox{90pt}{Guaranteed data lookup} &
+\parbox{100pt}{Not necessarily} &
+\parbox{100pt}{Yes} 
+\\ \hline
 
-maintenance overhead
+\parbox{90pt}{Overlay's structure} &
+\parbox{100pt}{Uncontrolled and ad hoc}  &
+\parbox{100pt}{Controlled and structured}
+\\ \hline
+                 
+\parbox{90pt}{Max. number of nodes} &
+\parbox{100pt}{Millions} &
+\parbox{100pt}{Billions} 
+\\ \hline
+                       
+\parbox{90pt}{Data placement} &
+\parbox{100pt}{Local} &
+\parbox{100pt}{Not local} 
+\\ \hline
+                 
+\parbox{90pt}{Support for heterogeneity} &
+\parbox{100pt}{Yes} &
+\parbox{100pt}{No} 
+\\ \hline
+       
+\parbox{90pt}{Support for locality} &
+\parbox{100pt}{Yes} &
+\parbox{100pt}{Partial}        
+\\ \hline
+         
+\parbox{90pt}{Possibility for routing hotspots} &
+\parbox{100pt}{No} &
+\parbox{100pt}{Yes} 
+\\ \hline
+       
+\parbox{90pt}{Design/Implementation complexity} &
+\parbox{100pt}{Low} &
+\parbox{100pt}{High} 
+\\ \hline
 
-load balance
+\parbox{90pt}{Fault-tolerant} &
+\parbox{100pt}{High} &
+\parbox{100pt}{High}
+\\ \hline
 
 
-\subsection{Differences}
+\caption{Comparison of loosely structured and tightly structured approaches} 
+\label{table_comparison_approach} 
 
 
+\end{longtable}
+\normalsize
 
 
 
-\subsection{Protocols}
+\subsection{Algorithms}
 
 \scriptsize
 \begin{longtable}{|l|c|c|c|c|l|}
@@ -828,6 +824,105 @@
 \normalsize
 
 
+-service is data block, node/peer is a physical computer
+-*servers* self-organize towards a lookup network
+-DHTs can be thought of as 'structured overlay random graphs'
+-There is an explicit metric space in every DHT. The term 'closeness' differs between existing DHTs: it can be the XOR/numerical/Euclidean difference between identifiers
+-Resource can *be* a resource, or a *pointer* to a resource
+-a service request is routed towards service based on node's local knowledge
+-identifiers are expected to be distributed uniformly
+-DHTs require a knowledge of identifier space's size initially
+-resembles a balanced tree structure
+SWAN and Skip Graphs
+-in this scheme, node = service
+-*key-value pairs* self-organise towards a lookup network
+-There is an explicit metric space. The term 'closeness' differs between existing DHTs: it can be the XOR/numerical/Euclidean difference between identifiers
+-Resource can *be* a resource, or a *pointer* to a resource
+-a service request is routed towards service based on node's local knowledge
+-services can be hosted locally (opposite to DHTs)
+-identifiers are expected to be distributed uniformly
+-system does find the service, if it exists
+
+
++fast routing (aka searching)
+%+scalable (10^9 users, 10^14 data items)
++robust
++little network traffic
+-own resources are mapped into the network (not necessary!!)
+-keyword/fuzzy search not possible yet
+-routing/query hotspots
+-ASSUME THAT ALL NODES HAVE IDENTICAL CAPABILITIES! However, in real life, the p2p environment is extremely heterogeneous!
+
+-the basic idea behind many DHTs is the fact that they perform operations in a 
binary-like tree
+-more space/node --> the search arity is higher --> k is higher in k-ary trees
+-DHTs' performance efficiency is derived from these tree based operations 
(e.g. split to half the previous scope)
+
+Skip graphs
++fast routing (aka searching)
++scalable
++little network traffic (however, more than DHTs)
++support *data* locality, opposite to DHTs where data is spread out evenly, destroying locality. In skip graphs, hashing is not required, only *keys* matter (which we already have)
++support for partial repair mechanism/self-stabilization (DHTs lack of repair 
mechanism/self-stabilization)
++the most robust algorithm to date, tolerant of adversarial faults (DHTs tolerate only random faults)
++adaptive: doesn't need to know the keyspace size beforehand, unlike many DHTs, which require a priori knowledge about the size of the system or its keyspace
++support for range queries, e.g. natural for version control: 'find latest news from yesterday', 'find largest key < news:12/12/2002', 'find all related objects nearby' (in DHTs this is an open problem)
++There are no hotspots in skip graphs (query/routing hotspots)
+-not network/geographical locality
+-for our purposes, Skip Graphs could sensibly be used only for block IDs, not for URNs: in Skip Graphs, there has to be a total order for data elements --> for URNs, there cannot be a total order
+-for range queries, we would have to use locality-sensitive hashing, *if* we want to fetch blocks like 'next block = i + 1'. SHA-1 is a (secure) random hash of content (it won't help us in this case)
+
+Peernet
+-Peernet is a p2p based *network layer* for large networks
+-Peernet makes an explicit distinction between node identity and address
+-requires no manual configuration (plug-and-play)
+-can support wireless or wireline infrastructure
+-log(n) routing table, log(n) search  
+
+\cite{aspnes02faultrouting}
+\cite{ratnasamy02ght}
+\cite{236713}
+\cite{258660}
+
+\cite{freedman02trie}
+
+\cite{plaxton97accessingnearby}
+
+
+
+
+\cite{78977}
+
+
+
+
+
+
+\cite{garciamolina03sil}
+
+\cite{rowston03controlloingreliability}
+
+
+
+\cite{byers03dhtbalancing}
+
+\cite{pias03lighthouse}
+
+
+\cite{debruijn46graph}
+
+
+
+\cite{Gribble:2000:SDD}
+
+
+
+\cite{harrisoncircle}
+
+
+
+-CFS splits files into blocks (<50Kb), PAST distributes whole files
+
 
 
 
@@ -1753,98 +1848,7 @@
 Table \ref{table_comparison_approach} lists the key features of both approaches.
 
 
-\scriptsize
-\begin{longtable}{|l|l|l|}
-
-
-\hline 
-\multicolumn{1}{|c|}{\textbf{Property}} &
-\multicolumn{1}{c|}{\textbf{Unstructured}} & 
-\multicolumn{1}{c|}{\textbf{Structured}}  
- 
-\\ \hline 
-\endfirsthead
 
-\multicolumn{3}{c}%
-{{\tablename\ \thetable{} -- continued from previous page}} \\
-\hline 
-\multicolumn{1}{|c|}{\textbf{Property}} &
-\multicolumn{1}{c|}{\textbf{Loosely structured}} &
-\multicolumn{1}{c|}{\textbf{Tightly structured}}
-\\ \hline 
-\endhead
-
-\endfoot
-
-
-
-\parbox{90pt}{Queries} &
-\parbox{100pt}{Uncontrolled} &
-\parbox{100pt}{Controlled}  
-\\ \hline
-
-\parbox{90pt}{A way for performing queries} &
-\parbox{100pt}{Keywords} &
-\parbox{100pt}{Exact keys} 
-\\ \hline
-         
-\parbox{90pt}{Query traffic} &
-\parbox{100pt}{$O(n)/O(n^{2})$}  &
-\parbox{100pt}{$O(1)/O(\log{n})$} 
-\\ \hline
-
-\parbox{90pt}{Guaranteed data lookup} &
-\parbox{100pt}{Not necessarily} &
-\parbox{100pt}{Yes} 
-\\ \hline
-
-\parbox{90pt}{Overlay's structure} &
-\parbox{100pt}{Uncontrolled and ad hoc}  &
-\parbox{100pt}{Controlled and structured}
-\\ \hline
-                 
-\parbox{90pt}{Max. number of nodes} &
-\parbox{100pt}{Millions} &
-\parbox{100pt}{Billions} 
-\\ \hline
-                       
-\parbox{90pt}{Data placement} &
-\parbox{100pt}{Local} &
-\parbox{100pt}{Not local} 
-\\ \hline
-                 
-\parbox{90pt}{Support for heterogeneity} &
-\parbox{100pt}{Yes} &
-\parbox{100pt}{No} 
-\\ \hline
-       
-\parbox{90pt}{Support for locality} &
-\parbox{100pt}{Yes} &
-\parbox{100pt}{Partial}        
-\\ \hline
-         
-\parbox{90pt}{Possibility for routing hotspots} &
-\parbox{100pt}{No} &
-\parbox{100pt}{Yes} 
-\\ \hline
-       
-\parbox{90pt}{Design/Implementation complexity} &
-\parbox{100pt}{Low} &
-\parbox{100pt}{High} 
-\\ \hline
-
-\parbox{90pt}{Fault-tolerant} &
-\parbox{100pt}{High} &
-\parbox{100pt}{High}
-\\ \hline
-
-
-\caption{Comparison of loosely structured and tighly structured approaches} 
-\label{table_comparison_approach} 
-
-
-\end{longtable}
-\normalsize
 
 
 
Index: gzz/Documentation/misc/hemppah-progradu/progradu.bib
diff -u gzz/Documentation/misc/hemppah-progradu/progradu.bib:1.86 
gzz/Documentation/misc/hemppah-progradu/progradu.bib:1.87
--- gzz/Documentation/misc/hemppah-progradu/progradu.bib:1.86   Wed Feb 26 
07:51:01 2003
+++ gzz/Documentation/misc/hemppah-progradu/progradu.bib        Fri Feb 28 
04:09:15 2003
@@ -1968,3 +1968,13 @@
        url = 
{http://dbpubs.stanford.edu/pub/showDoc.Fulltext?lang=en&doc=2002-26&format=pdf&compression=}
 
 }
+
+@inproceedings{zhuang01bayeux,
+        author = "S. Zhuang and B. Zhao and A. Joseph and R. Katz and J. Kubiatowicz",
+        title = "Bayeux: An Architecture for Scalable and Fault-tolerant Wide-area Data Dissemination",
+        booktitle = "Proc. of the Eleventh International Workshop on Network and Operating System Support for Digital Audio and Video (NOSSDAV 2001)",
+        month = "June",
+        year = "2001",
+        url = "http://www.cs.berkeley.edu/~ravenben/publications/pdf/bayeux.pdf"
+}



