[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert...
From: |
Hermanni Hyytiälä |
Subject: |
[Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert... |
Date: |
Tue, 04 Mar 2003 10:02:10 -0500 |
CVSROOT: /cvsroot/gzz
Module name: gzz
Changes by: Hermanni Hyytiälä <address@hidden> 03/03/04 10:02:10
Modified files:
Documentation/misc/hemppah-progradu: masterthesis.tex
Log message:
Typos and updates
CVSWeb URLs:
http://savannah.gnu.org/cgi-bin/viewcvs/gzz/gzz/Documentation/misc/hemppah-progradu/masterthesis.tex.diff?tr1=1.110&tr2=1.111&r1=text&r2=text
Patches:
Index: gzz/Documentation/misc/hemppah-progradu/masterthesis.tex
diff -u gzz/Documentation/misc/hemppah-progradu/masterthesis.tex:1.110
gzz/Documentation/misc/hemppah-progradu/masterthesis.tex:1.111
--- gzz/Documentation/misc/hemppah-progradu/masterthesis.tex:1.110 Tue Mar
4 07:25:17 2003
+++ gzz/Documentation/misc/hemppah-progradu/masterthesis.tex Tue Mar 4
10:02:09 2003
@@ -34,7 +34,7 @@
\contactinformation{\\
Hermanni Hyytiälä\\
Huhtalammentie 5 as. 17\\
-37637 JYVÄSKYLÄ\\
+40640 JYVÄSKYLÄ\\
sähköposti: address@hidden
@@ -43,7 +43,7 @@
key properties. We summarize open problems in Peer-to-Peer networks and divide
problems into three sub-categories. We observe that there are many
problems, which have not solutions at all, or problems have proposed
-solutions, but they are practically unrealizable.
+solutions but they are practically unrealizable.
Then, we give an overview of our Fenfire system. We evaluate existing
Peer-to-Peer approaches-- loosely and tightly structured overlays-- with regard
@@ -52,16 +52,16 @@
}
\tiivistelma{
Tässä opinnäytetyössä arvioimme olemassaolevia vertaisverkkoja, protokollia ja
-niiden erikoisominaisuuksia. Teemme yhteenvedon olemassa olevista ongelmista
-vertaisverkoista ja jaamme ongelmat kolmeen alakategoriaan. Havaitsemme, että
+niiden erityisominaisuuksia. Teemme yhteenvedon olemassa olevista ongelmista
+vertaisverkoissa ja jaamme ongelmat kolmeen alakategoriaan. Havaitsemme, että
on olemassa useita ongelmia, joihin ei ole ratkaisua lainkaan, tai on ehdotelma
-ongelman ratkaisemiseki, mikä kuitenkin käytännössä on mahdotonta toteuttaa.
+ongelman ratkaisemiseksi, mikä käytännössä on kuitenkin mahdotonta toteuttaa.
-Tämän jälkeen annamme yleiskuvan Fenfire-järjestelmästämme.
+Tämän jälkeen annamme yleiskuvan Fenfire-järjestelmästä.
Arvioimme olemassaolevia vertaisverkkoarkkitehtuureja-- löyhästi ja tiukasti
rakennettuja päällysverkkoja-- Fenfiren vaatimusten valossa. Lopuksi ehdotamme
yksin-kertaisia algoritmeja, joiden avulla voidaan tehokkaasti löytää
-vertaisverkosta tietoa Fenfire:n liittyen.
+vertaisverkosta Fenfire:n kannalta olennaista tietoa
}
\begin{document}
@@ -81,8 +81,8 @@
Peer-to-Peer systems have recently received noteworthy attention in both
academia and industry for a number of reasons. First, the lack of
centralization
means that participants can form a distributed system without any investment
to
-centralized, high-priced hardware which would to coordinate it. Second,
Peer-to-Peer
-provides new, direct way to achieve interoperability between network
participants.
+centralized, high-priced hardware which would coordinate it. Second,
Peer-to-Peer
+provides new direct way to achieve interoperability between network
participants.
Finally, the distributed and ad-hoc nature of Peer-to-Peer improves scalability
and reliability againts certain kinds of faults (e.g., single point of
failure).
@@ -111,7 +111,7 @@
problems in easy-to-understand tables.
Next, we give an overview of our Fenfire system, which implements xanalogical
storage model. We
-also describe briefly Storm software module, which is essential part of
Fenfire's
+also describe briefly Storm software module, which is an essential part of
Fenfire's
Peer-to-Peer functionality. We evaluate existing Peer-to-Peer approaches and
choose the best alternative to our needs. We discover that Storm, xanalogical
model and
tightly structured Peer-to-Peer approach all have similar method to deal with
data,
@@ -129,27 +129,27 @@
is to find the most efficient way to locate and fetch Fenfire related data
from the
Peer-to-Peer network, where Scroll block's identifier is given. Second, we want
to find the most efficient way to locate and fetch most recent Fenfire related
data from the
-the Peer-to-Peer network, which is associated with a given urn-5 random
string. Final problem
+the Peer-to-Peer network, which is associated with a given urn-5 random
string. Third problem
is otherwise same as the second problem, except we want to locate and fetch
all Fenfire
related data from the Peer-to-Peer network, where given date and/or time range
is given.
When comparing different Peer-to-Peer approaches and algorithms, we will
examine their
scalability, efficiency, space requirements for neighbor connections and
overhead
-associated with system maintenance. When we have solutions to our research
+associated with system maintenance. Finally, when we have solutions to our
research
problems, we will use best solutions as examples in our algorithm proposals.
\section{Thesis overview}
This thesis is structured as follows. In next chapter, we give an overview of
existing Peer-to-Peer approaches, algorithms and key differences. In chapter
3, we
address open problems in Peer-to-Peer domain and divide problems into three
-sub-categories. Chapter 4 gives an overview of our Fenfire system. In chapter
-5 we evaluate existing Peer-to-Peer approaches with regard to Fenfire system
and
-propose simple algorithms perform data lookups in Fenfire's Peer-to-Peer
enviroment.
-We also discuss open issues and future work. Finally, we present conclusions
in chapter
-6.
+sub-categories. Chapter 4 gives an overview of Fenfire system. In chapter
+5, we evaluate existing Peer-to-Peer approaches with regard to Fenfire system
and
+propose simple algorithms to perform data lookups in Fenfire's Peer-to-Peer
enviroment.
+Then, we discuss open issues and future work. In chapter 6, we present
conclusions.
+
\chapter{Peer-to-Peer architectures}
-In this chapter we give brief history and overview of Peer-to-Peer networks,
+In this chapter we will give brief history and overview of Peer-to-Peer
networks,
review most important Peer-to-Peer algorithms and list key differences between
two main approaches.
@@ -189,17 +189,17 @@
form of modern Peer-to-Peer computing is file-sharing. In this scenario,
participants
of Peer-to-Peer network share their resources to other participants while
obtaining
more resources from others. This can been seen as a variant of distributed
filesystem
-(see, e.g., \cite{levy90distributedfilesystems}).
+(e.g., \cite{levy90distributedfilesystems}).
In a development of modern Peer-to-Peer systems, lot of influences has been
attained from
other research areas than computer science. There has been done research
regarding
to ad-hoc nature of complex networks \cite{albert-02-statistical},
\cite{albert-00-tolerance}, \cite{watts00dynamics}.
It's interesting to realize that chemical properties of cells, the Internet,
ad-hoc
-Peer-to-Peer networks, have in common that they all self-organize based on
same
+Peer-to-Peer networks, have all in common that they self-organize based on
same
principles. Furthermore, the assocation between social connections among
people
and Peer-to-Peer overlay topology has been studied recently
\cite{watts00dynamics},
\cite{kleinberg99small}, \cite{nips02-Kleinberg}. This insight is motivated
-by Milgram, how noticed that people are very effective to locate other people
in a wide scale,
+by Milgram, how noticed that people are very effective to locate other people
in a wide scale
based on local knowledge. This phenomenon is called as ''small-world
phenomenon''
\cite{milgram67smallworld}. As a consequence, many modern Peer-to-Peer systems
have applied techniques outside of computer science when constructing and
maintaining
@@ -207,7 +207,7 @@
In the end, however, there are two main approaches in which all modern
Peer-to-Peer
systems fall: loosely structured approach and tightly structured approach. In
loosely
-structured approach the construction and the maintenance of the overlay is--
as the name
+structured approach the construction and the maintenance of the overlay-- as
the name
suggests- is controlled loosely. This approach gives freedom for participating
peers
to perform certains tasks in Peer-to-Peer network. On the other hand, tightly
structured
approach has some rules, which all participating peers have to obey. In the
following
@@ -230,12 +230,12 @@
\section{Loosely structured}
Gnutella \cite{gnutellaurl} is well-known example of loosely structured
overlay network. As in
-other Peer-to-Peer networks, no Peer is more important than any other Peer in
the network.
-The construction and maintenance of Gnutella network is extremely ad hoc,
since participating
-peers can form the overlay network based on local knowledge. Figure
\ref{fig:gnutella_overlay}
+other Peer-to-Peer networks, no peer is more important than any other peer in
the network.
+The construction and maintenance of Gnutella network is extremely ad-hoc,
since participating
+peers can form the overlay network based on \emph{local} knowledge. Figure
\ref{fig:gnutella_overlay}
illustrates how peers form an overlay network. Initially, peer 1 creates the
overlay, since
it's the first participating peer. Then, repeatly new peers join the network
and connects to
-other nodes in a random manner. Thus, gnutella can be considered as a
\emph{random graph}.
+other nodes in a random manner. Thus, gnutella can be considered as a
variation of \emph{random graph}.
\begin{figure}
\centering
@@ -246,17 +246,17 @@
In Gnutella, each participating peer maintains local index of its own shared
content. Also,
-each peer has a few connections to other peer, i.e., peer's \emph{neighbors}.
Basic gnutella
+each peer has a few connections to other peer, i.e. peer's \emph{neighbors}.
Basic gnutella
data lookup works as follows: peer broadcasts a query request to its neighors,
which in turn
forwards the query to their neighbors. This leads in the situation where
number of messages
in the network can grow with $O(n^{2})$, where $n$ is the number of
participating peers in the
Gnutella network. To limit the amount of network traffic, Gnutella uses
Time-To-Live-limited
-(TTL) flooding to distributed queries. Gnutella uses a breadt-First traversal
with depth limit
-$T$ (e.g., 7), where T is the system-wide maximum TTL of a message in hops.
Therefore, only peers that
+(TTL) flooding to distributed queries. Gnutella uses a Breadt-First-Traversal
(BFS) with depth limit
+$T$ (e.g., 7), where $T$ is the system-wide maximum TTL of a message in hops.
Therefore, only peers that
are TTL hops away from the query originator will forward the query or respond
to the query.
-In Gnutella network, search results are fast, because breadt-First traversal
sends queries to
-every possible neighbor. On the other hand, this method wastes resources and
doesn't scale well.
-Figure \ref{fig:gnutella_query} shows the query lookup process of Gnutella
network.
+In Gnutella network, search results are fast, because BFS sends queries to
+every possible neighbor. Clearly, this method wastes resources and doesn't
scale well.
+Figure \ref{fig:gnutella_query} shows the data lookup process of the Gnutella
network.
\begin{figure}
\centering
@@ -271,16 +271,16 @@
low, the query originator might not find the desired data even it's available
somewhere
in the network. Second, there are many duplicate messages generated by
flooding, especially
in high connectivity graphs. It is obvious that with these limitations,
flooding creates
-significant message processing overhead for each query. Furthermore, flooding
may increase
+significant message processing overhead for each data lookup. Even worse,
flooding may increase
the load on participating to the point, where it has to leave the network.
Lately, there has been done lot of research to improve Gnutella's data lookup
efficiency
and scalability. Adamic et. all \cite{adamic99small},
\cite{adamic02localsearch},
\cite{adamic01powerlawsearch} has been studied different random walk methods
in power-law
networks\footnote{In power-law networks only a few peers have high number of
neighbor
-links and major of peers have low nuber of neighbor links.} and they have
found that by
-instructing peers forwarding queries to select high degree peers the data
lookup's
-performance increases signficantly. As a result, some of the most recent
loosely
+links and major of peers have low number of neighbor links.} and they have
found that by
+instructing peers forwarding data lookups to select high degree peers, the
performance of data lookup
+increases signficantly. As a result, some of the most recent loosely
structured Peer-to-Peer systems have adopted this method with some
modifications
\cite{gnutella2url}, \cite{shareazaurl}, \cite{fasttrackurl},
\cite{morpheusurl},
\cite{kazaaurl}, \cite{jxtaurl}, \cite{jxtaoverview},
\cite{botros01jxtasearch},
@@ -308,7 +308,8 @@
Previously presented improvements are only partial solutions. Obviously, more
research is required to make loosely structured approach's data lookup more
-scalable and effective.
+scalable and effective. More advanced techniques to improve loosely strcutured
+systems' data lookup is presented in chapter 3.
\subsection{Sketch of formal definition}
@@ -328,7 +329,8 @@
\section{Tightly structured}
-In recents months, several tightly structured overlays has been proposed.
+Partly due to loosely structured systems' scalability problems, several
tightly
+structured overlays has been proposed.
This list includes CAN \cite{ratnasamy01can}, Chord \cite{stoica01chord},
Kademlia \cite{maymounkov02kademlia}, Kelips \cite{gupta03kelips},
Koorde \cite{kaashoek03koorde}, ODHDHT \cite{naor03simpledht},
@@ -347,13 +349,22 @@
value of $n$ varies among approaches. Again, CAN uses a $d$-dimensional
cartesian
to implement identifier space.
+Stoica et al. \cite{balakrishanarticle03lookupp2p} have listed
+four requirements for tightly structured overlays, which have to be
+addressed in order to perform data lookups in tightly structured overlays.
+First, mapping of keys to peers must be done in a load-balanced
+way. Second, the overlay must be able to forward a lookup for a
+specific key to an approriate peer. Third, overlay must have a
+support for a distance function. Finally, routing tables for each peer
+must be constructed and maintained adaptively.
+
To store data into tightly structured overlay, each application-specific
unique key (e.g., SHA-1 \cite{fips-sha-1}) is \emph{mapped} uniformly (e.g.,
using consistent
hashing \cite{258660}) by the overlay to a existing peer in the overlay. Thus,
tightly
structured overlay assigns a subset of all possible keys to every
participating peer.
-Furtermore, each peer in the structured overlay maintains a \emph{routing
table}, which
-consists of identifiers and IP addresses of other peers in the overlay. These
are peer's
-neighbors in the overlay network. Figure \ref{fig:structured_hashing}
illustrates the
+Specifically, each peer in tightly structured overlay maintains a
\emph{routing table}, which
+consists of identifiers and IP addresses of other peers in the overlay.
Entries of routing
+table are peer's neighbors in the overlay network. Figure
\ref{fig:structured_hashing} illustrates the
process of data to key mapping in tightly strucuted overlays.
\begin{figure}
@@ -366,55 +377,48 @@
All messages are routed across overlay links towards peers, whose
peer identifier is gradually ''closer'' to the key's identifier
in the identifier space. Distance can be measured by numerical
-difference between identifiers (e.g., Chord), the number of
-same prefix bits between identifiers (e.g., Pastry and Tapestry),
-bit-wise exclusive or (XOR) (e.g., Kademlia). However, in all
-previously schemes, each hop in the overlay shortens the path
+difference between identifiers (e.g., Chord \cite{stoica01chord}), the number
of
+same prefix bits between identifiers (e.g., Pastry \cite{rowston01pastry} and
Tapestry \cite{zhao01tapestry}),
+bit-wise exclusive or (XOR) (e.g., Kademlia \cite{maymounkov02kademlia}).
However, in all
+previously schemes each hop in the overlay shortens the path
between current peer working with query and the key which was
looked up.
Skip Graphs and Swan employ a key space very similar to a tightly structured
-overlay, but in which queries are routed to \emph{identifiers}. In these
systems
+overlay, but in which queries are routed to \emph{keys}. In these systems
peer occupies several positions in the identifier space, one for each
application-specific key. The indirection of placing close keys in the
custody of a storing peer\footnote{Storing peer is the peer in the overlay
which stores the
assigned keys.} keys is removed at the cost of each peer maintaining one
-''resource node'' in the overlay network for each resource item pair it
publishes.
+''resource peer'' in the overlay network for each resource item pair it
publishes.
PeerNet differs from other tightly structured overlays in that it operates
at the \emph{network} level layer. Peernet makes an explicit distinction
-between peer identity and address, which is supported by standard
-TCP/IP-algorithms. Otherwise, PeerNet has same performance properties
+between peer identity and address, which is not supported by standard
+TCP/IP-protocols. Otherwise, PeerNet has the same performance properties
as other tightly structured overlays, i.e. $O(\log{n})$ space required
for maintaining information about other peers in the system and
-$O(\log{n})$ data lookup efficieny.
-
-Stoica et al. \cite{balakrishanarticle03lookupp2p} have listed
-four requirements for tightly structured overlays, which have to be
-addressed in order to perform data lookups in tightly structured overlays.
-First, mapping of keys to peers must be done in a load-balanced
-way. Second, the overlay must be able to forward a lookup for a
-specific key to an approriate peer. Third, overlay must have a
-support for a distance function. Finally, routing tables for each peer
-must be constructed and maintained adaptively.
+$O(\log{n})$ data lookup efficiency.
Currently, all proposed tightly structured overlays provide at least
poly--logaritmical data lookup operations. However, there are some key
-differences in the data structure that they use as a routing table. For
example, Chord, Skip graphs and
-Skipnet maintain a local data structure which resembles skip lists
\cite{78977}.
+differences in the data structure that they use as a routing table. For
example, Chord
+\cite{stoica01chord}, Skip graphs \cite{AspnesS2003} and Skipnet
\cite{harvey03skipnet2} maintain a local
+data structure which resembles Skip lists \cite{78977}.
In figure \ref{fig:structured_query}, we present overview of Chord's lookup
process.
On the left side of Chord's lookup process, we show the same data lookup
process
as binary-tree abstraction. We can notice, that in each step, the distance
between
the query originator and the target in both methods is halved. Thus, the
locarithmic efficiency.
-Kademlia, Pastry and Tapestry uses balanced $k$-trees
-as routing table's data structure. Figure \ref{fig:kademlia_lookup} shows the
process of Kademlia
-data lookup. Viceroy maintains a butterfly data structure (see e.g.,
\cite{226658}),
+Kademlia \cite{maymounkov02kademlia}, Pastry \cite{rowston01pastry} and
Tapestry
+\cite{zhao01tapestry} uses balanced $k$-trees as routing table's data
structure. Figure
+\ref{fig:kademlia_lookup} shows the process of Kademlia
+data lookup. Viceroy \cite{malkhi02viceroy} maintains a butterfly data
structure (e.g., \cite{226658}),
which requires only constant number of neighbor peers while providing
$O(\log{n})$ data lookup
-efficiency. Koorde, recent modification of Chord, uses de Bruijn graphs to
maintain
-local routing tables. Koorde requires each peer to have only about two links
to other
-peers to to provide $O(\log{n})$ performance.
+efficiency. Koorde \cite{kaashoek03koorde}, recent modification of Chord, uses
de Bruijn graphs
+\cite{debruijn46graph} to maintain local routing tables. Koorde requires each
peer to have only
+about two links to other peers to to provide $O(\log{n})$ performance.
\begin{figure}
\centering
@@ -449,9 +453,9 @@
to nearest available peer, hosting a specific data item. This form of locality
is not supported by DHT. Finally, tightly structured overlay can be used for
scalable group multicast/anycast operations (CAST) (see e.g.,
\cite{zhuang01bayeux}).
-The basic operation are \texttt{join(groupIdentifier)},
\texttt{leave(groupIdentifier)},
+The basic operations are \texttt{join(groupIdentifier)},
\texttt{leave(groupIdentifier)},
\texttt{multicast(message, groupIdentifier)}, \texttt{anycast(message,
groupIdentifier)}.
-Participating peers may join and leave a group and send multicast messages to
+Participating peers may join and leave the group and send multicast messages to
the group, or anycast message to a specific member of the group. DOLR's and
CAST's
have much in common.For instance, they both use network proximity techniques
to optimize their operation in the overlay. Figure
\ref{fig:Strucutred_lookup_using_DOLR_model}
@@ -489,9 +493,9 @@
$ip = map(identifier(s))$, which maps data items, expressed by a identifier to
coordinate
point $ip$ in $(IS,d)$. Peer's p resources are mapped onto a set $IS$ = \{$ip
\in IS:
\exists s \in S$, $ip = map(identifier(s)) \wedge (provider(s) = p)$\}.,
-which means that resources that a peer provides into the system are not kept
locally.
+which means that resources which peer provides into the system are not kept
locally.
Every $p$ has neighbor(s), named as $neighbor$, which are $P$ = \{$p \in P:
\exists neighbor$,
-where $difference(p,p_neighbor)= close$, and $close$ is minimal difference
$d$ in $(IS,d)$\}.
+where $difference(p,p_neighbor)= ''close''$, where $''close''$ is small
difference $d$ in $(IS,d)$\}.
\section{Summary}
@@ -512,8 +516,8 @@
Thus, there are significant differences between loosely structured and tightly
structured approaches.
The most important aspect is the performance and scalability. While loosely
structured approach's performance
-is not always even linear, generally tightly structured approach can perform
all operations in
-$\Theta(\log{n})$\footnote{However, it is unknown whether all proposed
algorithms can preserve
+is not always even linear, generally tightly structured approach can perform
all internal operations in
+poly-logarithmic time\footnote{However, it is unknown whether all proposed
algorithms can preserve
logarithmic properties in real-life applications or not.}.
Another key point is the philosophy how overlay network is constructed and
maintained. While loosely
@@ -521,10 +525,10 @@
structured approach has certain features, in which participating peers have no
control at all
(such as mapping of data items).
-To end user, biggest difference between these systems is how data lookups are
performed. Looselely
+To end user, biggest difference between these systems is how data lookups are
performed. Loosely
structured systems provides much more richier and user friendly way of
searching data as they
-have support for keyword search and fuzzy search. On the other, tightly
structured systems support
-only exact key lookups as each data item is identified by unique keys.
+have support for keyword search and fuzzy search. On the other hand, tightly
structured systems support
+only exact key lookups as each data item is identified by globally unique keys.
In the end, both systems have open problems and issues. We will discuss these
aspects more detail in
chapter 3. Table \ref{table_comparison_approach} lists key differences between
loosely structured
@@ -638,15 +642,16 @@
and their key properties with regard to performance and scalability. List
includes algorithms from two main approaches. However, majority of the
algorithms
listed above belongs to tightly structured approach since there has been
active
-research being pursued lately. List doesn't include \emph{all} proposed
Peer-to-Peer
-systems, only the ones which already have been widely deployed in real-life, or
-the ones which may promising in the future's Peer-to-Peer systems.
+research being pursued towards tightly structured approach lately. List
doesn't
+include \emph{all} proposed Peer-to-Peer systems. Only the ones which already
have
+been widely deployed in real-life, or the ones which may promising in the
future's
+Peer-to-Peer systems, are included in this thesis.
We decided to follow the guidelines from \cite{kaashoek03koorde} when
measuring properties of different Peer-to-Peer systems. However, we dropped
out fault tolerance and load balancing properties, since they are hard to
measure
in face of real life requirements. Additionally, we decided to include
-the number of real network connections for each peer in the overlay.
+the number of \emph{real} network connections for each peer in the overlay.
Here, we describe the listed properties of Peer-to-Peer algorihms:
@@ -654,7 +659,7 @@
\item \textbf{Lookup}: Number of messages required when a data lookup is
performed
\item \textbf{Space}: Number of neighbors which peers knows about (neighbors)
\item \textbf{Insert/delete}: Number of messages required when a peer joins or
leaves the network
- \item \textbf{Number of network connections}: Number of concurrent
\emph{network} connections required to maintain correct neighbor information
+ \item \textbf{Number of network connections}: Number of concurrent network
connections required to maintain correct neighbor information
\end{itemize}
\scriptsize
@@ -852,13 +857,13 @@
tables; we list description of the problem, solution and comments on that
specific open problem. Note that open problems list considered here is not
meant
to be an exhaustive survey of \emph{all} open problems in Peer-to-Peer domain;
-we focus our attention to security, scalability and performance related issues
+we focus our attention to security, scalability, usability and performance
related issues
only.
\section{Overview}
Partly due to the non-maturity of modern Peer-to-Peer technology, it has
several
-open problems to be solved. Main open problems are related to performance,
scalability
+open problems to be solved. Main open problems are related to performance,
scalability, usability
and security. More important, many techniques developed for traditional
distributed
systems may no longer apply with Peer-to-Peer systems. Therefore, new
solutions are
needed to make Peer-to-Peer systems more secure and efficient.
@@ -867,8 +872,8 @@
Since Napster \cite{napsterurl} and Gnutella \cite{gnutellaurl} was first time
introduced
to public, researchers' main concern has been scalability problem of loosely
structured
approach. However, people often misunderstand the scalability problem of
loosely structured
-approach; loosely structured approache's \emph{network} is scalable, but the
\emph{query model} is not
-scalable. Tightly structured approach's main concern is to make overlay's data
lookup
+approach; loosely structured systems' \emph{network} is scalable, but the
\emph{query model} is not.
+Tightly structured system's main concern is to make overlay's data lookup
routing more flexible againts hostile attacks. Another key problems in tightly
structured
approach are the lack of keyword searches and support for heterogeneous peers.
@@ -889,8 +894,8 @@
general Distrubuted Denial of Service attack.
In Sybil attack model, hostile entity presents multpile
-entities. Therefore, one hostile entity can control a large fraction of the
Peer-to-Peer system. The best
-possible solution to Sybil attack would be that system could \emph{distinct}
entities reliably. Unfortunately,
+entities. Therefore, one hostile entity can control a large fraction of the
Peer-to-Peer system. Optimal
+possible solution to Sybil attack would be that system could \emph{distinct}
system's entities reliably. Unfortunately,
currently there no realizable techiques for this task. Partial solutions for
Sybil is attack is to replicate
and fragment data randomly among several participating peer. However, both
suggestions assume that two different
remote entities are actually different; Sybil attacks are still possible and
therefore, would need centralized
@@ -903,10 +908,7 @@
troijan. Closey related to fail-stop model is the Byzantine attack model
\cite{357176}. Byzantine model can been seen more seveve than fail-stop model
as there are no restrictions over
the behaviour of faulty peers. Partial, practical solution for byzantine
failures has been proposed by Castro et
-al \cite{296824}. General robustness properties of Peer-to-Peer system is able
to deal with software failures and hostile
-attack, but redundancy againts external threats is unknown. The reason for
this is that there are no experiences
-on these kinds of attacks. Possible solution would be distributed anti-virus
software, but much more intensive
-research is required for solve these problems.
+al \cite{296824}.
Spam generating attack is another known attack model againts Peer-to-Peer
system. In Spam
attack, hostile or faulty peer may produce false information of the data.
Possible solution againts this attack
@@ -926,7 +928,7 @@
\subsection{Trust, data authenticity and integrity}
-Currently, trust in Peer-to-Peer systems is based on \emph{reputation}.
Current repuation methods focus either
+Trust in Peer-to-Peer systems is based on \emph{reputation}. Proposed
repuation methods focus either
on the semantic properties, or data management properties of the trust model.
Some research has been
done on reputation models in Peer-to-Peer systems, such as
\cite{aberer01trust}, \cite{cornelli02reputableservents}.
Implementations include Advogato \cite{advogatourl}. None of the current
proposals or implementations
@@ -943,12 +945,12 @@
ConChord \cite{ajmani02conchord} is the first Peer-to-Peer system which has a
support for PKI based
security infrastructure. Unfortunately, ConChord is in early in development
and lacks of important
-features of PKI to be fully usable yet. Furthermore, the hierarchy of
SDSI/SPKI may a problem for
-Peer-to-Peer systems, in which hierarchy is intentionally missing.
+features of PKI to be fully usable yet. Furthermore, the hierarchy of
SDSI/SPKI \cite{rivest96sdsi},
+\cite{spkiworkinggroup} may a problem for Peer-to-Peer systems, in which
hierarchy is intentionally missing.
-For data integrity, on the other hand, there are few working solutions.
Cryptographic content hashes,
-such as \cite{fips-sha-1}, variations \cite{merkle87hashtree} and their
implementation techniques \cite{mohr02thex},
-are efficient and reliable methods for identifying the integrity of data in
Peer-to-Peer systems. One
+For data integrity, on the other hand, there are few working solutions.
Cryptographic content hashes
+\cite{fips-sha-1}, variations \cite{merkle87hashtree} and their implementation
techniques \cite{mohr02thex},
+are efficient and reliable method for identifying the integrity of data in
Peer-to-Peer systems. One
possible application of cryptographic content hashes may in peer identifier
creation process, in which
IP address of peer can be verified by the other peer. This is one form of
\emph{self-certifying data}.
@@ -974,9 +976,9 @@
Peer-to-Peer system. Let's consider anonymity and efficient data lookup. In
efficient lookup, we must know
the peers responsible to given data in Peer-to-Peer system. Of course, when we
know the peers responsible
for the data, the anonymity of peer is lost. Fortunately, there are partial
solutions to previously
-mentioned situations, i.e. pseudonym which is a partial form of anonymity. For
instance, pseudonym can used for
-addressing peer-anonymity by providing anonymous-like identifiers to peers
(e.g., tightly structured peer
-identifiers).
+mentioned situations, i.e. \emph{pseudonym} which is a partial form of
anonymity. For instance, pseudonym can used for
+addressing peer-anonymity by providing anonymous-like identifiers to peers
(e.g., tightly structured system's
+peer identifiers).
Anonymity is widely used in those Peer-to-Peer system in which data
publication and non-censorship are important properties
of the system. These include
@@ -984,7 +986,7 @@
Tangler \cite{502002} and upcoming Mnet \cite{mneturl}. Forwarding proxies are
used in Freenet, Crowds and
Free Haven in order to provide various types of anonymity. Tangler and Publius
uses cryptographic
sharing methods to split a data into data fragments \cite{Shamir1979a}.
Mixmailer networks, such as
-\cite{mixminionurl}, are commonly used in distributed systems, which are able
to provide level
+\cite{mixminionurl}, are commonly used in distributed systems, which are able
to provide some level
of anonymity
Even if many existing Peer-to-Peer systems are able to provide some of the
types of anonymity, there is no
@@ -992,18 +994,17 @@
between anonymity and other Peer-to-Peer system properties requires more
research work.
-\subsection{Access Control}
+\subsection{Access control}
Any distributed computing system must support different levels of access
control. For instance, we may
-want to restrict the accessibility of data to only limited amount of
participating peers. Currently,
-Peer-to-Peer systems doesn't support working, trusted and distributed access
control scheme. Moreover,
-there has been a lot of violation of copyright laws by users of Peer-to-Peer
filesharing systems. As a consequence, some
-lawsuits has been created againts the companies how have build popular
file-sharing programs.
+want to restrict the accessibility of data to only limited amount of
participating peers. Peer-to-Peer
+systems doesn't support working and distributed access control scheme.
Moreover,
+there has been a lot of violation of copyright laws by users of Peer-to-Peer
filesharing systems. As a
+consequence, some lawsuits has been created againts the companies how have
build popular file-sharing programs.
-To our knowledge, Nejdl et al \cite{nejdl03accesscontrol} have proposed first
practical solution to access
+To our knowledge, Nejdl et al \cite{nejdl03accesscontrol} have proposed very
recently first practical solution to access
control problem in Peer-to-Peer systems. They use RDF-based schema policies to
restrict access to certain
-data. To be distributed system feasible, there must be way of control.
Unfortunately, their solution
-works only in loosely structured systems.
+data. Unfortunately, their current prototype works only in loosely structured
systems.
\subsection{Hostile entities}
@@ -1012,13 +1013,13 @@
Possible solutions include self-monitoring systems \cite{zhang03somo},
maintaining system invariants as
proposed in \cite{sit02securitycons}, distributed and secure peer identifier
assignment
\cite{castro02securerouting}, \cite{clarke00freenet} and self-certifying data
using cryptographic
-content hashes (e.g., SHA-1). Identification of hostile entities is essential
in tightly structured
+content hashes (e.g., SHA-1 \cite{fips-sha-1}). Identification of hostile
entities is essential in tightly structured
approach, in which fundamental (and implicit) assumption is that there is
random, uniform distribution
of peer identifiers that cannot be controlled by hostile entity.
Of course centralized authorities could be used for assignment of peer
identifiers, but they have
property of single point of failure. Furthermore, distributed peer
identification assignment can
-be problematic as long as Sybil attack remains unsolved. However, there are
some partial solutions
+be problematic as long as Sybil attack \cite{douceur02sybil} remains unsolved.
However, there are some partial solutions
for controlling the rate at which and hostile entity is able to obtain peer
identifier, such as crypto-based
puzzles \cite{juels99clientpuzzles}.
@@ -1026,7 +1027,7 @@
efficient way. More research is required to solve this problems.
-\subsection{Secure Query Routing}
+\subsection{Secure query routing}
Much work has been done on secure routing, especially in tightly structured
systems. In
\cite{castro02securitystructured} and \cite{castro02securerouting}, authors
suggests the usage
@@ -1036,8 +1037,8 @@
correct peers, when a fraction $f$ of the other peers are faulty or hostile,
is only $(1-f)^{h-1}$.
Sit and Morris \cite{sit02securitycons} discuss the possibility of allowing
query originator
-to observe lookup progress and cross-check routing tables using random
queries. However, Sit and
-Morris approach is not very efficient, since proposals create a lot of
additional network traffic when
+to observe lookup progress and cross-check routing tables using random
queries. However, their
+ approach is not very efficient, since proposals create a lot of additional
network traffic when
in function.
Additionally, Lynch et al. \cite{lynch02atomicdataaccess} propose a solution
to secure routing table
@@ -1048,25 +1049,30 @@
Aspnes et al in \cite{aspnes02faultrouting} and Kaashoek et all in
\cite{kaashoek03koorde} formally
prove the lower and upper bounds for space requirements of locating a specific
date item in
-Peer-to-Peer system. They show that to provide high degree of fault tolerance
efficiency, a peer
-must maintain $O(\log{n})$ neighbors. In addition, most existing
+Peer-to-Peer system. They show that to provide high degree of fault tolerance
and efficiency, each
+participating peer must maintain $O(\log{n})$ neighbors.
Fiat et al in \cite{fiat02censorship}, \cite{saia02dynamicfaultcontentnetwork}
and Datar in \cite{datar02butterflies}
describe tightly structured overlay with analytical results in the presence of
hostile entities. However,
none of these proposals doesn't address an efficient, dynamic tightly
structured overlay and multiple rounds
-of hostile attack. Indeed, above mentioned solutions are not very efficient.
In Fiat et al, each node
-must maintain information of $O(\log^3{n})$ other peers, and in Datar
$O(\log^3{n})$ is required.
+of hostile attack. Also, above mentioned propsals are not very efficient. In
\cite{fiat02censorship}, each node
+must maintain information of $O(\log^3{n})$ other peers, and in
\cite{datar02butterflies}, $O(\log^2{n})$ is required.
Finally, Ratnasamy and Gavoille \cite{ratnasamy02routing},
\cite{gavoille01routing} list several open problems
-regarding routing in distributed networks. Obviously, more research is
required for make secure
-routing possible in Peer-to-Peer networks.
+regarding routing in distributed networks. Obviously, more research is
required in order to provde secure
+data lookup routing possible in Peer-to-Peer networks.
-\subsection{Other Security threats}
+\subsection{Other security threats}
Ross Lee graham lists several external threats againts Peer-to-Peer networks
\cite{grahamp2psecurity}. The list
includes viruses, trojans and bugs in Peer-to-Peer software. Currently, there
are not even partial solutions
-to the problems mentioned above.
+to the problems mentioned above. General robustness properties of Peer-to-Peer
system is able to
+deal with software failures and hostile attack, but redundancy againts
external threats is unknown.
+The reason for this is that there are no experiences on these kinds of
attacks. Possible solution
+would be distributed anti-virus software, but much more intensive research is
required for solve these problems.
+
+
\section{Performance and usability problems in Peer-to-Peer}
@@ -1089,13 +1095,13 @@
is similar to iterative deepening techique. With these techniques, search
may not be fast when desired data item requires many consecutive flooding
rounds.
-Directed breadt-first search \cite{yang02improvingsearch} optimizes the
original
-breadt-first search in way that peer selects neighbors with many quality
results
+Directed BFS \cite{yang02improvingsearch} optimizes the original
+BFS in way that peer selects neighbors with many quality results
may be reached, thereby maintaining the the quality of costs and decreasing
the amount
of messages sent to network. Alpine \cite{alpineurl} and NeuroGrid
\cite{joseph02neurogrid}
-Peer-to-Peer system use somewhat similar method when performing data lookups.
+are Peer-to-Peer system use somewhat similar method when performing data
lookups.
-Local indices \cite{yang02improvingsearch} in one variation of active caching.
+Local indices \cite{yang02improvingsearch} is one variation of active caching.
In this scheme, each peer maintains an index over the data of all nodes within
$h$ hops of itself, where $h$ is a system-wide variable, called radius of the
index\footnote{In normal BFS case, the value of $h$ is 0, as peer only has
index
@@ -1117,9 +1123,10 @@
Since tightly structured systems have efficient data lookup at the application
level overlay,
current research efforts are focused on proximity based data lookup. In
proximity based data lookup,
-peers try to choose routing-tables refering to other peers that are
\emph{nearby} in the
+peers try to choose routing-tables entries refering to other peers that are
\emph{nearby} in the
underlying network. In this way, tightly structured systems are able to
decrease actual
-lookup \emph{latency}. CAN, Kademlia, Pastry and Tapestry have a advanced
heuristics for
+lookup \emph{latency}. CAN \cite{ratnasamy01can}, Kademlia
\cite{maymounkov02kademlia},
+Pastry \cite{rowston01pastry} and Tapestry \cite{zhao01tapestry} have a
advanced heuristics for
proximity based routing. Additionally, most recent version of Chord uses
proximity based
routing inspired by Karger and Ruhl \cite{karger02findingnearest}. Skipnet
\cite{harvey03skipnet1}
uses combination of proximity and application level overlay routing when
performing data
@@ -1133,22 +1140,22 @@
\subsection{Fast and usable search}
-To make Peer-to-Peer systems usable in a large, these systems have to support
flexible, efficient
+To make Peer-to-Peer systems even more popular (and usable), these systems
have to support flexible, efficient
and easy to use search methods. For instance, Internet's perhaps the most
important feature
is the ability to perform keyword or fuzzy searches (e.g., Google). Currently,
only loosely
structured systems are able carry out this requirement. Unfortunately, as
discussed in this text,
-the data loouk model of loosely structured approach is not scalable. Thus,
research efforts have
+the data lookup model of loosely structured approach is not scalable. Thus,
research efforts have
been focused on tightly structured approach.
The main in problem with tightly structured approach is the fact that tightly
structured algorihms
-performs data lookups based on a unique identifier. However, quite recently
have been studies
+performs data lookups based on a globally unique identifier. Quite recent
study has been focused
on the feasibility of Peer-to-Peer Web-like indexing and searching
\cite{li03feasibility}. Authors
argue, that it is possible to implement Peer-to-Peer Web-like search with
certain radical compromises.
First, Peer-to-Peer search enginge may need to decrease result quality in
order make searching more
efficient. Second, Peer-to-Peer systems must observe better the properties of
underlying network for
better performance.
-Some studies have been concentraded on SQL-like queries \cite{harren02complex},
-in tightly structured overlays. Another approaches include adapting loosely
structured approache's
+Some studies have been concentraded on SQL-like queries \cite{harren02complex}
+in tightly structured overlays. Another approaches includes adapting loosely
structured approache's
data lookup model into tightly structured systems
\cite{ansaryefficientbroadcast03}, \cite{chord:om_p-meng}.
Additional studies include additional layer upon overlay network
\cite{kronfol02fasdsearch},
\cite{joseph02p2players} and range queries \cite{andrzejak02rangequeries}.
@@ -1157,9 +1164,9 @@
studies queries follow Zipf-like distributions \cite{breslau98implications}
caching and precomputation
can be done for optimizting search indices \cite{li03feasibility}. Regular
compression algorithms,
Bloom filters \cite{362692}, vector space models
\cite{CuencaAcuna2002DSIWorkshop} and view
-trees \cite{Bhattacharjee03resultcache} can be used for even better
optimizations. In addition, authors
+trees \cite{Bhattacharjee03resultcache} can be used for even better
optimizations. Authors
in \cite{li03feasibility} use Gap compression \cite{wittengigabytes}, Adaptive
Set Intersection \cite{338634}
-and Clustering with their search optimizations.
+and clustering with their search optimizations.
While it is expected that web-like searches can be layered on top of tightly
structured overlay, much
more research is required to make indexing and searching more efficient.
@@ -1176,18 +1183,18 @@
Peer-to-Peer system is \emph{never} in ''ideal'' state as it is always
evolving system.
Current research has been focused on tightly structured systems' system
management, since all presented
-algorithms have been analyzed under static simulation environments.
Furthermore, propsed tightly structured
+tightly structured approache's algorithms have been analyzed under static
simulation environments. Furthermore, propsed tightly structured
overlays are configured statically to achieve the desired reliability even in
uncommon and adverse environment
\cite{rowston03controlloingreliability}. The most important factor for
future research is to get real-life experiences from tightly structured
system, when there are frequent
joins and leaves in the system. Some research has been done already in this
area.
A concept of ''half-life'' was introduced by Liben-Nowell
\cite{libennowell01observations}. Half-life is defined
-as follows: let there be N live nodes at time t. The doubling from time t is
the time that pass before
-N new additional nodes arrive into the system. The halving time from time t is
the time
-requires for half of the living nodes at time t to leave the system. The
half-life from
-time t is smaller of the properties stated above. The half-life of the entire
system is the
-minimum half-life overl all times t. Concept of half-time can be used as basic
for developing
+as follows: let there be $N$ live nodes at time $t$. The doubling from time
$t$ is the time that pass before
+$N$ new additional nodes arrive into the system. The halving time from time
$t$ is the time
+requires for half of the living nodes at time $t$ to leave the system. The
half-life from
+time $t$ is smaller of the properties stated above. The half-life of the
entire system is the
+minimum half-life over all times $t$. Concept of half-time can be used as
basic for developing
more efficient analytical tools for modelling complex Peer-to-Peer system.
Some research has been done with regard to load balancing properties of
tightly structured
@@ -1224,7 +1231,8 @@
describe a arbitrary data structure on top of tightly structured overlay
\cite{zhang03somo}. They
call their proposal as \emph{data overlay}, since it support several
fundamental data structures.
Authors use this data overlay to build Self-Organized Metadata Overlay (SOMO),
which can be used
-for monitoring health of tightly structured overlay.
+for monitoring health of tightly structured overlay. Fault tolerance of SOMO
itself is currently
+unknown.
\section{Miscellaneous problems in Peer-to-Peer}
@@ -1243,26 +1251,26 @@
\subsection{Social behaviour}
-Very frequent assumption in Peer-to-Peer systems is that peers are willing to
cooperate. Another belief
+Frequent assumption in Peer-to-Peer systems is that peers are willing to
cooperate. Another belief
is that all peers would behave equally, i.e. all peers both consume resources
and contributes resources.
However, these assumptions are not true as several studies show peers rather
consume than contribute and
and peers are unwilling to cooperate \cite{saroiu02measurementstudyp2p},
\cite{oram01harnessingpower},
\cite{hearn02mojonation}.
-Somewhat surprisingly little research has been in this area, especiaclly when
considering
+Somewhat surprisingly little research has been in this area, especially when
considering
the possible impact of this \emph{unwanted socical behaviour} to performance
of Peer-to-Peer
-system. However, problem is addressed by Golle et al.
\cite{golle01incentivesp2p}. Some
+system. Problem is addressed by Golle et al. \cite{golle01incentivesp2p}. Some
research has been focused on semantic properties of the overlay in order to
increase
cooperation among participating peers \cite{crespo02semanticoverlay}.
Ramanathan et al.
\cite{ramanathan02goodpeers} and Bernstein et al. \cite{bernstein03selection}
use
empirical metrics and decision trees when teaching peers to make better
decisions
-when contacting other peers in Peer-to-Peer system. Alpine is an example of
+when contacting other peers in Peer-to-Peer system. Alpine \cite{alpineurl} is
an example of
Peer-to-Peer system, which uses empirical metrics for peer selection.
\subsection{Simulating the Peer-to-Peer system}
-Very little research has been done on simulating the global Peer-to-Peer
system. Presumably, this
+Very little research has been done on simulating the \emph{global}
Peer-to-Peer system. Presumably, this
is due to complex nature of Peer-to-Peer system, which makes comprehensive
simulations very
diffucult. Floyd et al. has been studying the simulation of the Internet in
\cite{504642}. Authors
state that simulating the Internet is very challenging task, because of
Internet's heterogeneity
@@ -1271,7 +1279,8 @@
As long as global simulations of Peer-to-Peer systems are lacking, we cannot
we make any further
analysis e.g., on usage patterns in Peer-to-Peer systems. Presumedly, however,
we can assume that
-queries follow the Zipf-like distributions both in the Internet and in
Peer-to-Peer systems.
+, e.g., query keywords follow the Zipf-like distributions
\cite{breslau98implications} both in the
+Internet and in Peer-to-Peer systems.
\section{Summary}
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., (continued)
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., Hermanni Hyytiälä, 2003/03/03
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., Hermanni Hyytiälä, 2003/03/03
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., Hermanni Hyytiälä, 2003/03/03
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., Hermanni Hyytiälä, 2003/03/03
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., Hermanni Hyytiälä, 2003/03/03
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., Hermanni Hyytiälä, 2003/03/04
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., Hermanni Hyytiälä, 2003/03/04
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., Hermanni Hyytiälä, 2003/03/04
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., Hermanni Hyytiälä, 2003/03/04
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., Hermanni Hyytiälä, 2003/03/04
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert...,
Hermanni Hyytiälä <=
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., Hermanni Hyytiälä, 2003/03/05
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., Hermanni Hyytiälä, 2003/03/05
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., Hermanni Hyytiälä, 2003/03/05
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., Hermanni Hyytiälä, 2003/03/05
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., Hermanni Hyytiälä, 2003/03/05
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., Hermanni Hyytiälä, 2003/03/05
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., Hermanni Hyytiälä, 2003/03/05
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert..., Hermanni Hyytiälä, 2003/03/05