gzz-commits
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert...


From: Hermanni Hyytiälä
Subject: [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert...
Date: Thu, 20 Feb 2003 07:58:12 -0500

CVSROOT:        /cvsroot/gzz
Module name:    gzz
Changes by:     Hermanni Hyytiälä <address@hidden>      03/02/20 07:58:06

Modified files:
        Documentation/misc/hemppah-progradu: masterthesis.tex 

Log message:
        More text

CVSWeb URLs:
http://savannah.gnu.org/cgi-bin/viewcvs/gzz/gzz/Documentation/misc/hemppah-progradu/masterthesis.tex.diff?tr1=1.55&tr2=1.56&r1=text&r2=text

Patches:
Index: gzz/Documentation/misc/hemppah-progradu/masterthesis.tex
diff -u gzz/Documentation/misc/hemppah-progradu/masterthesis.tex:1.55 
gzz/Documentation/misc/hemppah-progradu/masterthesis.tex:1.56
--- gzz/Documentation/misc/hemppah-progradu/masterthesis.tex:1.55       Thu Feb 
20 06:57:32 2003
+++ gzz/Documentation/misc/hemppah-progradu/masterthesis.tex    Thu Feb 20 
07:58:06 2003
@@ -159,6 +159,8 @@
 \section{Loosely structured}
 
 
+power-law disribution
+
 +own resources are not mapped into the network
 +keyword/fuzzy search possible
 -based on FBNt technique, 
@@ -1570,7 +1572,142 @@
 -current p2p systems don't support all of these properties together
 
 
-\section{Possible problems}
+\section{Evaluation}
+
+2.1.5. Answers to research problem
+-the most efficient algorithm: O(log n)
+       -DHTs requires O(log n) hops, log n neighbors, except Viceroy (however, 
robustness suffers)
+       %-SWNs requires O(log^2 n) hops, much less neighbors (constant, is not 
depedent of n)
+-in DHTs and SWNs, it is possible to locate data in constant time, BUT it 
requires knowing all n neighbors!!!
+-in basic scenario, DHTs and SWTs assume that we have to know value's key 
beforehand
+-in this case, DHTs and SWNs require little bandwidth, because "network knows 
where the block resides"
+-in this case, FBNs, Hybrids, Social Discovery require much bandwidth, since 
there is no benefit from the block's ID at all (they have to ask constantly)
+-SWNs and FBSs require less memory than DHTs, since in these approaches, there 
are less connections to other nodes
+-SWNs relies less to neighbor nodes than DHTs
+-there are different kinds of uses of DHTs, e.g. DHT actually stores the data, 
or DHT only stores 
+the values, which are used for locating the data
+-existing systems mainly uses the latter method
+-both have pros and cons: 
+       -in the value use, several hostile nodes might say that they hosts the 
value, but refuse to serve any host
+       -in the storage use, hostile node might say that someone must save data 
block size of 1TB for her computer
+       -however, there are methods for preventing these kinds of actions (look 
below)
+-DHTs and SWNs are very robust to failures: e.g. 20% of the nodes 
simultaneously fail, less than 0.1% of the data retrievals are failed 
+(each node has "enough neighbors for alternative routing path)
+-however, in other approaches, the network infracstructure is very vulnerable: 
usually FBNs/hybrid networks follows the power-law distribution, 
+in which, there are many nodes connected to one node. So, when this "super 
node" is under DoS attach, the performance of the network suffers a lot --> 
+most of data retrievals could fail
+-when locating data blocks, the basic difference between DHTs/SWNs is that in 
DHT/SWN approach, systems "knows", where the desired data block resides
+       -based on data block ID's, nodes gives "hints" (routes more close to ID 
in the key space) constantly to a query lookup until 
+       the specific node is found which hosts the block
+       -no extra network traffic
+-so in a way, DHT/SWNs are structured entities
+-instead, in other approaches, system doesn't "know" where the data block 
resides
+       -we have to ask from any participating node, if they have the data block
+       -very much extra network traffic
+-proposol: most efficient approaches for this research problem are DHTs and 
SWNs, since they are based on key-value pairs (Storm has a key as block ID)
+-research challenges for DHTs/SWNs:
+
+
+2.2.3. Existing approaches and this specific research problem
+
+-First, *all* blocks (with the specific urn-5 name) have to checked and 
analysed 
+-In this case (urn-5), there is no benefit from the block's ID at all (DHT, 
SWN)
+
+2.2.4. Open questions:
+-in this case, is it sensible to maintain two different key-value mappings 
key-value based systems (DHT, SWN) ?l
+       -one fore block IDs
+       -one for urn-5 names, which are associated with block IDs
+       -is this approach too difficult to maintain ?
+
+-is there possibility, in the specific urn-5 name, to maintain information 
about most recent block's ID for better search performance (or moreover, 
+tree/list based structure for all blocks for specific urn-5 name) ?
+-How CAs' data should be saved ?
+
+2.2.5. Answers to research problem
+-compared to previous research problem, the "obvious" benefits of DHTs and 
SWNs in this research problem are smaller
+-we don't beforehand know which block is the most recent associated with a 
specfic urn-5 name
+-in theory, though, the most efficient algorithm is O(log n), *assuming* that 
we already know the most recent block's ID (see previous research problem)
+-the most important question is, how we can efficiently associate urn-5 names 
with the most recent block / to all blocks which are associated with it
+-for DHTs and SWNs:
+
+Please notice: In this approach, DHT doesn't store the actual block, only the 
values for locating the data from the system
+
+       Req. 1:
+       -each node maintains a local hash-table based data structure (urn-5 
name -> most recent local block ID) for every urn-5 names
+       which node hosts. The most recent block is topmost --> we don't have to 
check all blocks and their urn-5 associations to get the most recent
+       Req. 2:
+       -all urn-5 name mappings are stored as <key, value[ ]>, where the key 
is urn-5 name's hash and value is a record containing
+       block ID and timestamp of that block. So, when we store a block in our 
system first time, we have to create a new key-value: 
+       %<hash_of_urn_5_name, [block id, block timestamp]> and route this 
mapping to node which is "closest" to a hash value. Now when we want to find 
the most 
+       recent block associated with a specific urn-5 name, we do:
+       
+               1. locally compute a hash for given urn-5 string
+               2. route the query to a node which hosts the given hash of 
urn-5 ("closest" node)
+               3. get most recent block's ID using the idea from previuos idea
+               4. use ID to get the specific block using normal DHT/SWN 
operation
+               
+       In this approach, we don't have to perform additional searching and 
sorting of mappings. And of course, we know that for given urn-5, only one node
+       hosts *all* the block information ("block history") for the urn-5, 
since mappings are mapped to a single node, closest to urn-5 hash value.
+       Again, this should work fine under existing DHTs (and SWTs ?).
+       
+       Some simple analysis: 
+       -there are more key-value pairs in the system for additional urn-5 --> 
block associations
+       -however, I don't think this is an issue, since data's size is small
+       -efficiency: find node which hosts urn-5 names + find node which hosts 
blocks associated with urn-5 name: logn + logn = 2logn (logarithmical)
+       
+-for FBS and others:
+       -there is no very efficient (simple) methods for finding urn-5 name 
associated with the most recent block
+       -one simple proposal is that when we visit to each node (first idea 
above), get only the most recent one and compare them (or greedy approach: 
dismiss currently 
+       most recent block as we visit to nodes, if newer block have been found)
+
+       
+       2.3.3. Existing approaches and this specific research problem
+
+-First, *all* blocks (with the specific urn-5 name) have to checked and 
analysed 
+-In this case (urn-5), there is no benefit from the block's ID at all (DHT, 
SWN)
+
+2.3.4 Open questions:
+-in this case, is it sensible to maintain two different key-value mappings 
key-value based systems ?
+       -one mapping fore block IDs
+       -one mapping for urn-5 names, which are associated with block IDs
+       -is this approach too difficult to maintain ?
+       
+-is there possibility, in the specific urn-5 name, to maintain information 
about most recent block's ID for better search performance (or moreover, 
+tree based structure for all blocks for specific urn-5 name) ?
+-How CAs' data should be saved ?
+
+2.3.5. Answers to research problem
+This is quite similar to previous research problem's answer. There are slight 
differences, though.
+
+-for DHTs and SWNs:
+
+Please notice: In this approach, DHT doesn't store the actual block, only the 
values for locating the data from the system
+
+       Req. 1:
+       -each node maintains a local hash-table based data structure (urn-5 
name -> most recent local block ID) for every urn-5 names
+       which node hosts. The most recent block is topmost --> we don't have to 
check all blocks and their urn-5 associations to get the most recent
+       Req. 2:
+       -all urn-5 name mappings are stored as <key, value[ ]>, where the key 
is urn-5 name's hash and value is a record containing
+       block ID and timestamp of that block. So, when we store a block in our 
system first time, we have to create a new key-value: 
+       %<hash_of_urn_5_name, [block id, block timestamp]> and route this 
mapping to node which is "closest" to a hash value. Now when we want to find 
the most 
+       recent block associated with a specific urn-5 name, we do:      
+       
+               1. locally compute a hash for given urn-5 string
+               2. route the query to a node which hosts the given hash of 
urn-5 ("closest" node)
+               %3. get the specific block(s) data (block ID) from the stack 
which matches to given date&time properties (we compare to block header data)
+               4. use IDs to get the specific blocks using normal DHT/SWN 
operation            
+       
+       Some simple analysis: 
+       -there are more key-value pairs in the system for additional urn-5 --> 
block associations
+       -however, I don't think this is an issue, since data's size is small
+       -efficiency: find node which hosts urn-5 names + find node which hosts 
blocks associated with urn-5 name: logn + logn = 2logn (logarithmical)
+       
+-for FBS and others:
+       -there is no very efficient (simple) methods for finding urn-5 name 
associated with the most recent block
+       -one simple proposal is that when we be that we visit to each node 
(first idea above), get all blocks which matches to given properties
+      
+
+\section{Open issues and future work}
 
 \cite{gribble01p2pdatabase}
 




reply via email to

[Prev in Thread] Current Thread [Next in Thread]