[Gzz] Updated (initial) answer to a)

gzz-dev
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Gzz] Updated (initial) answer to a)

From:	hemppah
Subject:	[Gzz] Updated (initial) answer to a)
Date:	Wed, 18 Dec 2002 14:34:28 +0200
User-agent:	Internet Messaging Program (IMP) 3.1
Here's an updated answer for research problem a). Of course, major of these
things applies to b) and c).


-the most efficient algorithm: O(log n)
        -DHTs requires O(log n) hops, log n neighbors, except Viceroy (however,
robustness suffers)
        -SWNs requires O(log^2 n) hops, much less neighbors (constant, is not 
depedent of n)
-in DHTs and SWNs, it is possible to locate data in constant time, BUT it
requires knowing all n neighbors!!!
-in basic scenario, DHTs and SWTs assume that we have to know value's key 
beforehand
-in this case, DHTs and SWNs require little bandwidth, because "network knows
where the block resides"
-in this case, FBNs, Hybrids, Social Discovery require much bandwidth, since
there is no benefit from the block's ID at all (they have to ask constantly)
-SWNs and FBSs require less memory than DHTs, since in these approaches, there
are less connections to other nodes
-SWNs relies less to neighbor nodes than DHTs
-there are different kinds of uses of DHTs, e.g. DHT actually stores the data,
or DHT only stores 
the values, which are used for locating the data
-existing systems mainly uses the latter method
-both have pros and cons: 
        -in the value use, several hostile nodes might say that they hosts the 
value, but
refuse to serve any host
        -in the storage use, hostile node might say that someone must save data 
block
size of 1TB for her computer
        -however, there are methods for preventing these kinds of actions (look 
below)
-DHTs and SWNs are very robust to failures: e.g. 20% of the nodes simultaneously
fail, less than 0.1% of the data retrievals are failed 
(each node has "enough neighbors for alternative routing path)
-however, in other approaches, the network infracstructure is very vulnerable:
usually FBNs/hybrid networks follows the power-law distribution, 
in which, there are many nodes connected to one node. So, when this "super node"
is under DoS attach, the performance of the network suffers a lot --> 
most of data retrievals could fail
-when locating data blocks, the basic difference between DHTs/SWNs is that in
DHT/SWN approach, systems "knows", where the desired data block resides
        -based on data block ID's, nodes gives "hints" (routes more close to ID 
in the
key space) constantly to a query lookup until 
        the specific node is found which hosts the block
        -no extra network traffic
-so in a way, DHT/SWNs are structured entities
-instead, in other approaches, system doesn't "know" where the data block 
resides
        -we have to ask from any participating node, if they have the data block
        -very much extra network traffic
-proposol: most efficient approaches for this research problem are DHTs and
SWNs, since they are based on key-value pairs (Storm has a key as block ID)
-research challenges for DHTs/SWNs:
        -handling failures
                -replication            
                -caching
        -coping with system in flux
                -large system may never achieve a ideal state
                -no overwhelming theory for handling system in flux
        -robustness with untrusted participants
                -data authentication
                -self-certifying pathnames
                -integrity of routing tables            
        -network-awareness for performance
                -topology-awareness
                -traffic-awareness
-there are known security considerations related to DHTs/SWNs, but there are
also (partial) solutions to them:
                -incorrect lookup routing
                        -problem: an individual malicious node could forward 
lookups to an incorrect or
non-existent node
                        -solution: at each hop, querier knows that the lookup 
is supposed to get "closer".
The querier should check this so that
                                   this attack can be detected and backtrack as 
necessary in routing path
                -incorrect routing updates
                        -problem: an individual malicious node could corrupt 
the routing tables of other
nodes by sending incorrect updates
                        -solution: system maintains information of requirements 
of correct routing tables
(and verifies them)                     
                -partition
                        -problem: A new node may join parallel network, formed 
a set of malicious nodes
                        -solution: Node bootstrap via some sort of trusted 
source (e.g. history information
of trusted nodes)
                -storage and retrieval attacks
                        -problem: an individual malicious node could join in 
the lookup correctly, but deny
the existence of data it is responsible for
                        -solution: replication: storage layer must implement 
replication in a way so that
no single node is responsible for replication
                -inconsistent behaviour
                        -problem: an individual malicious node could present a 
good face to only its
neighbors, so that neighbors don't see any reason 
                                  to remove node from their routing tables.
                        -solution: (partial solution) most DHT systems have 
ways of jumping to distant
points in the identifier space in order to speed up queris and
                                   jump over the "bad" neighborhood 
(backtracking and jump over)
                -overload of targeted nodes
                        -problem: an individual malicious node could attempt to 
overload targetted nodes
with garbage packets
                        -solution: (partial solution) data replication in 
physichally disparate locations 
                -rapid joins and leaves
                        -problem: an individual malicious node could trick the 
system with rapid joins and
leaves for extra rebalancing of the system
                        -solution: (partial solution) The rate of replication 
and amount of data stored at
each node must ne kept at the levels that allow 
                                   for timely replication without causing 
network overload when even regular nodes
join and leave the network
                -unsolicited messages
                        -problem: an individual malicious node could send an 
unsolicited response to a
query (man in the middle attack)
                        -solution: (partial solution) the defence against this 
would be standard
authentication techique, such as digital signatures. However, 
                                   there are no working implementation for this 
yet (digital signatures are
expensive currently)


-Hermanni

-------------------------------------------------
This mail sent through IMP: http://horde.org/imp/
[Prev in Thread]
Current Thread
[Next in Thread]
[Gzz] Updated (initial) answer to a), hemppah <=
Prev by Date: Re: [Gzz] hemppah's research problems document
Next by Date: [Gzz] Asko 2002-12-18 (umldoc)
Previous by thread: [Gzz] Some initial answers to a)...c)
Next by thread: [Gzz] PEG vobscene interp
Index(es):
- Date
- Thread