gzz-dev
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Gzz] hemppah's research problems document


From: B. Fallenstein
Subject: [Gzz] hemppah's research problems document
Date: Fri, 13 Dec 2002 00:52:00 +0100
User-agent: Mozilla/5.0 (X11; U; Linux ppc; en-US; rv:1.2) Gecko/20021204 Debian/1.2.1-1

Please notice: in this section + = pro, - = con

Probably this'll make diffs to that section hard to read...

1.1. Distributed Hash Tables (DHT)

...

-own resources are mapped into the network

I still don't see this as a problem. You should have an explanation if you do... :)

-keyword/fuzzy search not possible yet
-hotspots

-Reliability depends on replicas. If it is possible for an adversary to control a sufficient number of replicas for a given key, they can spoof that key easily (both giving wrong data and hiding correct data).

-DHTs require O(log n) hops to reach arbitrary destinations, assuming that each node maintains information about O(log n) nodes

This is not true. *Some* DHT routing algorithms require O(log n) nodes to be stored, but CAN or a small-world assumption-based routing algorithm only require O(1) neighbours (where CAN routing takes square root time). You say that yourself, below.

-Of couse, there is the possibility to route in constant times, but it requires 
that *each** node
maintains information about all the nodes in the network. Therefore , practically, this method impossible

(Not 100% correct: for example, it suffices to have an O(1) route to one computer that knows each node. But you're right about constant time routing not being workable in any case.)

-Example systems: Chord, CAN, Kademlia, Pastry, Tapestry

GISP and the Circle. Mentioning them here because they contrast your examples in not being academic research projects.

1.2. Small World Networks (SWN)
...
-SWNs require O(log^2 n) hops to reach arbitrary destinations, assuming (*only and only if* !!!) that links between nodes are constructed in the way that they are uniformly distributed over all distances in the network (Kleinberg)

"only and only if"? :)

-Example systems: SWAN, Freenet

Freenet? Freenet definitely does not have the properties of storing mappings on your machine-- in Freenet, *everything* is distributed through the network.

1.3. Flooding Broadcast Networks (FBN)
...
-not fast routing (in fact, there is no upper limit, because there are multiple 
simultaneous
breadh-first searches in the network)

I don't understand this (why there is no upper limit). I would've thought that there's usually an upper limit of O(n^2)-- if all nodes know each other, and thus every node forwards the request to every other node (where it is cancelled, because a node will not process the same lookup twice).

                Insert/Delete   Space           Search
Pastry:         O(log^2 n)      O(nlog n)       O(log n)
Tapestry:       O(log^2 n)      O(nlog n)       O(log n)

In Pastry and Tapestry, each node has to know *more neighbours than there are nodes in the system* (n * log n)? Ahem.

Flooding:       N/A             N/A             No limit!

Space for flooding is O(1), I presume. You generally set a number in the preferences how much other computers your computer will connect to.

2. Thesis research problems

2.1  "Searching for a specific Storm block"

2.1.1 Facts
-specific Storm block can be identified with an ID, generated by SHA-1 algorithm
-block ID is globally *unique*, e.g. "Front page of New York Times newspaper on 
10.10.2002"

But note that's it's a *number*, which you cannot immediately see has anything to do with the New York Times, newspapers, or 10.10.2002.

2.1.2. Objectives
-if block exists in the network, algorithm *will* find it

Yep, exactly.

-algorithm will find the specific Storm block as quicly as possible (and return 
it to the user) based
on the the block's ID
-simple pseudo thinking: "based on block ID, find the specific block fast and return 
it."

You're missing a step here:
1. Find the block
2. *Download the block*
3. Return the block

Maybe you want to leave it out of the pseudo thinking, but it's quite essential to the point above that :-)

2.1.3. Existing approaches and this specific research problem

a) DHT
-natural choice, since in DHTs are based on key-value associations -in our case, keys are block IDs (SHA-1)
-if exists in the network, specific block will be located based on block ID
-Efficient, O(log n) in existing DHT systems
-approach specific pseudo thinking: "based on block's ID, go to the node, whose ID 
is the nearest to block's ID in key space"

Ok, but--

b) SWN
...
-approach specific pseudo thinking in SWAN: "based on block's ID, go to the node, whose (euclidean) distance is closest to block's ID in identity space"

Here the simplification bites. There are two ways to interpret this 'pseudo thinking':

1. This is used as a DHT routing algorithm: that node will tell us which other node has the block (i.e., we 'map our resources into the network' as you described it above) 2. The node we found is the node to download the block from. This means that a computer would have to run a node for every block it stores (making number of neighbours to store an extremely unattractive O(k) with k = number of blocks on the local computer).

My problem: If it's 1), this is just another type of DHT. If it's 2), it's about as attractive as flooding broadcast networks, because it simply doesn't scale.

c) FBN
-In this case, there is no benefit from the block's ID at all

What do you mean by this?

-not very efficient, since every Storm node has to "ask" every neighbor node it 
has the specific block ID
-there is no guarantee that block will be found in the network
-obviously a lot of extra traffic arises in the network
-approach specific pseudo thinking: "ask repeating from each node if it has block, 
whose ID value is X"

Yep.

2.2. "Searching for most recent Storm block associated with specific urn-5 
name, where
     the block has been signed with a given key"

This is not specific to urn-5; any URI used for identification works. For example, the proposed tag URIs (tag:address@hidden:2002-12:foo or some such) would work just as well.

2.2.1. Facts -urn-5 is random "unique keyword", e.g. "Front page of New York Times newspaper"
-urn-5 can be updated

No, a urn-5 cannot be updated. We can associate a new block with the urn-5, but there is no such thing as 'updating' it.

-urn-5 name is saved in the header of Storm block (|block urn-5: "Front page of New 
York Times newspaper"|)

This is not what we have been doing so far. Your approach may have merit though, especially for network publishing (like 'Front page of NYT'): the one we currently use has some problems with determining what is 'current' if we delete some old versions, but not all (even older versions may suddenly appear to be current).

-finding key blocks for data block can be performed locally (in logarithmic 
time)

- "...can be performed locally"? It's like loading blocks: If you don't have the data, you must request it from the network.
- Why logarithmic time? Locally, we can use a hashtable -> O(1)?

2.2.2. Objectives
-Find the specific (and the most recent) Storm block as quicly as possible
> (and return it to the user) based on the given urn-5

And key.

-simple pseudo thinking: "find all Storm blocks from the network,
> which uses specific urn-5 name. Compare blocks and
return the most recent block, if the signing key is "valid"."

Well... you would probably only request the most current from the network and check its signature, since if the network is trying to fool you into believing an old version is the current one, it could simply hide the blocks containing the newer versions from you in the first place.

2.2.3. Existing approaches and this specific research problem

-First, *all* blocks (with the specific urn-5 name) have to checked and analysed -In this case (urn-5), there is no benefit from the block's ID at all (DHT, SWN)

What do you mean by this? I don't understand at all.

2.2.4. Open questions:
-in this case, is it sensible to maintain two different key-value mappings 
key-value based systems (DHT, SWN) ?
        -one fore block IDs
        -one for urn-5 names, which are associated with block IDs
        -is this approach too difficult to maintain ?

Why would one want to? I don't see any benefits.

-is there possibility, in the specific urn-5 name, to maintain information
> about most recent block's ID for better search performance (or moreover,
tree based structure for all blocks for specific urn-5 name) ?

No. You create the urn-5 *before* you create any of the blocks, so you cannot make it depend on the blocks (that's why we can't use cryptographical hashes).

2.3. "Searching for Storm blocks associated with specific urn-5 name, where specific date has been defined (or date range), and where Storm block has been signed with a given key" 2.3.1. Facts
-in addition to section 2.2.1, in every block's header, there is a timestamp for 
date&time

Huh? This isn't needed in 2.2.1? How would one then compare two blocks to determine which of the two is more current (should be shown)?!?

Hope these comments help,
- Benja




reply via email to

[Prev in Thread] Current Thread [Next in Thread]