[Gzz] hemppah's research problems document

gzz-dev

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Gzz] hemppah's research problems document

From:	B. Fallenstein
Subject:	[Gzz] hemppah's research problems document
Date:	Fri, 13 Dec 2002 00:52:00 +0100
User-agent:	Mozilla/5.0 (X11; U; Linux ppc; en-US; rv:1.2) Gecko/20021204 Debian/1.2.1-1

Please notice: in this section + = pro, - = con


Probably this'll make diffs to that section hard to read...

1.1. Distributed Hash Tables (DHT)

...

-own resources are mapped into the network

I still don't see this as a problem. You should have an explanation ifyou do... :)

-keyword/fuzzy search not possible yet
-hotspots

-Reliability depends on replicas. If it is possible for an adversary tocontrol a sufficient number of replicas for a given key, they can spoofthat key easily (both giving wrong data and hiding correct data).

-DHTs require O(log n) hops to reach arbitrary destinations, assuming that each node maintainsinformation about O(log n) nodes

This is not true. *Some* DHT routing algorithms require O(log n) nodesto be stored, but CAN or a small-world assumption-based routingalgorithm only require O(1) neighbours (where CAN routing takes squareroot time). You say that yourself, below.

-Of couse, there is the possibility to route in constant times, but it requires 
that *each** node
maintains information about all the nodes in the network. Therefore , practically, this methodimpossible

(Not 100% correct: for example, it suffices to have an O(1) route to onecomputer that knows each node. But you're right about constant timerouting not being workable in any case.)

-Example systems: Chord, CAN, Kademlia, Pastry, Tapestry

GISP and the Circle. Mentioning them here because they contrast yourexamples in not being academic research projects.

1.2. Small World Networks (SWN)

...

-SWNs require O(log^2 n) hops to reach arbitrary destinations, assuming (*only and only if* !!!) thatlinks between nodes are constructed in the way that they are uniformly distributed over all distancesin the network (Kleinberg)


"only and only if"? :)

-Example systems: SWAN, Freenet

Freenet? Freenet definitely does not have the properties of storingmappings on your machine-- in Freenet, *everything* is distributedthrough the network.

1.3. Flooding Broadcast Networks (FBN)

...

-not fast routing (in fact, there is no upper limit, because there are multiple 
simultaneous
breadh-first searches in the network)

I don't understand this (why there is no upper limit). I would'vethought that there's usually an upper limit of O(n^2)-- if all nodesknow each other, and thus every node forwards the request to every othernode (where it is cancelled, because a node will not process the samelookup twice).

                Insert/Delete   Space           Search
Pastry:         O(log^2 n)      O(nlog n)       O(log n)
Tapestry:       O(log^2 n)      O(nlog n)       O(log n)

In Pastry and Tapestry, each node has to know *more neighbours thanthere are nodes in the system* (n * log n)? Ahem.

Flooding:       N/A             N/A             No limit!

Space for flooding is O(1), I presume. You generally set a number in thepreferences how much other computers your computer will connect to.

2. Thesis research problems

2.1  "Searching for a specific Storm block"

2.1.1 Facts
-specific Storm block can be identified with an ID, generated by SHA-1 algorithm
-block ID is globally *unique*, e.g. "Front page of New York Times newspaper on 
10.10.2002"

But note that's it's a *number*, which you cannot immediately see hasanything to do with the New York Times, newspapers, or 10.10.2002.

2.1.2. Objectives
-if block exists in the network, algorithm *will* find it


Yep, exactly.

-algorithm will find the specific Storm block as quicly as possible (and return 
it to the user) based
on the the block's ID
-simple pseudo thinking: "based on block ID, find the specific block fast and return 
it."


You're missing a step here:
1. Find the block
2. *Download the block*
3. Return the block

Maybe you want to leave it out of the pseudo thinking, but it's quiteessential to the point above that :-)

2.1.3. Existing approaches and this specific research problem

a) DHT

-natural choice, since in DHTs are based on key-value associations-in our case, keys are block IDs (SHA-1)

-if exists in the network, specific block will be located based on block ID
-Efficient, O(log n) in existing DHT systems
-approach specific pseudo thinking: "based on block's ID, go to the node, whose ID 
is the nearest to block's ID in key space"


Ok, but--

b) SWN

...

-approach specific pseudo thinking in SWAN: "based on block's ID, go to the node, whose (euclidean) distance is closestto block's ID in identity space"

Here the simplification bites. There are two ways to interpret this'pseudo thinking':

1. This is used as a DHT routing algorithm: that node will tell us whichother node has the block (i.e., we 'map our resources into the network'as you described it above)2. The node we found is the node to download the block from. This meansthat a computer would have to run a node for every block it stores(making number of neighbours to store an extremely unattractive O(k)with k = number of blocks on the local computer).

My problem: If it's 1), this is just another type of DHT. If it's 2),it's about as attractive as flooding broadcast networks, because itsimply doesn't scale.

c) FBN
-In this case, there is no benefit from the block's ID at all


What do you mean by this?

-not very efficient, since every Storm node has to "ask" every neighbor node it 
has the specific block ID
-there is no guarantee that block will be found in the network
-obviously a lot of extra traffic arises in the network
-approach specific pseudo thinking: "ask repeating from each node if it has block, 
whose ID value is X"


Yep.

2.2. "Searching for most recent Storm block associated with specific urn-5 
name, where
     the block has been signed with a given key"

This is not specific to urn-5; any URI used for identification works.For example, the proposed tag URIs (tag:address@hidden:2002-12:fooor some such) would work just as well.

2.2.1. Facts-urn-5 is random "unique keyword", e.g. "Front page of New York Times newspaper"
-urn-5 can be updated

No, a urn-5 cannot be updated. We can associate a new block with theurn-5, but there is no such thing as 'updating' it.

-urn-5 name is saved in the header of Storm block (|block urn-5: "Front page of New 
York Times newspaper"|)

This is not what we have been doing so far. Your approach may have meritthough, especially for network publishing (like 'Front page of NYT'):the one we currently use has some problems with determining what is'current' if we delete some old versions, but not all (even olderversions may suddenly appear to be current).

-finding key blocks for data block can be performed locally (in logarithmic 
time)

- "...can be performed locally"? It's like loading blocks: If you don'thave the data, you must request it from the network.

- Why logarithmic time? Locally, we can use a hashtable -> O(1)?

2.2.2. Objectives
-Find the specific (and the most recent) Storm block as quicly as possible

> (and return it to the user) based on the given urn-5

And key.

-simple pseudo thinking: "find all Storm blocks from the network,

> which uses specific urn-5 name. Compare blocks and

return the most recent block, if the signing key is "valid"."

Well... you would probably only request the most current from thenetwork and check its signature, since if the network is trying to foolyou into believing an old version is the current one, it could simplyhide the blocks containing the newer versions from you in the first place.

2.2.3. Existing approaches and this specific research problem
-First, *all* blocks (with the specific urn-5 name) have to checked and analysed-In this case (urn-5), there is no benefit from the block's ID at all (DHT, SWN)


What do you mean by this? I don't understand at all.

2.2.4. Open questions:
-in this case, is it sensible to maintain two different key-value mappings 
key-value based systems (DHT, SWN) ?
        -one fore block IDs
        -one for urn-5 names, which are associated with block IDs
        -is this approach too difficult to maintain ?


Why would one want to? I don't see any benefits.

-is there possibility, in the specific urn-5 name, to maintain information

> about most recent block's ID for better search performance (or moreover,

tree based structure for all blocks for specific urn-5 name) ?

No. You create the urn-5 *before* you create any of the blocks, so youcannot make it depend on the blocks (that's why we can't usecryptographical hashes).

2.3. "Searching for Storm blocks associated with specific urn-5 name, wherespecific date has been defined (or date range), and where Storm blockhas been signed with a given key"2.3.1. Facts
-in addition to section 2.2.1, in every block's header, there is a timestamp for 
date&time

Huh? This isn't needed in 2.2.1? How would one then compare two blocksto determine which of the two is more current (should be shown)?!?


Hope these comments help,
- Benja

[Prev in Thread]

Current Thread

[Next in Thread]

[Gzz] hemppah's research problems document, B. Fallenstein <=
- Re: [Gzz] hemppah's research problems document, hemppah, 2002/12/13
  - Re: [Gzz] hemppah's research problems document, Tuomas Lukka, 2002/12/13
    - Re: [Gzz] hemppah's research problems document, B. Fallenstein, 2002/12/14
    - Re: [Gzz] hemppah's research problems document, Tuomas Lukka, 2002/12/14
    - Re: [Gzz] hemppah's research problems document, B. Fallenstein, 2002/12/14
    - Re: [Gzz] hemppah's research problems document, Tuomas Lukka, 2002/12/14
    - Re: [Gzz] hemppah's research problems document, B. Fallenstein, 2002/12/14
    - Re: [Gzz] hemppah's research problems document, Tuomas Lukka, 2002/12/15
    - Re: [Gzz] hemppah's research problems document, B. Fallenstein, 2002/12/15
    - Re: [Gzz] hemppah's research problems document, hemppah, 2002/12/16

Prev by Date: [Gzz] Asko 2002-12-12 (pacman)
Next by Date: Re: UML tool (was: Re: [Gzz] Asko 2002-12-04/05 (OvalBgVob))
Previous by thread: [Gzz] Asko 2002-12-12 (pacman)
Next by thread: Re: [Gzz] hemppah's research problems document
Index(es):
- Date
- Thread