gzz-dev
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Gzz] hemppah's research problems document


From: hemppah
Subject: Re: [Gzz] hemppah's research problems document
Date: Fri, 13 Dec 2002 16:40:03 +0200
User-agent: Internet Messaging Program (IMP) 3.1

First, thank your for your comments. Altough this document is far from complete,
 comments are very appreciated :).

Please look my comments below...

Quoting "B. Fallenstein" <address@hidden>:

> > Please notice: in this section + = pro, - = con
> 
> Probably this'll make diffs to that section hard to read...
> 
> > 1.1. Distributed Hash Tables (DHT)
> 
> ...
> 
> > -own resources are mapped into the network
> 
> I still don't see this as a problem. You should have an explanation if 
> you do... :)

Yes, you are right. This not a *problem*, rather a subjective opinion (in our
scenario). In fact, I discussed about this with Tuomas today we agreed (in our 
scenario), that it's a very important thing that a user has *authority to
decide*, whether to publish data or not (mirroring etc). Tuomas, do you want to
give comments on this ?

> 
> > -keyword/fuzzy search not possible yet
> > -hotspots
> 
> -Reliability depends on replicas. If it is possible for an adversary to 
> control a sufficient number of replicas for a given key, they can spoof 
> that key easily (both giving wrong data and hiding correct data).

Yes, there a lot issues to be solved in DHTs (like manu other approaches). I
decided include a few issues instead of all ;).

> 
> > -DHTs require O(log n) hops to reach arbitrary destinations, assuming that
> each node maintains 
> > information about O(log n) nodes
> 
> This is not true. *Some* DHT routing algorithms require O(log n) nodes 
> to be stored, but CAN or a small-world assumption-based routing 
> algorithm only require O(1) neighbours (where CAN routing takes square 
> root time). You say that yourself, below.

For CAN, though, for only *oneƄ dimension is a "special" case. For CAN, when
there is a only one dimesion, CAN performance is far for other DHTs. Indeed,
authors suggest that setting as d = log n, CAN has O(log n) neighbors and O(log
n) pathlengths like other DHT algorithms.


> > -Example systems: Chord, CAN, Kademlia, Pastry, Tapestry
> 
> GISP and the Circle. Mentioning them here because they contrast your 
> examples in not being academic research projects.

AFAIK, Circle is based on Kademlia. And GISP very similar to Chord. But good
attention ;).

> 
> > 1.2. Small World Networks (SWN)
> ...
> > -SWNs require O(log^2 n) hops to reach arbitrary destinations, assuming
> (*only and only if* !!!) that 
> > links between nodes are constructed in the way that they are uniformly
> distributed over all distances 
> > in the network (Kleinberg)
> 
> "only and only if"? :)

Yes! There have to be *very* careful (aka "clever") when constructing links
between nodes in SWNs. See Jon Kleinberg's works for details (google jon 
kleinberg)

> 
> > -Example systems: SWAN, Freenet
> 
> Freenet? Freenet definitely does not have the properties of storing 
> mappings on your machine-- in Freenet, *everything* is distributed 
> through the network.

Umm..perhaps I should be more specific. Basically I have categorized these
systems based on the routing mechanism. I should, however, update the section
names for better clearance.

> 
> > 1.3. Flooding Broadcast Networks (FBN)
> ...
> > -not fast routing (in fact, there is no upper limit, because there are
> multiple simultaneous
> > breadh-first searches in the network)
> 
> I don't understand this (why there is no upper limit). I would've 
> thought that there's usually an upper limit of O(n^2)-- if all nodes 
> know each other, and thus every node forwards the request to every other 
> node (where it is cancelled, because a node will not process the same 
> lookup twice).

I have to check this. Yesterday while writing this document, I didn't find the
article, where authors said that there is "no limit". Have to check...

> 
> >             Insert/Delete   Space
        Search
> > Pastry:     O(log^2 n)      O(nlog n)       O(log n)
> > Tapestry:   O(log^2 n)      O(nlog n)       O(log n)
> 
> In Pastry and Tapestry, each node has to know *more neighbours than 
> there are nodes in the system* (n * log n)? Ahem.

Ok, in this we have to be careful :). The summary table is far from complete
(from "truth"). However, I copied part of this table from the article named
"Distributed Object Location in a Dynamic Network",

http://oceanstore.cs.berkeley.edu/publications/papers/pdf/SPAA02.pdf.


> 
> > 2. Thesis research problems
> > 
> > 2.1  "Searching for a specific Storm block"
> > 
> > 2.1.1 Facts
> > -specific Storm block can be identified with an ID, generated by SHA-1
> algorithm
> > -block ID is globally *unique*, e.g. "Front page of New York Times
> newspaper on 10.10.2002"
> 
> But note that's it's a *number*, which you cannot immediately see has 
> anything to do with the New York Times, newspapers, or 10.10.2002.

Yes, *random* number.

> 
> > 2.1.2. Objectives

> 
> > -algorithm will find the specific Storm block as quicly as possible (and
> return it to the user) based
> > on the the block's ID
> > -simple pseudo thinking: "based on block ID, find the specific block fast
> and return it."
> 
> You're missing a step here:
> 1. Find the block
> 2. *Download the block*
> 3. Return the block
> 
> Maybe you want to leave it out of the pseudo thinking, but it's quite 
> essential to the point above that :-)

Yes, I left it intention away. AFAIK, thesis' focus is how a specific data can
be located as fast as possible, if exists in the network.


> 
> > b) SWN
> ...
> > -approach specific pseudo thinking in SWAN: "based on block's ID, go to the
> node, whose (euclidean) distance is closest 
> > to block's ID in identity space"
> 
> Here the simplification bites. There are two ways to interpret this 
> 'pseudo thinking':
> 
> 1. This is used as a DHT routing algorithm: that node will tell us which 
> other node has the block (i.e., we 'map our resources into the network' 
> as you described it above)
> 2. The node we found is the node to download the block from. This means 
> that a computer would have to run a node for every block it stores 
> (making number of neighbours to store an extremely unattractive O(k) 
> with k = number of blocks on the local computer).
> 
> My problem: If it's 1), this is just another type of DHT. If it's 2), 
> it's about as attractive as flooding broadcast networks, because it 
> simply doesn't scale.

I think the first approach is more close ("somekind of DHT")...

...however, in SWNs, the most important thing how the structure (links between
nodes) is constructed; so that certain rules are are fulfilled (again, see
Kleingberg). Indeed, SWNs can be seen as structured entity, *but* not as
structured as DHTs.


> 
> > c) FBN
> > -In this case, there is no benefit from the block's ID at all 
> 
> What do you mean by this?

Hm..that there is not "hints" in the network where block is located in the
network. On the other hand, in DHTs (and in somehow in SWNs also) based on block
ID's hash, nodes gives "hints" (routes more close to ID in the key space)
constantly to a query lookup until the specific node is found which hosts the 
block.


> > 2.2.1. Facts     
> > -urn-5 is random "unique keyword", e.g. "Front page of New York Times
> newspaper"
> > -urn-5 can be updated
> 
> No, a urn-5 cannot be updated. We can associate a new block with the 
> urn-5, but there is no such thing as 'updating' it.

Sorry, I meant that. Will fix for better understanding...:).

> 
> > -urn-5 name is saved in the header of Storm block (|block urn-5: "Front
> page of New York Times newspaper"|)
> 
> This is not what we have been doing so far. Your approach may have merit 
> though, especially for network publishing (like 'Front page of NYT'): 
> the one we currently use has some problems with determining what is 
> 'current' if we delete some old versions, but not all (even older 
> versions may suddenly appear to be current).

This "fact" (presumption) is totally based on the discussion between Tuomas and
I in the "gzz yesterday :).

> 
> > -finding key blocks for data block can be performed locally (in logarithmic
> time)
> 
> - "...can be performed locally"? It's like loading blocks: If you don't 
> have the data, you must request it from the network.
> - Why logarithmic time? Locally, we can use a hashtable -> O(1)?

Also, look above :). Ok, to be more specific: worste case is locarithmic time,
usually constant time.


> 
> > -simple pseudo thinking: "find all Storm blocks from the network,
>  > which uses specific urn-5 name. Compare blocks and
> > return the most recent block, if the signing key is "valid"."
> 
> Well... you would probably only request the most current from the 
> network and check its signature, since if the network is trying to fool 
> you into believing an old version is the current one, it could simply 
> hide the blocks containing the newer versions from you in the first place.

How do we *know* to request the most current version from the distributed
network (key ?) ?


> 
> > 2.2.3. Existing approaches and this specific research problem
> > 
> > -First, *all* blocks (with the specific urn-5 name) have to checked and
> analysed 
> > -In this case (urn-5), there is no benefit from the block's ID at all (DHT,
> SWN)
> 
> What do you mean by this? I don't understand at all.

I may have understood this all wrong, but if we want all blocks associated with
a specific urn-5, we have to check all blocks (since from this point of view,
urn-5 and block IDs are independent from each other: urn-5 names has no
information about the blocks which may associated with the specific urn -5 ??) ?

> 
> > 2.2.4. Open questions:
> > -in this case, is it sensible to maintain two different key-value mappings
> key-value based systems (DHT, SWN) ?
> >     -one fore block IDs
> >     -one for urn-5 names, which are associated with block IDs
> >     -is this approach too difficult to maintain ?
> 
> Why would one want to? I don't see any benefits.

This is based on my preasumption below. If we know all the block's associated
with a given urn-5, we can find the block(s) more efficiently.

But as I said, I don't know about this...

> 
> > -is there possibility, in the specific urn-5 name, to maintain
> information
>  > about most recent block's ID for better search performance (or moreover,
> > tree based structure for all blocks for specific urn-5 name) ?
> 
> No. You create the urn-5 *before* you create any of the blocks, so you 
> cannot make it depend on the blocks (that's why we can't use 
> cryptographical hashes).

Heh...here's the answer for my two answers above :).


> 
> > 2.3. "Searching for Storm blocks associated with specific urn-5 name, where
> 
> >      specific date has been defined (or date range), and where Storm block
> 
> >      has been signed with a given key"
> >      
> > 2.3.1. Facts
> > -in addition to section 2.2.1, in every block's header, there is a
> timestamp for date&time
> 
> Huh? This isn't needed in 2.2.1? How would one then compare two blocks 
> to determine which of the two is more current (should be shown)?!?

Ups...yes of course! Will fix it...


-Hermanni





-------------------------------------------------
This mail sent through IMP: http://horde.org/imp/



reply via email to

[Prev in Thread] Current Thread [Next in Thread]