
Re: [Gzz] hemppah's research problems document


From: B. Fallenstein
Subject: Re: [Gzz] hemppah's research problems document
Date: Sat, 14 Dec 2002 21:08:22 +0100
User-agent: Mozilla/5.0 (X11; U; Linux ppc; en-US; rv:1.2) Gecko/20021204 Debian/1.2.1-1


Hi Hermanni,

we seem to be miscommunicating again and again... sorry. *sigh* Please, be as explicit as possible when answering, to avoid misunderstandings... :-/

address@hidden wrote:
-DHTs require O(log n) hops to reach arbitrary destinations, assuming that
each node maintains information about O(log n) nodes

This is not true. *Some* DHT routing algorithms require O(log n) nodes to be stored, but CAN or a small-world assumption-based routing algorithm only requires O(1) neighbours (where CAN routing takes square-root time, for two dimensions). You say that yourself, below.


For CAN, though, only *one* dimension is a "special" case.  For CAN, when
there is only one dimension, CAN performance is far behind other DHTs.

I don't understand these two sentences...

Indeed, the
authors suggest that, setting d = log n, CAN has O(log n) neighbors and O(log
n) path lengths like other DHT algorithms.

So? That still means that *some* DHT algorithms perform as you said, *but not all* (CAN with, say, 3 dimensions).
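(A quick back-of-the-envelope comparison, assuming the usual estimate from the CAN paper of about (d/4)*n^(1/d) hops with 2d neighbours per node; the concrete numbers are only illustrative:)

    # Rough hop-count comparison, using the estimate from the CAN paper of
    # about (d/4) * n^(1/d) hops with 2d neighbours per node, versus log2(n)
    # hops and ~log2(n) neighbours for the O(log n)-style DHTs.
    import math

    n = 1_000_000  # illustrative network size

    for d in (2, 3, int(math.log2(n) / 2)):
        hops = (d / 4) * n ** (1 / d)
        print("CAN, d=%2d: ~%6.1f hops, %d neighbours per node" % (d, hops, 2 * d))

    print("log-style DHT: ~%6.1f hops, ~%d neighbours per node"
          % (math.log2(n), round(math.log2(n))))

With a small fixed d the neighbour count stays constant but the hop count is clearly worse than O(log n); only when d grows with n does CAN look like the others.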

-SWNs require O(log^2 n) hops to reach arbitrary destinations, assuming
(*only and only if* !!!) that links between nodes are constructed in the way that they are uniformly distributed over all distances in the network (Kleinberg)

"only and only if"? :)

Yes! One has to be *very* careful (aka "clever") when constructing links
between nodes in SWNs. See Jon Kleinberg's work for details (google Jon
Kleinberg)

"only and only if" is still not a meaningful English expression, AFAIK ;-)

-Example systems: SWAN, Freenet

Freenet? Freenet definitely does not have the properties of storing mappings on your machine-- in Freenet, *everything* is distributed through the network.


Umm..perhaps I should be more specific. Basically I have categorized these
systems based on the routing mechanism. I should, however, update the section
names for better clarity.

No, you should make clear that the characterizations you list do not apply to all examples you give.

                Insert/Delete   Space           Search

Pastry:         O(log^2 n)      O(n log n)      O(log n)
Tapestry:       O(log^2 n)      O(n log n)      O(log n)

In Pastry and Tapestry, each node has to know *more neighbours than there are nodes in the system* (n * log n)? Ahem.


Ok, here we have to be careful :). The summary table is far from complete
(far from the "truth"). However, I copied part of this table from the article
named "Distributed Object Location in a Dynamic Network",

http://oceanstore.cs.berkeley.edu/publications/papers/pdf/SPAA02.pdf.

I have a strong suspicion that they mean space required in the *whole* network-- O(log n) neighbours per node = O(n log n) neighbours in the whole system. The O(n log n) per node which you gave doesn't make sense, anyway... They also give space for CAN as O(nr), where r is the number of dimensions-- neighbours in CAN per node are O(r), AFAIK (O(1) if you consider the number of dimensions to be fixed-- which makes sense, because it's a parameter picked at implementation time, not modified when the network grows). And so on.
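(Just to make that reading concrete with made-up numbers:)

    # Made-up numbers for the "whole network" reading: O(log n) neighbours
    # per node adds up to O(n log n) entries over the whole system.
    import math

    n = 100_000                    # illustrative network size
    per_node = math.log2(n)        # roughly 17 neighbours per node
    whole_network = n * per_node   # roughly 1.7 million entries in total
    print("per node: ~%.0f, whole network: ~%.0f" % (per_node, whole_network))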

2.1.1 Facts
-a specific Storm block can be identified with an ID, generated by the SHA-1
algorithm
-a block ID is globally *unique*, e.g. "Front page of New York Times
newspaper on 10.10.2002"

But note that it's a *number*, which you cannot immediately see has anything to do with the New York Times, newspapers, or 10.10.2002.

Yes, *random* number.

Random-looking. It is a deterministic function of the data in the block, so it isn't really random.
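(Roughly like this, as a simplified illustration only -- the actual Storm ID format may differ in detail:)

    # Simplified illustration only: the block ID as the SHA-1 hash of the
    # block's content.  Same data always yields the same ID, so it is
    # deterministic, just random-looking.  (The real Storm ID format may
    # include more than the raw hash.)
    import hashlib

    block_data = b"front page content of some newspaper, 10.10.2002"
    block_id = hashlib.sha1(block_data).hexdigest()
    print(block_id)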

-approach-specific pseudo thinking in SWAN: "based on the block's ID, go to the
node whose ID is closest (in Euclidean distance) to the block's ID in identity space"

Here the simplification bites. There are two ways to interpret this 'pseudo thinking':

1. This is used as a DHT routing algorithm: that node will tell us which other node has the block (i.e., we 'map our resources into the network' as you described it above)

2. The node we found is the node to download the block from. This means that a computer would have to run a node for every block it stores (making the number of neighbours to store an extremely unattractive O(k), with k = number of blocks on the local computer).

My problem: If it's 1), this is just another type of DHT. If it's 2), it's about as attractive as flooding broadcast networks, because it simply doesn't scale.

I think the first approach is closer ("some kind of DHT")...

Sorry if this sounds harsh :-( but your work is of no use to me as long as you don't clearly explain the algorithms you're talking about. I said that I see only two interpretations of what you said: DHT or one-node-per-block. I see no others, so if you say one of these is "close," but not quite there, I haven't been helped at all. Please, either explain how this works, or say that you can't and why.

(In case you don't really know how it works: That's not a problem. But then, please *say* so instead of making assertions about this system and its properties...)

Sorry, but this is really important for your work to be useful to us others in the group...
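(For concreteness, interpretation 1 would look roughly like the following sketch; 'dht' and 'download_from' are hypothetical stand-ins for whatever put/get interface and transfer mechanism is actually used:)

    # Interpretation 1, sketched: the DHT maps a block ID to the address of
    # a node that hosts the block; we look that up and then download the
    # block from that node.  'dht' (with put/get) and 'download_from' are
    # hypothetical stand-ins, not an actual API.

    def publish(dht, block_id, my_address):
        # The hosting node announces itself under the block's ID.
        dht.put(block_id, my_address)

    def fetch(dht, block_id, download_from):
        # Ask the DHT which node hosts the block, then fetch it from there.
        host_address = dht.get(block_id)
        return download_from(host_address, block_id)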

...however, in SWNs, the most important thing is how the structure (links between
nodes) is constructed, so that certain rules are fulfilled (again, see
Kleinberg).

This doesn't tell me how SWAN works, and how that's different from a DHT. I have a rough understanding of how it links nodes; I do not understand how blocks are published and looked up. I see that small-world networks could provide an interesting DHT routing algorithm; you're saying SWAN is doing something different, but not *what*. :(

c) FBN
-In this case, there is no benefit from the block's ID at all

What do you mean by this?

Hm.. that there are no "hints" in the network about where the block is located.
On the other hand, in DHTs (and somehow in SWNs also), based on the block ID's
hash, nodes constantly give "hints" (routes closer to the ID in the key space)
to a query lookup, until the specific node which hosts the block is found.

Ok.

-finding key blocks for a data block can be performed locally (in logarithmic
time)

- "...can be performed locally"? It's like loading blocks: If you don't have the data, you must request it from the network.
- Why logarithmic time? Locally, we can use a hashtable -> O(1)?

Also, look above :). Ok, to be more specific: the worst case is logarithmic
time, usually constant time.

Sorry, I understand neither of these hints. Could you reply to my questions directly-- what do you mean by "can be performed locally," and what do you mean by 'logarithmic time'? (If you're giving 'logarithmic time' as the worst-case performance of hashtables, I think that should be 'linear time'?)

-simple pseudo thinking: "find all Storm blocks from the network which use a
specific urn-5 name. Compare blocks and return the most recent block, if the
signing key is "valid"."

Well... you would probably only request the most current from the network and check its signature, since if the network is trying to fool you into believing an old version is the current one, it could simply hide the blocks containing the newer versions from you in the first place.

How do we *know* to request the most current version from the distributed
network (key ?) ?

Ok, right: the network has to do some cooperation on this one-- in a flat DHT, we would have to request all blocks. To be faster, we would need to be able to do some computation on the node that stores the mappings, to determine the newest block, and then return only that-- GISP is planned to support XML queries over mapping values in the future, IIRC...

Hm, here's another idea: As the mappings, store
(uri -> [blockid, date]). I.e., the value is a record containing
a block id and the timestamp of that block. Then, we could request all *mappings* and select the newest one, without actually downloading the *blocks*. (We would then of course check that the block we select does actually have that date and urn-5 and is signed...) This works on top of a plain DHT.
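(Sketched, assuming a hypothetical DHT interface that can store and return several values per key; names like 'get_all' are made up:)

    # Sketch of the (urn-5 -> [block ID, timestamp]) idea on top of a plain
    # DHT.  'dht.put' and 'dht.get_all' (several values per key) are a
    # hypothetical interface, not an actual API.

    def publish_version(dht, urn5, block_id, timestamp):
        dht.put(urn5, (block_id, timestamp))

    def newest_block_id(dht, urn5):
        # Fetch all the small mapping records and pick the newest one by
        # timestamp; only that one block is then downloaded and checked
        # (date, urn-5 name, signature).
        mappings = dht.get_all(urn5)
        block_id, timestamp = max(mappings, key=lambda m: m[1])
        return block_id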

I may have understood this all wrong, but if we want all blocks associated with
a specific urn-5, we have to check all blocks (since from this point of view,
urn-5 names and block IDs are independent of each other: urn-5 names have no
information about the blocks which may be associated with the specific urn-5 ??) ?

Ok, that's right-- we have to scan each block and put it into the index.
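(Roughly like this sketch of the local indexing pass; 'read_urn5_and_date' is a hypothetical helper that parses a block:)

    # Sketch of the local indexing pass: scan each block once and remember,
    # per urn-5 name, the newest block seen so far.  'read_urn5_and_date'
    # is a hypothetical helper that parses a block's header.

    def build_index(blocks, read_urn5_and_date):
        index = {}   # urn-5 name -> (timestamp, block ID)
        for block_id, block_data in blocks.items():
            urn5, timestamp = read_urn5_and_date(block_data)
            if urn5 not in index or timestamp > index[urn5][0]:
                index[urn5] = (timestamp, block_id)
        return index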

2.2.4. Open questions:
-in this case, is it sensible to maintain two different key-value mappings in
key-value based systems (DHT, SWN) ?

        -one for block IDs
        -one for urn-5 names, which are associated with block IDs
        -is this approach too difficult to maintain ?

Why would one want to? I don't see any benefits.

This is based on my presumption below.

Which one?

> If we know all the blocks associated
with a given urn-5, we can find the block(s) more efficiently.

Well, duh. But your question was (or I understood it as) whether it makes sense to maintain *two different* key-value based systems for block IDs and urn-5 names. I don't see why you would want to-- you can store both mappings in a single DHT (which is probably more efficient than having two parallel DHTs, and also easier to maintain).
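(One simple way to do that, sketched: keep the two kinds of mappings apart with a key prefix inside the same DHT; 'dht' is again a hypothetical put/get interface and the prefixes are made up:)

    # Sketch: both mappings in one DHT, kept apart by a key prefix.  'dht'
    # is a hypothetical put/get interface; the prefixes are made up.

    def publish_block_location(dht, block_id, host_address):
        dht.put("block:" + block_id, host_address)

    def publish_urn5_version(dht, urn5, block_id, timestamp):
        dht.put("urn5:" + urn5, (block_id, timestamp))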

-is there a possibility, in the specific urn-5 name, to maintain information
about the most recent block's ID for better search performance (or, moreover, a
tree-based structure for all blocks for a specific urn-5 name) ?

No. You create the urn-5 *before* you create any of the blocks, so you cannot make it depend on the blocks (that's why we can't use cryptographical hashes).

Heh...here's the answer to my two questions above :).

What do you mean?

Sorry for having such a hard time understanding you :-(

- Benja



