
Re: [Gzz] hemppah's research problems document


From: B. Fallenstein
Subject: Re: [Gzz] hemppah's research problems document
Date: Sat, 14 Dec 2002 21:08:22 +0100
User-agent: Mozilla/5.0 (X11; U; Linux ppc; en-US; rv:1.2) Gecko/20021204 Debian/1.2.1-1


Hi Hermanni,

we seem to be miscommunicating again and again... sorry. *sigh* Please, be as explicit as possible when answering, to avoid misunderstandings... :-/

address@hidden wrote:
-DHTs require O(log n) hops to reach arbitrary destinations, assuming that
each node maintains information about O(log n) nodes

This is not true. *Some* DHT routing algorithms require O(log n) nodes to be stored, but CAN or a small-world assumption-based routing algorithm only requires O(1) neighbours (where CAN routing takes square-root time, for two dimensions). You say that yourself, below.


For CAN, though, only *one* dimension is a "special" case.  For CAN, when
there is only one dimension, CAN performance is far behind other DHTs.

I don't understand these two sentences...

Indeed, the
authors suggest that, setting d = log n, CAN has O(log n) neighbors and O(log
n) path lengths like other DHT algorithms.

So? That still means that *some* DHT algorithms perform as you said, *but not all* (CAN with, say, 3 dimensions).
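(A quick back-of-the-envelope comparison, assuming the usual estimate from the CAN paper of about (d/4)*n^(1/d) hops with 2d neighbours per node; the concrete numbers are only illustrative:)

    # Rough hop-count comparison, using the estimate from the CAN paper of
    # about (d/4) * n^(1/d) hops with 2d neighbours per node, versus log2(n)
    # hops and ~log2(n) neighbours for the O(log n)-style DHTs.
    import math

    n = 1_000_000  # illustrative network size

    for d in (2, 3, int(math.log2(n) / 2)):
        hops = (d / 4) * n ** (1 / d)
        print("CAN, d=%2d: ~%6.1f hops, %d neighbours per node" % (d, hops, 2 * d))

    print("log-style DHT: ~%6.1f hops, ~%d neighbours per node"
          % (math.log2(n), round(math.log2(n))))

With a small fixed d the neighbour count stays constant but the hop count is clearly worse than O(log n); only when d grows with n does CAN look like the others.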

-SWNs require O(log^2 n) hops to reach arbitrary destinations, assuming
(*only and only if* !!!) that links between nodes are constructed in the way that they are uniformly distributed over all distances in the network (Kleinberg)

"only and only if"? :)

Yes! One has to be *very* careful (aka "clever") when constructing links
between nodes in SWNs. See Jon Kleinberg's work for details (google Jon
Kleinberg)

"only and only if" is still not a meaningful English expression, AFAIK ;-)

-Example systems: SWAN, Freenet

Freenet? Freenet definitely does not have the properties of storing mappings on your machine-- in Freenet, *everything* is distributed through the network.


Umm..perhaps I should be more specific. Basically I have categorized these
systems based on the routing mechanism. I should, however, update the section
names for better clarity.

No, you should make clear that the characterizations you list do not apply to all examples you give.

                Insert/Delete   Space           Search

Pastry:         O(log^2 n)      O(n log n)      O(log n)
Tapestry:       O(log^2 n)      O(n log n)      O(log n)

In Pastry and Tapestry, each node has to know *more neighbours than there are nodes in the system* (n * log n)? Ahem.


Ok, here we have to be careful :). The summary table is far from complete
(far from the "truth"). However, I copied part of this table from the article
named "Distributed Object Location in a Dynamic Network",

http://oceanstore.cs.berkeley.edu/publications/papers/pdf/SPAA02.pdf.

I have a strong suspicion that they mean space required in the *whole* network-- O(log n) neighbours per node = O(n log n) neighbours in the whole system. The O(n log n) per node which you gave doesn't make sense, anyway... They also give space for CAN as O(nr), where r is the number of dimensions-- neighbours in CAN per node are O(r), AFAIK (O(1) if you consider the number of dimensions to be fixed-- which makes sense, because it's a parameter picked at implementation time, not modified when the network grows). And so on.
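(Just to make that reading concrete with made-up numbers:)

    # Made-up numbers for the "whole network" reading: O(log n) neighbours
    # per node adds up to O(n log n) entries over the whole system.
    import math

    n = 100_000                    # illustrative network size
    per_node = math.log2(n)        # roughly 17 neighbours per node
    whole_network = n * per_node   # roughly 1.7 million entries in total
    print("per node: ~%.0f, whole network: ~%.0f" % (per_node, whole_network))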

2.1.1 Facts
-a specific Storm block can be identified with an ID, generated by the SHA-1
algorithm
-a block ID is globally *unique*, e.g. "Front page of New York Times
newspaper on 10.10.2002"

But note that it's a *number*, which you cannot immediately see has anything to do with the New York Times, newspapers, or 10.10.2002.

Yes, *random* number.

Random-looking. It is a deterministic function of the data in the block, so it isn't really random.
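(Roughly like this, as a simplified illustration only -- the actual Storm ID format may differ in detail:)

    # Simplified illustration only: the block ID as the SHA-1 hash of the
    # block's content.  Same data always yields the same ID, so it is
    # deterministic, just random-looking.  (The real Storm ID format may
    # include more than the raw hash.)
    import hashlib

    block_data = b"front page content of some newspaper, 10.10.2002"
    block_id = hashlib.sha1(block_data).hexdigest()
    print(block_id)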

-approach-specific pseudo thinking in SWAN: "based on the block's ID, go to the
node whose ID is closest (in Euclidean distance) to the block's ID in identity space"

Here the simplification bites. There are two ways to interpret this 'pseudo thinking':

1. This is used as a DHT routing algorithm: that node will tell us which other node has the block (i.e., we 'map our resources into the network' as you described it above)

2. The node we found is the node to download the block from. This means that a computer would have to run a node for every block it stores (making the number of neighbours to store an extremely unattractive O(k), with k = number of blocks on the local computer).

My problem: If it's 1), this is just another type of DHT. If it's 2), it's about as attractive as flooding broadcast networks, because it simply doesn't scale.

I think the first approach is closer ("some kind of DHT")...

Sorry if this sounds harsh :-( but your work is of no use to me as long as you don't clearly explain the algorithms you're talking about. I said that I see only two interpretations of what you said: DHT or one-node-per-block. I see no others, so if you say one of these is "close," but not quite there, I haven't been helped at all. Please, either explain how this works, or say that you can't and why.

(In case you don't really know how it works: That's not a problem. But then, please *say* so instead of making assertions about this system and its properties...)

Sorry, but this is really important for your work to be useful to us others in the group...
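(For concreteness, interpretation 1 would look roughly like the following sketch; 'dht' and 'download_from' are hypothetical stand-ins for whatever put/get interface and transfer mechanism is actually used:)

    # Interpretation 1, sketched: the DHT maps a block ID to the address of
    # a node that hosts the block; we look that up and then download the
    # block from that node.  'dht' (with put/get) and 'download_from' are
    # hypothetical stand-ins, not an actual API.

    def publish(dht, block_id, my_address):
        # The hosting node announces itself under the block's ID.
        dht.put(block_id, my_address)

    def fetch(dht, block_id, download_from):
        # Ask the DHT which node hosts the block, then fetch it from there.
        host_address = dht.get(block_id)
        return download_from(host_address, block_id)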

...however, in SWNs, the most important thing is how the structure (links between
nodes) is constructed, so that certain rules are fulfilled (again, see
Kleinberg).

This doesn't tell me how SWAN works, and how that's different from a DHT. I have a rough understanding of how it links nodes; I do not understand how blocks are published and looked up. I see that small-world networks could provide an interesting DHT routing algorithm; you're saying SWAN is doing something different, but not *what*. :(

c) FBN
-In this case, there is no benefit from the block's ID at all

What do you mean by this?

Hm.. that there are no "hints" in the network about where the block is located.
On the other hand, in DHTs (and somehow in SWNs also), based on the block ID's
hash, nodes constantly give "hints" (routes closer to the ID in the key space)
to a query lookup, until the specific node which hosts the block is found.

Ok.

-finding key blocks for a data block can be performed locally (in logarithmic
time)

- "...can be performed locally"? It's like loading blocks: If you don't have the data, you must request it from the network.
- Why logarithmic time? Locally, we can use a hashtable -> O(1)?

Also, look above :). Ok, to be more specific: the worst case is logarithmic
time, usually constant time.

Sorry, I understand neither of these hints. Could you reply to my questions directly-- what do you mean by "can be performed locally," and what do you mean by 'logarithmic time'? (If you're giving 'logarithmic time' as the worst-case performance of hashtables, I think that should be 'linear time'?)

-simple pseudo thinking: "find all Storm blocks from the network which use a
specific urn-5 name. Compare blocks and return the most recent block, if the
signing key is "valid"."

Well... you would probably only request the most current from the network and check its signature, since if the network is trying to fool you into believing an old version is the current one, it could simply hide the blocks containing the newer versions from you in the first place.

How do we *know* to request the most current version from the distributed
network (key ?) ?

Ok, right: the network has to do some cooperation on this one-- in a flat DHT, we would have to request all blocks. To be faster, we would need to be able to do some computation on the node that stores the mappings, to determine the newest block, and then return only that-- GISP is planned to support XML queries over mapping values in the future, IIRC...

Hm, here's another idea: As the mappings, store
(uri -> [blockid, date]). I.e., the value is a record containing
a block id and the timestamp of that block. Then, we could request all *mappings* and select the newest one, without actually downloading the *blocks*. (We would then of course check that the block we select does actually have that date and urn-5 and is signed...) This works on top of a plain DHT.
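(Sketched, assuming a hypothetical DHT interface that can store and return several values per key; names like 'get_all' are made up:)

    # Sketch of the (urn-5 -> [block ID, timestamp]) idea on top of a plain
    # DHT.  'dht.put' and 'dht.get_all' (several values per key) are a
    # hypothetical interface, not an actual API.

    def publish_version(dht, urn5, block_id, timestamp):
        dht.put(urn5, (block_id, timestamp))

    def newest_block_id(dht, urn5):
        # Fetch all the small mapping records and pick the newest one by
        # timestamp; only that one block is then downloaded and checked
        # (date, urn-5 name, signature).
        mappings = dht.get_all(urn5)
        block_id, timestamp = max(mappings, key=lambda m: m[1])
        return block_id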

I may have understood this all wrong, but if we want all blocks associated with
a specific urn-5, we have to check all blocks (since from this point of view,
urn-5 names and block IDs are independent of each other: urn-5 names have no
information about the blocks which may be associated with the specific urn-5 ??) ?

Ok, that's right-- we have to scan each block and put it into the index.
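(Roughly like this sketch of the local indexing pass; 'read_urn5_and_date' is a hypothetical helper that parses a block:)

    # Sketch of the local indexing pass: scan each block once and remember,
    # per urn-5 name, the newest block seen so far.  'read_urn5_and_date'
    # is a hypothetical helper that parses a block's header.

    def build_index(blocks, read_urn5_and_date):
        index = {}   # urn-5 name -> (timestamp, block ID)
        for block_id, block_data in blocks.items():
            urn5, timestamp = read_urn5_and_date(block_data)
            if urn5 not in index or timestamp > index[urn5][0]:
                index[urn5] = (timestamp, block_id)
        return index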

2.2.4. Open questions:
-in this case, is it sensible to maintain two different key-value mappings in
key-value based systems (DHT, SWN) ?

        -one for block IDs
        -one for urn-5 names, which are associated with block IDs
        -is this approach too difficult to maintain ?

Why would one want to? I don't see any benefits.

This is based on my presumption below.

Which one?

> If we know all the blocks associated
with a given urn-5, we can find the block(s) more efficiently.

Well, duh. But your question was (or I understood it as) whether it makes sense to maintain *two different* key-value based systems for block IDs and urn-5 names. I don't see why you would want to-- you can store both mappings in a single DHT (which is probably more efficient than having two parallel DHTs, and also easier to maintain).
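(One simple way to do that, sketched: keep the two kinds of mappings apart with a key prefix inside the same DHT; 'dht' is again a hypothetical put/get interface and the prefixes are made up:)

    # Sketch: both mappings in one DHT, kept apart by a key prefix.  'dht'
    # is a hypothetical put/get interface; the prefixes are made up.

    def publish_block_location(dht, block_id, host_address):
        dht.put("block:" + block_id, host_address)

    def publish_urn5_version(dht, urn5, block_id, timestamp):
        dht.put("urn5:" + urn5, (block_id, timestamp))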

-is there a possibility, in the specific urn-5 name, to maintain information
about the most recent block's ID for better search performance (or, moreover, a
tree-based structure for all blocks for a specific urn-5 name) ?

No. You create the urn-5 *before* you create any of the blocks, so you cannot make it depend on the blocks (that's why we can't use cryptographical hashes).

Heh...here's the answer to my two questions above :).

What do you mean?

Sorry for having such a hard time understanding you :-(

- Benja



