[Gridpt-discuss] work history


From: Pedro Andrade
Subject: [Gridpt-discuss] work history
Date: Fri, 25 Jul 2003 17:51:45 +0200
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030624

Hi

Just to keep the new members up to date.
With this mail we intend to describe what we have been doing, what decisions we have taken, and in what directions we are thinking of moving. This is not meant to be a complete account of the work done, but only to clarify the progress of our ideas and the current situation.

Our first proposal (http://fisica.fe.up.pt/cgi-bin/twiki/view/Gridpt/FirstProposal) was to build a small centralized Grid system for job submission and data management. In this architecture, all the information about jobs is kept in a central element, which receives requests from users and forwards them to worker nodes. Those worker nodes have previously declared to the central element that they are ready to receive jobs. The central server stores no information regarding the status of the worker nodes: whenever a worker node feels that it "can help" the system by executing a job, it asks the central server if there is something in the queue that it can execute. The main idea of this prototype was a centralized grid architecture with a pull job model, based also on the principle of data partitioning (the system works better if the data is split across all the worker nodes). Jobs run where the data already is; there is no data movement, or at least data movement is reduced to a minimum.
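
To make the pull model concrete, here is a rough Python sketch (all names are purely illustrative; this is not our prototype code) of a central queue and a worker polling it:

import queue
import time

class CentralQueue:
    """Central element: queues jobs, stores nothing about the workers."""
    def __init__(self):
        self.jobs = queue.Queue()

    def submit(self, job):
        self.jobs.put(job)

    def fetch(self, local_datasets):
        # Hand out a job only if the asking worker already holds the
        # input data, preserving the "no data movement" principle.
        skipped, picked = [], None
        while not self.jobs.empty():
            job = self.jobs.get()
            if picked is None and job["dataset"] in local_datasets:
                picked = job
            else:
                skipped.append(job)
        for job in skipped:
            self.jobs.put(job)
        return picked

def worker_loop(central, local_datasets):
    # Pull model: the worker decides when it "can help" and polls
    # the central server for something on the queue it can execute.
    while True:
        job = central.fetch(local_datasets)
        if job:
            print("running", job["name"], "on local data", job["dataset"])
        time.sleep(5)  # poll interval

A user would just do central.submit({"name": "fit", "dataset": "d42"}) and let whichever worker holds "d42" pick the job up.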

Then we started to realize that this approach would only lead to a small EDG-style architecture with all the problems of a centralized system (a single point of failure: if the server goes down, everything goes). For this reason we started analyzing some P2P systems, which are known for their robustness against failure. From this analysis we came to a second proposal (http://fisica.fe.up.pt/cgi-bin/twiki/view/Gridpt/SecondProposal). In it we defined a simple and efficient distribution of highly parallelizable services using a peer-to-peer approach (perhaps using web services for communication). The main idea was still to preserve the "no data movement" principle, but instead of a centralized pull system we would now have a decentralized push system, where some semi-central nodes (aggregators) receive the requests from the user, query all the nodes, aggregate the answers, and return them to the user. With this approach there is no longer a single point of failure (no EDG resource broker), since it is up to each peer to decide whether it can run the job or not. The aggregator node, once again, knows nothing about the worker nodes except that they exist; there is no central repository containing a "global" status of the system.
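
As a rough illustration of the fan-out step (the peer interface here is invented for the sketch; nothing is decided yet about the actual communication layer), the aggregator could look like this:

from concurrent.futures import ThreadPoolExecutor

def aggregate(peers, request):
    # Push model: the aggregator knows only that the peers exist.
    # It forwards the request to all of them; each peer decides for
    # itself whether it can handle it (no central resource broker).
    def ask(peer):
        try:
            return peer.handle(request)  # e.g. a web-service call
        except Exception:
            return None  # a failed peer is simply skipped

    with ThreadPoolExecutor() as pool:
        answers = pool.map(ask, peers)

    # Aggregate the non-empty answers and return them to the user.
    return [a for a in answers if a is not None]

Note how a dead peer costs nothing: its answer is just missing from the aggregate, instead of taking the whole system down.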

From this generic approach we then started to clarify how the system could be implemented. Its main characteristics should be:
- decentralized system (peer-to-peer)
- local information catalog (each peer just knows what it has)
- abstract information types (a peer would not have one catalog specific to data management and another specific to an information index; instead, it would have an abstract metadata storage system capable of dealing with all of these; see the sketch after this list)
- modular architecture
- a first implementation of this system should be a data management demonstration
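
To give an idea of what we mean by an abstract information storage system, here is a toy sketch (the class and field names are just our illustration, not a design decision):

class LocalCatalog:
    # Each peer knows only what it has: a flat store of metadata
    # entries, not separate catalogs for data management and indexing.
    def __init__(self):
        self._entries = []

    def add(self, **metadata):
        self._entries.append(metadata)

    def query(self, **constraints):
        # Match any entry whose metadata satisfies all constraints,
        # regardless of what "type" of information it describes.
        return [e for e in self._entries
                if all(e.get(k) == v for k, v in constraints.items())]

catalog = LocalCatalog()
catalog.add(type="file", name="run42.dat", size=1024)
catalog.add(type="index", name="run42.dat", keywords=["calorimeter"])
print(catalog.query(name="run42.dat"))  # both entries match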

After some analysis we came upon the following two major aspects, which we started working on in order to find the best solution:

1) P2P system
There are several types of peer-to-peer systems:
- centralized approach (Napster style)
- semi-centralized approach (Kazaa style)
- decentralized approach (Gnutella style)
All of these have their advantages and disadvantages. We think the one that can bring better results is the semi-centralized approach, because it does not have the single point of failure of the centralized systems and also does not flood the entire network with queries/requests like the decentralized systems do. Semi-centralized systems use the idea of super nodes: peers are joined into groups, each with one super node, and organized in a hierarchical way. Besides this network-structure issue, there is another important problem: the discovery mechanism. How should peers discover other peers? Should they even discover them at all, or remain unaware of other peers? What should be the function of the super node/aggregator? On the Twiki (http://fisica.fe.up.pt/cgi-bin/twiki/view/Gridpt/NodeRendezvous) we describe how the peers could register and operate in the network. One good solution is to use Kademlia, a protocol we have studied that uses hash keys to find nodes it does not yet know about.
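
For illustration, the heart of Kademlia is an XOR metric over node IDs: a peer locates an unknown node by repeatedly asking the closest nodes it already knows for even closer ones. A toy version of the distance step (8-bit IDs for readability; Kademlia really uses 160-bit hash keys):

def xor_distance(node_a, node_b):
    # Kademlia measures "closeness" as the XOR of two IDs,
    # interpreted as an integer: smaller means closer.
    return node_a ^ node_b

def closest_known(target_id, known_ids, k=3):
    # Return the k known nodes closest to the target; a real lookup
    # would then query these nodes for even closer ones, iteratively.
    return sorted(known_ids, key=lambda n: xor_distance(target_id, n))[:k]

known = [0b00010111, 0b10100001, 0b01100100, 0b00010001]
print(closest_known(0b00010101, known))  # -> [23, 17, 100]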

2) Information layer
Concerning information/data storage, searching, and retrieval, two different options are being analysed:
- Using RDF (http://fisica.fe.up.pt/cgi-bin/twiki/view/Gridpt/RdfSchema). Using RDF will allow a more standard and generic implementation, but querying/retrieving different parts of the metadata will be more complicated.
- Using local databases (http://fisica.fe.up.pt/cgi-bin/twiki/view/Gridpt/LocalDB). Using a local database will allow more powerful control over the query system, but it brings the disadvantage of a more rigid schema for the metadata.
We are now trying to find a solution that combines the best parts of these two alternatives.
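
To illustrate the trade-off, here is a toy comparison using only a local SQLite database (the table layouts and example metadata are invented for this sketch). A generic triple table can hold any metadata but needs a self-join for a two-attribute query; a fixed-schema table makes the same query trivial but hard-codes the metadata layout:

import sqlite3

db = sqlite3.connect(":memory:")

# Option 1: RDF-style triples. One generic (subject, predicate, object)
# table holds any metadata, but a query over several attributes must
# join the table against itself.
db.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
db.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("file:run42", "name", "run42.dat"),
    ("file:run42", "owner", "gridpt"),
])
rows = db.execute("""
    SELECT a.subject FROM triples a JOIN triples b ON a.subject = b.subject
    WHERE a.predicate = 'name' AND a.object = 'run42.dat'
      AND b.predicate = 'owner' AND b.object = 'gridpt'
""").fetchall()

# Option 2: local database with a fixed schema. The same query is a
# simple WHERE clause, but adding a new kind of metadata means
# changing the schema.
db.execute("CREATE TABLE files (name TEXT, owner TEXT)")
db.execute("INSERT INTO files VALUES ('run42.dat', 'gridpt')")
rows = db.execute(
    "SELECT name FROM files WHERE name = ? AND owner = ?",
    ("run42.dat", "gridpt"),
).fetchall()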

---

So, to sum up, we'd like to build a peer-to-peer network with a metadata layer on top. The exact mechanisms for the P2P system are still being studied (although we have some fairly definite ideas about this). The metadata layer is more troubling, because it can actually influence the way the peers work together.

As usual, we appreciate any comments or questions. It's actually quite difficult to try and explain a system with so many variants and possibilities in such a "short" email.

Best Regards,
Miguel Branco
Pedro Andrade




