swarm-modeling

Re: [Swarm-Modelling] SWARM on Clusters


From: john.sauter
Subject: Re: [Swarm-Modelling] SWARM on Clusters
Date: Thu, 29 Jan 2004 15:34:09 -0500

All the discussion on running Swarm on multiple processors seems to miss
the point of why this is really hard: time synchronization across multiple
processors. Multiple swarms and hierarchical event lists don't help you at
all if you are running a truly asynchronous system. When it comes down to
executing, Swarm has to go through all the events and figure out which one
runs next. Unless you have lots of things happening on a regular schedule
(e.g., every 10 clock ticks, all 10,000 agents in your system run through some
update routine) most of the time there will be only one event at a certain
time to execute (i.e. not much parallelism there). Only when there are
multiple events at the same time do you have an opportunity to run those
multiple events on separate machines. As others have noted the overhead of
shipping the code and data to another machine to execute a single event can
be large compared to the processing time for that event, so unless you are
running on a shared memory architecture (where you don't have to move the
code and the data) this approach is not very tenable for many simulations.
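
To make the point about available parallelism concrete, here is a minimal sketch in Python (not Swarm code). It pulls events off a single time-ordered list and groups those that share a timestamp; the size of each group is the most work that could be handed to separate processors at that step. The function names and the sample event lists are hypothetical, chosen only for illustration.

```python
import heapq

def simultaneous_batches(events):
    """Yield (time, [names]) for each group of events sharing a timestamp,
    in increasing time order. `events` is an iterable of (time, name) pairs."""
    heap = list(events)
    heapq.heapify(heap)
    while heap:
        t, name = heapq.heappop(heap)
        batch = [name]
        # Collect every other event scheduled for exactly the same time.
        while heap and heap[0][0] == t:
            batch.append(heapq.heappop(heap)[1])
        yield t, batch

def max_parallelism(events):
    """Largest number of events that ever fall on the same timestamp."""
    return max(len(batch) for _, batch in simultaneous_batches(events))

# Hypothetical asynchronous model: every event has its own timestamp.
async_events = [(1.3, "a"), (2.7, "b"), (4.1, "c"), (5.9, "d")]

# Hypothetical regular schedule: four agents all update at ticks 10 and 20.
sync_events = [(t, f"agent{i}") for t in (10, 20) for i in range(4)]

print(max_parallelism(async_events))  # one event per timestamp
print(max_parallelism(sync_events))   # all agents share each tick
```

In the asynchronous list no batch ever holds more than one event, so there is nothing to farm out; in the regularly scheduled list every tick offers one event per agent.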

There are other ways to distribute the event list and the entities in your
model over an MPP and run a truly distributed simulation. You can play
tricks with different time granularities and hierarchies of swarms to
maximize the number of events you could run in parallel, but if you try to
distribute Swarm over a cluster or MPP you are in for some nasty surprises
when it comes to keeping your multiple event lists synchronized. It can be
done (since we did it - http://www.erim.org/swarm/), but it's not easy. We
did find that the messaging overhead is significant with RMI. If you have
few entities with a lot of computation, that model might work. It can also
work when you have a lot of entities executing some code on a regular
schedule (as described above). But if you have lots of computationally
lightweight objects that do a lot of interaction and behave more or less
asynchronously, then your simulation may actually run slower on multiple
machines. The RMI overhead is really a crime since it is possible to have
fairly lightweight network communications, but most of the protocols we
have easily available today are not meant to support the kind of
communication we would like to have in highly distributed simulations. BTW
clusters don't help much. They are designed to distribute the threads of a
multi-threaded application. For all intents and purposes, Swarm is a
single-threaded application (when it comes to executing the model). So a cluster
doesn't help at all.
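
The message above does not say how the distributed event lists were kept synchronized, so the following is only a generic illustration of why the problem is awkward, using a textbook conservative scheme rather than anything taken from the ERIM work. Each processor may execute a local event only if no neighbor can still send it an earlier one. The sketch assumes a hypothetical `lookahead` guarantee: a neighbor whose clock reads `c` will never send an event stamped earlier than `c + lookahead`. All names and numbers are made up for the example.

```python
def safe_horizon(neighbor_clocks, lookahead):
    """Earliest timestamp at which some neighbor could still deliver an event.
    Assumption: a neighbor at clock c only sends events stamped >= c + lookahead."""
    return min(neighbor_clocks) + lookahead

def runnable_events(local_events, neighbor_clocks, lookahead):
    """Local (time, name) events that are safe to execute now, in time order.
    Events at or beyond the horizon must wait for neighbors to advance."""
    horizon = safe_horizon(neighbor_clocks, lookahead)
    return [e for e in sorted(local_events) if e[0] < horizon]

# Hypothetical state of one processor and its two neighbors.
local_events = [(1.0, "x"), (2.5, "y"), (7.0, "z")]
neighbor_clocks = [2.0, 3.0]

print(runnable_events(local_events, neighbor_clocks, lookahead=1.0))
print(runnable_events(local_events, neighbor_clocks, lookahead=0.0))
```

With lightweight, highly interactive agents the lookahead tends to be small, so each processor spends much of its time waiting on the slowest neighbor instead of executing events.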


John A. Sauter
Group Leader, Emerging Markets
ph: 734.302.4682 fax: 734.302.4991
Enterprise Solutions Division
Innovative Solutions in Program Management, Analysis and Logistics
Altarum (http://www.altarum.org)
Street: 3520 Green Ct, Ann Arbor, MI 48105
Mail: PO Box 134001, Ann Arbor, MI 48113-4001


"Marcus G. Daniels" <address@hidden>
Sent by: address@hidden
01/29/2004 12:44 PM
Please respond to modelling

To:       address@hidden
Subject:  Re: [Swarm-Modelling] SWARM on Clusters

Sunwoo Park wrote:

>I just joined in this mailing list.
>I have a simple question regarding SWARM software.
>Is there any SWARM implementation that runs on cluster machines (or MPP
>machines) based on Message Passing Paradigm (e.g., MPI) ?
>
>
Swarm has a fine-grained knowledge of concurrency during a simulation.
When multiple agents do something at the same timestep, Swarm knows
this.   But that's just a little atom of the whole simulation execution
sequence.  What this means is that in order for Swarm to exploit this
knowledge on a parallel computer, it is necessary to be able to
efficiently get that atom of computation to a physical processor.  A
cluster, like a Beowulf arrangement of PCs, can't do this because the
communication expense of getting the atom to the processor is not amortized
by the computation done.   An SMP or NUMA system can do this because the
communication/overhead expense of getting the computation to the
processor is small.  So if you have a two or four or eight way Opteron
or Sun system or a big NUMA system like an SGI Altix, the interconnect
between processors could reasonably slurp up these atoms and there would
be a scalability win.
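
The amortization argument can be put into rough numbers. The sketch below is a deliberately crude cost model, not a measurement: it assumes the scheduler hands out atoms one at a time (so dispatch costs add up serially) while the computation itself is spread evenly over the processors. The function name and every timing figure are hypothetical.

```python
def estimated_speedup(n_atoms, compute_s, dispatch_s, n_procs):
    """Serial run time divided by a simple estimate of parallel run time.
    Assumption: dispatch is serialized at the scheduler; compute is split
    evenly across n_procs."""
    serial = n_atoms * compute_s
    parallel = n_atoms * dispatch_s + (n_atoms * compute_s) / n_procs
    return serial / parallel

# Hypothetical figures: 1000 atoms of 10 microseconds each on 8 processors.
atoms, compute, procs = 1000, 10e-6, 8

cluster = estimated_speedup(atoms, compute, dispatch_s=100e-6, n_procs=procs)
shared = estimated_speedup(atoms, compute, dispatch_s=1e-6, n_procs=procs)

print(f"cluster-like dispatch cost: {cluster:.2f}x")
print(f"shared-memory-like dispatch cost: {shared:.2f}x")
```

Under these made-up numbers the run with the expensive dispatch comes out slower than the serial run, while the cheap-dispatch case shows a real gain; the crossover depends entirely on the ratio of dispatch cost to computation per atom.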

I think it would be hard to make a message passing system scale very
well based on an architecture like Swarm.   You'd need low-latency
interconnect, maybe Myrinet.

In any case, Swarm doesn't implement either.  A multithreaded Swarm
would be feasible, but would assume a shared memory system like I
mentioned.






