swarm-modeling

Re: ||ism


From: Lindsay Hood
Subject: Re: ||ism
Date: Fri, 9 May 1997 08:51:00 +1000

For what it's worth, I spent 3 years working at Thinking Machines.  Some of
these issues caused me, customers and the company much angst.  My $0.02
(Australian) worth:

> 1.  I think we need a || methodology that will support *both*
> distributed computing, where synchronization may be seldom required,
> and || computing, where synchronization may occur very often.  This
> seems to imply to me that message passing will become a serious
> hindrance in the latter.  And that makes me think that we might not
> want to dive glibly into using a system based on MPI.

MPI does indeed support both distributed and || computing.  Although I think
MPI is too big, and they shied away from Active Messages, I suspect it is the
only portable standard out there.  The alternative for Swarm is to use a CORBA
framework for remote process invocation.  This will work well for distributed
apps, but not for tightly coupled ones.

>
> 2.  It's possible that we could use some hybrid of MPI and a virtual
> shared memory system.  That might make things a bit easier, since
> we could rely on the shared memory for all the kernel data trading
> (which would include the synchronization of Swarms and schedules),
> and still use MPI for other object-to-object communication across
> hardware.

How MPI is implemented on a machine should be of no concern to the app on top.
I believe MPI-2 has shared memory put and get (Active Messages if you are a CM5
or Berkeley head, or shmemput/shmemget if you are a Cray T3{D|E}er).  Public
domain MPI implementations for SMPs perform quite well.

>
> 3.  From the modelling perspective, do any of you *know* for certain
> how much synchronization your apps will require?  I.e. do we have
> an estimate of how badly the highly synchronous apps will hit us
> and how often we're likely to see apps like that?

It will vary tremendously.  Highly synchronous apps will hurt unless you have
very good communication hardware.  This is where the T3E is way ahead: 450 MB/s
between processors with really low latency.  And for highly synchronous apps it
is the latency that tends to dominate.  As in all parallel computing, the less
communication the better.
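
A rough sketch of why latency tends to dominate for small, frequent messages,
assuming the usual "latency + size / bandwidth" cost model.  Only the 450 MB/s
figure comes from the text above; the 1 microsecond latency and the whole
"slow link" are assumed, illustrative values, not measurements of any machine.

```python
# Cost model (an assumption): time for one message = latency + size / bandwidth.

def transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Seconds to deliver one message under the simple cost model."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


def latency_share(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Fraction of the delivery time that is pure latency."""
    return latency_s / transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s)


# 450 MB/s is quoted in the text; the latency value is assumed.
FAST_LINK = {"latency_s": 1e-6, "bandwidth_bytes_per_s": 450e6}
# Both values assumed, standing in for a slow network between workstations.
SLOW_LINK = {"latency_s": 1e-3, "bandwidth_bytes_per_s": 1e6}

if __name__ == "__main__":
    for size in (8, 1024, 1024 * 1024):
        for name, link in (("fast", FAST_LINK), ("slow", SLOW_LINK)):
            t = transfer_time(size, **link)
            share = latency_share(size, **link)
            print(f"{name} link, {size:>8} bytes: {t:.3e} s, {share:.0%} latency")
```

Under these assumptions an 8-byte synchronization message is almost entirely
latency on either link, while a megabyte transfer is almost entirely bandwidth.
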

>
> 4.  Does anybody out there have any experience with pure
> message passing versus distributed shared memory methods?  I.e.
> am I fooling myself into thinking that the shared memory method
> will be any more efficient than a pure message passing interface?
>

It depends on how the distributed shared memory looks to the programmer.
The Connection Machines were data parallel, and shared memory looked like a
uniform data structure that just happened to reside on distributed memories.
In the early days, you could not even tell which processor a particular
element was assigned to.  It required a different way of thinking about
programming, but once you got your head around it, it was very elegant, and for
many applications ended up being much "closer" to the maths than other
languages.

Using active messages, shmem{put|get}, etc., the distributed nature of memory is
mostly swept under the carpet.

With true distributed shared memory, a la SGI Origin or CRAFT on the T3E, one
can do all sorts of address arithmetic and find the machine is communicating
itself to death.

In my opinion, shared memory is much easier to program than message passing for
real applications.  Once you start using asynchronous or one-way messages then
the possibilities of deadlock are numerous.  And debuggers can be of little
help due to non-deterministic race conditions.  Shared memory programming can
get tough with critical regions and locks/semaphores, etc., but at least that is
a well-understood problem, and there are many OS books which cover it.  DSM
performance relies on very low latency communication, something that will not
be the case for typical distributed applications.
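
The critical-region pattern mentioned above can be sketched with threads and a
lock.  This is only an illustration in Python's standard `threading` module;
`run_counter` and its parameters are made-up names for the example, and nothing
here is Swarm code.

```python
import threading


def run_counter(num_threads=4, increments=10_000):
    """Several threads bump one shared counter inside a critical region."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            # Critical region: only one thread at a time may update the counter.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(run_counter())
```

With the lock in place the result is always num_threads * increments.  Without
it the read-modify-write on the counter can interleave between threads and
updates can be lost, which is the textbook problem the OS books cover.
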

My personal belief is that the hardware bandwidth and latency that machines
like the Cray T3E and SGI Origin are showing will make DSM the programming
paradigm of choice for most programmers.

Message passing sounds easy until you actually try it for complicated
applications.  Heroic programming can lead to spectacular results, but it's
tough.

But I can think of numerous highly qualified computational scientists who will
disagree with me on any issue here.

> *Any* opinions on where Swarm should go with this are wanted.

Is putting POSIX threads into the Swarm kernel feasible?
I think Swarm should focus on moderate ||ism on SMPs, rather than highly
parallel MPPs.

Lindsay



-- 
Lindsay Hood
Senior Research Scientist, High Performance Computing
National Resource Information Centre, Bureau of Resource Sciences

address@hidden

Subvert the dominant paradigm



                  ==================================
   Swarm-Modelling is for discussion of Simulation and Modelling techniques
   esp. using Swarm.  For list administration needs (esp. [un]subscribing),
   please send a message to <address@hidden> with "help" in the
   body of the message.
                  ==================================

