Re: [Gomp-discuss] Somethings to think about ....
From: Lars Segerlund
Subject: Re: [Gomp-discuss] Somethings to think about ....
Date: Mon, 10 Mar 2003 15:53:24 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i586; en-US; rv:1.2.1) Gecko/20021226 Debian/1.2.1-9
Steven Bosscher wrote:
> Hi Lars, all,
>
> On Mon 2003-03-10, at 13:44, Lars Segerlund wrote:
>> I have also been looking at the Linux support for MP and NUMA (which
>> was added lately), and Linux does support affinity and NUMA in the
>> latest kernels. However, to take maximum advantage of this it would be
>> quite reasonable to do a native port to Linux (using clone instead of
>> any thread library), and only implement the synchronization elements
>> needed by OpenMP.
> I think our goal should be to make GCC concurrency-aware and use it for
> OpenMP with threads as a first application. Everything else
> (autoparallelization, NUMA-awareness, grid computing, the construction
> of HAL, what else?) is beyond the scope of this project.
>
> If you want to make stuff Linux-specific, you'd have to stuff it down
> the throats of the GCC community with force and violence to make them
> accept your contribution and like it.
I didn't want to make stuff Linux-specific; I wanted to keep open the
possibility of adding machine-specific implementations later for
efficiency. I do realize that it might have sounded 'bad'.
> Besides, load balancing, scheduler affinity and other low-level SMP/NUMA
> stuff is the kind of thing a kernel is responsible for. OpenMP is not
> designed for such purposes.
>
> (You can distribute tasks over clusters of CPUs with HPF2 (DISTRIBUTE,
> BLOCK, etc.); that gives you some control over how your job will run on a
> NUMA machine. But it's not very portable, and as a developer you need to
> know all the ins and outs of the machine you're targeting. I suppose
> this explains why I've seen only a few HPF applications that use this
> feature...)
Still, the first thing to do is to get OpenMP running with a threading
library, and perhaps (if SMP-safe) a semaphore library.
As for the tasks ahead, I think it's not too hard to use the framework
in the paper to target the GENERIC trees (which are the most reasonable
form to target, IMHO). The algorithms for a rather good implementation
all seem to be there, and the nice part is that if we extend the pragma
handling and add a -fgomp flag to gcc, we should be able to leave most
of the regular stuff in place.
> I would prefer -fopenmp :-)
Why do you come up with all the good names :-) ... I do agree.
>> I do however have a question: I know gcc does support barriers, but to
>> what extent and in what context?
> What do you mean by "GCC supports barriers"?
>
> The only barriers I know of in GCC are BARRIER insns in RTL. In that
> context, a BARRIER is basically just a marker for the end of a code
> block (e.g. after an unconditional jump_insn). In other words, it
> states: "control flow ends before this". It is used for code alignment
> (i.e. the insn following the BARRIER can be aligned), among other things.
> This has nothing to do with the barriers they talk about in the OpenMP
> specs; those synchronize threads.
This I was aware of; however, I was not aware of what they were used for
in gcc, thus the question about them. Also, I was not aware that they
were at the RTL level (which makes them uninteresting for us).
>> As far as I understand it, it supports barriers which prevent sections
>> of code from being handled together (thus enforcing separate
>> optimization). I'm still looking, but does anybody know if this is
>> correct?
> Well, barriers really are just markers for places where there is one and
> only one out-edge in the control flow graph. That does not necessarily
> imply that all the optimizers stop there.
>
> For example, after expanding trees to RTL, you'll see that the dump file
> is littered with BARRIERs all over, but after some basic flow graph
> optimizations (jump!), most of them are gone. And if that single edge
> before the barrier is a back-edge, the loop optimizers use them to
> identify loops. And crossjumping is all barriers? In these cases,
> barriers _allow_ the compiler to identify optimization opportunities!
Thanks, I have looked at this a bit now, and I don't claim to
understand it fully, but I see what they do now. I was just desperately
looking for a mechanism already in place to restrict optimizer scope,
but I figure we have to make the optimizers parallel-aware instead.
>> I thought that we might as well start documenting what we want to do
>> with gcc, the trees, and what we have to modify.
> Do you have a plan we can discuss on some mailing list?
>
> Greetz
> Steven
As for a plan, do you mean something concrete?
I would then suggest that we investigate what is needed to enhance
GENERIC enough to support the form in Diego's paper, since there is a
set of algorithms to support it.
I was thinking more that we could have a discussion about what the
plan should be :-) ... since we don't have a plan yet.
Basically, I think along these lines:
1. The lib is trivial to do, and a stub might well be enough to
enable other areas of work to progress.
2. The tree modifications are not that hard, but have to be carefully
planned in order to be efficient and extendable. Still, they have to be done.
3. The algorithms used for the concurrency can be tested on the
trees when these are done, without a proper front end and back end; this
might even be a very nice thing to do in order to get some proper
testing done. If this phase is basically 'bug'-free, I think a lot of
later work is spared.
4. At this point it should be about time to figure out how to interface
with gcc in the most 'non-intrusive' manner. As I understand it,
there could be two routes to this: the first is to make gcc ignore
(remove) the parallel parts of the tree if -fopenmp is not given, and
the second is to enable the extra 'concurrency-aware' code only when
-fopenmp is given (or replace parts of gcc with concurrency-aware code).
(I don't know if what I mean gets through, but it's basically a tightly
knit implementation vs. a loosely knit implementation.)
5. When this is done, it would be reasonable to start doing the code
generation. (I haven't given this any thought yet.)
6. Front-end work making gcc take advantage of the parallel trees
should be the last thing to get the compiler working, and at this point
we should have a working implementation.
So I should think we would need a specification of what to do with the
trees and what we need to represent; from there, we only have to code a
lot. ( :-D )
/ Lars Segerlund.