OpenMP+CLANG RTL && Polly && Future Dev

All, first, kudos to everyone who has worked so hard to get this project off the ground. Native OpenMP support in CLANG with a properly licensed RTL is wonderful. I have a few quick questions regarding the ongoing development and testing of these code bases.

- Has there been any testing associated with Polly's automatic parallelization backend and CLANG's native OpenMP layer + Intel RTL? [really Intel's RTL]
- I attended the OpenMP BOF at SC13 and noted the work on supporting an ARM port of the RTL. Will there be a formal procedure in the future to support easier porting+integration+testing of new architecture targets [or updated ports for existing targets]?
- Has there been any thought given to decoupling the RTL backend threading/tasking layer from the language + OpenMP compat layers? E.g., it might be advantageous in the future to support threading/tasking layers such as Qthreads. This especially makes sense on targets with higher degrees of hardware parallelism such as Intel KNC, Convey MX, et al.
- Do we have a running list of development 'Todo' items yet? [CLANG layer and RTL layer] I saw some notional description of various things that could be done at the SC13 BOF. I wasn't sure if there were owners for the individual items yet or people interested in implementing the changes.

Again, thanks for all the hard work so far. Very pleased to see this come to fruition. I very much look forward to participating in the community development.

cheers
john

John D. Leidel
Software Compiler Development Manager
Micron Technology, Inc.
jleidel@micron.com
office: 972-521-5271
cell: 214-578-8510

- Has there been any testing associated with Polly's automatic parallelization backend and
  CLANG's native OpenMP layer + Intel RTL? [really Intel's RTL]

I'm not completely sure what the question is here. Are you asking
1) Can Polly auto-parallelization be used in conjunction with explicit OpenMP code?
or
2) Is there automatic testing going on that includes Polly with OpenMP?

Though, maybe it doesn't matter, since AFAIK the answer to either question is "No work has been done on this".

- I attended the OpenMP BOF at SC13 and noted the work on supporting an ARM port of the RTL.
  Will there be a formal procedure in the future to support easier porting+integration+testing
  of new architecture targets [or updated ports for existing targets]?

The procedure at the moment is the same as for any change.
Give us patches via openmprtl.org and we'll integrate them and push them out.

The whole issue of testing is a big hole at the moment :-(
We'd be very happy to integrate an OpenMP test suite if someone has one to contribute.
(We can't release the one Intel uses internally because it contains regression tests that have
extracts from customer codes which we cannot make public).

- Has there been any thought given to decoupling the RTL backend threading/tasking
layer from the language + OpenMP compat layers? E.g., it might be advantageous in the
future to support threading/tasking layers such as Qthreads. This especially makes
sense on targets with higher degrees of hardware parallelism such as Intel KNC, Convey MX, et al.

From the description of QThreads it doesn't seem immediately appropriate as a substrate for OpenMP.

OpenMP really wants to control the whole machine and have an OpenMP thread per logical CPU, whereas
QThreads says "The qthreads library on an SMP (i.e. the POSIX implementation) is essentially a library
for spawning and controlling coroutines: threads with small (4k) stacks."
OpenMP codes expect to have large stacks (good OpenMP parallelization is normally as high up the call tree
as can be achieved), and threads are expected to be persistent (see the rules about thread-local
storage in the standard). (This aside from QThreads having no implementation on Windows :-)).
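
To make the persistence point concrete, here is a minimal sketch of the kind of code that relies on it. The names `regions_seen` and `counters_persist` are made up for illustration. It assumes the conditions the standard attaches to threadprivate persistence hold: the regions are not nested, the thread count does not change between them, and dynamic thread adjustment is off. A substrate that hands each region to a fresh, short-lived coroutine would have to do extra work to keep this guarantee. (Stack size is a separate knob; OpenMP exposes it through the OMP_STACKSIZE environment variable.)

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* One copy per OpenMP thread. Built without OpenMP, the pragma is ignored
 * and this is an ordinary static variable. */
static int regions_seen = 0;
#pragma omp threadprivate(regions_seen)

/* Returns 1 if every thread's private counter kept its value across
 * `regions` consecutive parallel regions, 0 otherwise. */
int counters_persist(int regions)
{
#ifdef _OPENMP
    /* Persistence is only guaranteed when dynamic adjustment is disabled. */
    omp_set_dynamic(0);
#endif

    /* Reset each thread's copy so repeated calls start from zero. */
    #pragma omp parallel
    {
        regions_seen = 0;
    }

    /* Each thread bumps its own copy once per region. */
    for (int r = 0; r < regions; r++) {
        #pragma omp parallel
        {
            regions_seen++;
        }
    }

    /* Every thread checks that its copy saw all of the regions. */
    int all_ok = 1;
    #pragma omp parallel reduction(&& : all_ok)
    {
        all_ok = all_ok && (regions_seen == regions);
    }
    return all_ok;
}
```

Calling `counters_persist(4)` from the initial thread should return 1 both with and without OpenMP enabled; with OpenMP disabled the code simply runs serially.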

- Do we have a running list of development 'Todo' items yet? [CLANG layer and RTL layer]
I saw some notional description of various things that could be done at the SC13 BOF.
I wasn't sure if there were owners for the individual items yet or people interested in implementing the changes.

There's no explicit list. A number of groups are doing different things (such as the Rice work on OMPT), but no grand
plan exists.

-- Jim

James Cownie <james.h.cownie@intel.com>
SSG/DPD/TCAR (Technical Computing, Analyzers and Runtimes)
Tel: +44 117 9071438

Jim, thanks for the response. I've made inline comments below.

John D. Leidel
Software Compiler Development Manager
Micron Technology, Inc.
jleidel@micron.com
office: 972-521-5271
cell: 214-578-8510

- Has there been any testing associated with Polly's automatic parallelization backend and
CLANG's native OpenMP layer + Intel RTL? [really Intel's RTL]

I'm not completely sure what the question is here. Are you asking
1) Can Polly auto-parallelization be used in conjunction with explicit OpenMP code?
or
2) Is there automatic testing going on that includes Polly with OpenMP?

Though, maybe it doesn't matter, since AFAIK the answer to either question is "No work has been done on this".

The question was in relation to the former. This might be an interesting area where I could run some tests and regressions over the winter break.
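
As a starting point for such a test, something like the following sketch might do: one translation unit containing an explicitly parallel OpenMP loop next to a plain loop nest that carries no pragma and is regular enough for a polyhedral optimizer to analyze. All names here (`scale_explicit`, `matvec_plain`, `run_mixed`) are hypothetical, and compile flags are left out on purpose since they depend on the clang and Polly versions in use. Whether the plain nest actually gets auto-parallelized, and whether both kinds of parallel code then share one runtime cleanly, is exactly what the test would have to find out.

```c
#define N 256

/* Explicit OpenMP: the programmer asks the runtime to split this loop. */
void scale_explicit(double *restrict out, const double *restrict in,
                    double factor)
{
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        out[i] = factor * in[i];
}

/* No pragma: a loop nest with constant bounds and affine array accesses,
 * i.e. a candidate for automatic parallelization by the optimizer. */
void matvec_plain(double y[N], double A[N][N], const double x[N])
{
    for (int i = 0; i < N; i++) {
        double sum = 0.0;
        for (int j = 0; j < N; j++)
            sum += A[i][j] * x[j];
        y[i] = sum;
    }
}

/* Runs both kernels back to back and returns a checksum that must not
 * depend on how (or whether) either loop was parallelized. */
double run_mixed(void)
{
    static double A[N][N], x[N], scaled[N], y[N];

    for (int i = 0; i < N; i++) {
        x[i] = 1.0;
        for (int j = 0; j < N; j++)
            A[i][j] = (i == j) ? 2.0 : 0.0;   /* 2 * identity */
    }

    scale_explicit(scaled, x, 3.0);   /* scaled[i] = 3 */
    matvec_plain(y, A, scaled);       /* y[i] = 2 * 3 = 6 */

    double checksum = 0.0;
    for (int i = 0; i < N; i++)
        checksum += y[i];
    return checksum;                  /* 6 * N */
}
```

The checksum is 6 * N = 1536 for a serial build, so any build that mixes the two parallelization paths can be compared against that value.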

- I attended the OpenMP BOF at SC13 and noted the work on supporting an ARM port of the RTL.
Will there be a formal procedure in the future to support easier porting+integration+testing
of new architecture targets [or updated ports for existing targets]?

The procedure at the moment is the same as for any change.
Give us patches via openmprtl.org and we'll integrate them and push them out.

The whole issue of testing is a big hole at the moment :-(
We'd be very happy to integrate an OpenMP test suite if someone has one to contribute.
(We can't release the one Intel uses internally because it contains regression tests that have
extracts from customer codes which we cannot make public).

Not sure how much help I can be here outside of gathering what is publicly available. All the vendors have similar issues in this area…

- Has there been any thought given to decoupling the RTL backend threading/tasking
layer from the language + OpenMP compat layers? E.g., it might be advantageous in the
future to support threading/tasking layers such as Qthreads. This especially makes
sense on targets with higher degrees of hardware parallelism such as Intel KNC, Convey MX, et al.

From the description of QThreads it doesn't seem immediately appropriate as a substrate for OpenMP.
OpenMP really wants to control the whole machine and have an OpenMP thread per logical CPU, whereas
QThreads says "The qthreads library on an SMP (i.e. the POSIX implementation) is essentially a library
for spawning and controlling coroutines: threads with small (4k) stacks."
OpenMP codes expect to have large stacks (good OpenMP parallelization is normally as high up the call tree
as can be achieved), and threads are expected to be persistent (see the rules about thread-local
storage in the standard). (This aside from QThreads having no implementation on Windows :-)).

The issue with Windows is interesting. I often forget about non-*NIX/BSD platform targets. As an aside though, Stephen Olivier [et al.] did an implementation of Qthreads for the ROSE compiler. There are some advantages for larger degrees of parallelism. See:
http://www.cs.unc.edu/~prins/RecentPubs/ross11.pdf
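
To illustrate what I mean by decoupling, here is a rough sketch. It is purely hypothetical and is not how the Intel RTL or Qthreads is structured: the upper layer talks to a small table of function pointers (`tasking_backend`), and a backend fills in `spawn` and `wait_all`. Only a trivial serial backend is shown; a pthreads or Qthreads backend would be a second instance of the same table, and all names are invented for the example.

```c
#include <stddef.h>

#define MAX_CHUNKS 8

typedef void (*task_fn)(void *arg);

/* Hypothetical interface: the only operations the upper layer relies on. */
typedef struct tasking_backend {
    const char *name;
    void (*spawn)(struct tasking_backend *self, task_fn fn, void *arg);
    void (*wait_all)(struct tasking_backend *self);
} tasking_backend;

/* Trivial backend: runs each task immediately on the calling thread. */
static void serial_spawn(tasking_backend *self, task_fn fn, void *arg)
{
    (void)self;
    fn(arg);
}

/* Nothing is outstanding in the serial backend, so there is nothing to wait for. */
static void serial_wait_all(tasking_backend *self)
{
    (void)self;
}

tasking_backend make_serial_backend(void)
{
    tasking_backend be = { "serial", serial_spawn, serial_wait_all };
    return be;
}

/* Work descriptor handed to the backend by the "language layer". */
typedef struct {
    const int *data;
    size_t begin, end;
    long partial;
} chunk;

static void sum_chunk(void *arg)
{
    chunk *c = (chunk *)arg;
    c->partial = 0;
    for (size_t i = c->begin; i < c->end; i++)
        c->partial += c->data[i];
}

/* Stand-in for the language layer: splits [0, n) into at most MAX_CHUNKS
 * pieces, spawns one task per piece, waits, then combines the partial sums.
 * It never needs to know which backend is doing the work. */
long chunked_sum(tasking_backend *be, const int *data, size_t n,
                 size_t nchunks)
{
    chunk chunks[MAX_CHUNKS];
    if (nchunks < 1) nchunks = 1;
    if (nchunks > MAX_CHUNKS) nchunks = MAX_CHUNKS;

    for (size_t k = 0; k < nchunks; k++) {
        chunks[k].data = data;
        chunks[k].begin = n * k / nchunks;
        chunks[k].end = n * (k + 1) / nchunks;
        chunks[k].partial = 0;
        be->spawn(be, sum_chunk, &chunks[k]);
    }
    be->wait_all(be);

    long total = 0;
    for (size_t k = 0; k < nchunks; k++)
        total += chunks[k].partial;
    return total;
}
```

With `tasking_backend be = make_serial_backend();`, a call such as `chunked_sum(&be, data, n, 4)` sums the array through the interface. The real question is how much of the OpenMP semantics (thread persistence, large stacks) such a table would have to expose to be usable.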