LLVM OpenMP update

I have just committed at revision 219214.

The details are below.

This code has been checked on X86 and IBM Power.

– Jim

James Cownie james.h.cownie@intel.com
SSG/DPD/TCAR (Technical Computing, Analyzers and Runtimes)

Tel: +44 117 9071438

I apologise in advance for the size of this check-in. At Intel we do understand that this is not friendly, and we are working to change our internal code-development process so that we can make development features available more frequently and in finer (more functional) chunks. Unfortunately we haven't got that in place yet, and unpicking this into multiple separate check-ins would be non-trivial, so please bear with me on this one. We should be better in the future.

Apologies over, what do we have here?

GCC 4.9 compatibility

  • We have implemented the new entry points that code compiled by GCC 4.9 uses for the same functionality GCC 4.8 already provided. Therefore, code compiled with GCC 4.9 that used to work will continue to do so. However, some other new entry points (associated with task cancellation) are not implemented, so user code compiled by GCC 4.9 that uses these new features will not link against the LLVM runtime. (It remains unclear how to handle those entry points, since the GCC interface has potentially unpleasant performance implications for join barriers even when cancellation is not used.)
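Here is a rough illustration of what implementing a new entry point on top of existing functionality means. GCC 4.8 brackets a parallel region with a start call and an end call, and the master thread runs the outlined body in between. GCC 4.9 instead emits a single combined call (in libgomp, GOMP_parallel replaces the GOMP_parallel_start / GOMP_parallel_end pair). The sketch below is hypothetical and serial. Every `hypothetical_*` name is invented and no threads are created. It only shows how the combined form can be layered on the older pair.

```cpp
#include <cassert>

// Outlined parallel-region body as a compiler emits it: called as fn(data).
using OutlinedFn = void (*)(void *);

// Hypothetical runtime state: true while a region is open.
static bool g_region_active = false;

// Hypothetical GCC 4.8-style pair. A real runtime would fork workers (which
// also run fn(data)) in start and join them at a barrier in end; this serial
// stub only tracks whether a region is open.
void hypothetical_parallel_start(OutlinedFn fn, void *data, unsigned num_threads) {
  (void)fn;
  (void)data;
  (void)num_threads;
  g_region_active = true;
}

void hypothetical_parallel_end() { g_region_active = false; }

// Hypothetical GCC 4.9-style combined entry point layered on the old pair:
// open the region, run the body on the calling (master) thread, then close it.
void hypothetical_parallel(OutlinedFn fn, void *data, unsigned num_threads,
                           unsigned flags) {
  (void)flags;  // extra argument in the combined form; ignored in this sketch
  hypothetical_parallel_start(fn, data, num_threads);
  fn(data);
  hypothetical_parallel_end();
}

// Example body: increments the int passed through `data`.
void increment_body(void *data) {
  assert(g_region_active);  // must run inside the open region
  ++*static_cast<int *>(data);
}
```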

— new parallel entry points —

New entry points that are not related to OpenMP 4.0.

These are implemented fully:







— cancellation entry points —

Currently, these only give a runtime error if OMP_CANCELLATION is true, because our plain barriers don't check for cancellation while waiting.
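The sketch below shows what "checking for cancellation while waiting" involves. It is a hypothetical spin-wait, not the runtime's barrier code. It returns when either the release flag or a cancellation flag is set. A plain barrier that polls only the release flag can never notice a cancelled region. The flag names and the `WaitResult` enum are invented for illustration.

```cpp
#include <atomic>
#include <cassert>

enum class WaitResult { Released, Cancelled };

// Spin until `release_flag` becomes true, but also poll `cancel_flag` on each
// iteration so that a cancelled region does not leave threads stuck here.
WaitResult cancellable_wait(const std::atomic<bool> &release_flag,
                            const std::atomic<bool> &cancel_flag) {
  for (;;) {
    if (release_flag.load(std::memory_order_acquire))
      return WaitResult::Released;
    if (cancel_flag.load(std::memory_order_acquire))
      return WaitResult::Cancelled;
  }
}
```

The second load on every iteration is extra work even when nobody ever cancels. That hints at why cancellation support can cost something in barriers that never see a cancellation.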






— taskgroup entry points —

These are implemented fully.



— target entry points —

These are empty (as they are in libgomp).






Improvements in Barriers and Fork/Join

  • Barrier and fork/join code is now in its own file, which makes it easier to understand and modify.

  • Wait/release code is now templated and in its own file; suspend/resume code is also templated.

  • There is a new hierarchical barrier, which exploits the cache hierarchy of the Intel(r) Xeon Phi™ coprocessor to improve fork/join and barrier performance.
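Here is one way to picture a barrier that exploits the cache hierarchy. This is a hypothetical two-level gather tree, not the runtime's actual tree. Threads that share a core report arrival to the first thread on that core, and the core leaders report to thread 0, so most signalling stays within a core's cache. The threads-per-core parameter and the layout are assumptions.

```cpp
#include <cassert>

// Parent of thread `tid` in a hypothetical two-level barrier tree.
// Level 1: every thread reports to the first thread on its core.
// Level 2: each core leader reports to thread 0 (the master).
// Returns -1 for the master, which has no parent.
int barrier_parent(int tid, int threads_per_core) {
  assert(tid >= 0 && threads_per_core > 0);
  if (tid == 0) return -1;
  int core_leader = (tid / threads_per_core) * threads_per_core;
  if (tid != core_leader) return core_leader;  // intra-core hop (shared cache)
  return 0;                                     // inter-core hop to the master
}
```

In a tree barrier, each thread waits for its children during the gather phase and then signals its parent. The release phase walks the same tree from the root back down.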

BEWARE: the new source files have not been added to the legacy CMake build system. If you want to use that build system, fixes will be required.

Statistics Collection Code

  • New code has been added to collect application statistics (if this is enabled at library compile time; by default it is not). The statistics code itself is generally useful; the lightweight timing code uses the X86 rdtsc instruction, so it will require changes for other architectures.

The intent of this code is not for users to tune their codes, but rather:


  1. For timing code-paths inside the runtime

  2. For gathering general properties of OpenMP codes, so that attention can be focused on the OpenMP features that are most used.
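Here is a minimal sketch of lightweight timing statistics of this kind. It assumes GCC or Clang, which provide `<x86intrin.h>` and `__rdtsc`, on X86, and falls back to `std::chrono::steady_clock` on other architectures. `TimerStats`, `ScopedTimer` and `read_ticks` are hypothetical names, not the runtime's statistics classes.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <limits>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
// Cheap, non-serializing cycle counter: fine for statistics, not for exact timing.
inline std::uint64_t read_ticks() { return __rdtsc(); }
#else
// Other architectures need a different tick source; use steady_clock instead.
inline std::uint64_t read_ticks() {
  return static_cast<std::uint64_t>(
      std::chrono::steady_clock::now().time_since_epoch().count());
}
#endif

// Running statistics for one timed code path inside the runtime.
struct TimerStats {
  std::uint64_t count = 0;
  std::uint64_t total = 0;
  std::uint64_t min_ticks = std::numeric_limits<std::uint64_t>::max();
  std::uint64_t max_ticks = 0;

  void add(std::uint64_t ticks) {
    ++count;
    total += ticks;
    min_ticks = std::min(min_ticks, ticks);
    max_ticks = std::max(max_ticks, ticks);
  }
};

// RAII helper: times the enclosing scope and records it on destruction.
class ScopedTimer {
 public:
  explicit ScopedTimer(TimerStats &stats) : stats_(stats), start_(read_ticks()) {}
  ~ScopedTimer() { stats_.add(read_ticks() - start_); }

 private:
  TimerStats &stats_;
  std::uint64_t start_;
};
```

The units differ by platform (TSC ticks versus clock ticks), so results are only comparable within one build.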

Nested Hot Teams

  • The runtime now maintains more state to reduce the overhead of creating and destroying inner parallel teams. This improves the performance of code that repeatedly uses nested parallelism with the same resource allocation. To enable this, set the new KMP_HOT_TEAMS_MAX_LEVEL environment variable to the nesting depth you want covered (and, of course, set OMP_NESTED=true to enable nested parallelism at all).
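The pattern that hot teams target looks roughly like the sketch below: an outer parallel loop whose body opens an inner parallel region again and again. The function is a hypothetical example. If it is compiled without OpenMP support, the pragmas are ignored and it runs serially with the same result.

```cpp
#include <cassert>

// Each outer iteration opens an inner parallel region. Without hot teams the
// runtime may build and tear down an inner team every time; with hot teams
// enabled for the inner level it can keep that team between regions.
long nested_sum(int outer, int inner) {
  assert(outer >= 0 && inner >= 0);
  long total = 0;
#pragma omp parallel for num_threads(2) reduction(+ : total)
  for (int i = 0; i < outer; ++i) {
    long row = 0;
#pragma omp parallel for num_threads(2) reduction(+ : row)
    for (int j = 0; j < inner; ++j) row += j;
    total += row;
  }
  return total;
}
```

You might build this with `-fopenmp` and run it with `OMP_NESTED=true KMP_HOT_TEAMS_MAX_LEVEL=2`. The value 2 is our assumption for "keep teams hot one level below the outermost team".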

Improved Intel(r) VTune™ Amplifier support

  • The runtime provides additional information to VTune via the itt_notify interface, allowing it to display better OpenMP-specific analyses of load imbalance.

Support for OpenMP Composite Statements

  • We have implemented new entry points required by some of the OpenMP 4.1 composite statements.

Improved ifdefs

  • More separation of concepts (“Does this platform do X?”) from platforms (“Are we compiling for platform Y?”), which should simplify future porting.
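Here is a small sketch of that separation. Only one block of preprocessor logic asks which platform is being compiled for; it maps that answer onto a concept macro, and the rest of the code tests only the concept. The `HYP_*` macro names and `spin_pause` are hypothetical. `_mm_pause` is the standard X86 intrinsic from `<immintrin.h>`.

```cpp
#include <cassert>

// Platform layer: the only place that asks "are we compiling for platform Y?"
#if defined(__x86_64__) || defined(__i386__)
#define HYP_ARCH_X86 1
#else
#define HYP_ARCH_X86 0
#endif

// Concept layer: "does this platform do X?" Porting to a new platform means
// editing these definitions rather than every #if in the code base.
#define HYP_HAVE_SPIN_PAUSE HYP_ARCH_X86

#if HYP_HAVE_SPIN_PAUSE
#include <immintrin.h>
#endif

// Code elsewhere tests the concept, never the platform.
inline void spin_pause() {
#if HYP_HAVE_SPIN_PAUSE
  _mm_pause();  // hint to the CPU that this is a spin-wait loop
#endif
}

inline bool have_spin_pause() { return HYP_HAVE_SPIN_PAUSE != 0; }
```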

ScaleMP* contribution

Stack padding to improve performance in their environment, where cross-node coherency is managed at the page level.
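The note above doesn't say how the padding is computed. The arithmetic at the heart of any page-level padding is rounding a size up to a whole number of pages, so that data owned by different threads never shares a page. Here is a minimal helper, assuming a power-of-two page size. `round_up_to_page` is a hypothetical name, not the ScaleMP code.

```cpp
#include <cassert>
#include <cstddef>

// Round `bytes` up to a multiple of `page_size` (which must be a power of two),
// so that a padded region always ends exactly on a page boundary.
std::size_t round_up_to_page(std::size_t bytes, std::size_t page_size) {
  assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
  return (bytes + page_size - 1) & ~(page_size - 1);
}
```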

Redesign of wait and release code

The code has been simplified and its performance improved.
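Here is a sketch of what a templated wait/release can look like, under stated assumptions. Each flag type exposes `done()` and `release()`, and a single template wait loop is instantiated per flag type, so there is no run-time switch on the flag kind. `BoolFlag`, `CountFlag`, `wait_on` and `release` are hypothetical names, not the runtime's classes.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// One-shot flag: done once it has been released.
struct BoolFlag {
  std::atomic<bool> value{false};
  bool done() const { return value.load(std::memory_order_acquire); }
  void release() { value.store(true, std::memory_order_release); }
};

// Counting flag: done once `needed` releases have arrived (e.g. a master
// waiting for a fixed number of workers to check in).
struct CountFlag {
  explicit CountFlag(std::uint64_t n) : needed(n) {}
  std::atomic<std::uint64_t> count{0};
  const std::uint64_t needed;
  bool done() const { return count.load(std::memory_order_acquire) >= needed; }
  void release() { count.fetch_add(1, std::memory_order_acq_rel); }
};

// A single templated wait loop; the compiler generates one specialised loop
// per flag type.
template <typename Flag>
void wait_on(const Flag &flag) {
  while (!flag.done()) std::this_thread::yield();
}

template <typename Flag>
void release(Flag &flag) {
  flag.release();
}
```

In real use, `wait_on` and `release` would be called from different threads. The flag's acquire/release ordering makes writes made before `release` visible to the waiter once `wait_on` returns.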

Bug Fixes

* Fixes for Windows multiple processor groups.

* Fix Fortran module build on Linux: offload attribute added.

* Fix entry names for the distribute-parallel-loop construct to be consistent with the compiler codegen.

* Fix an inconsistent error message for the KMP_PLACE_THREADS environment variable.

Don’t do a blob commit like this again.

  1. No review (If I missed something I apologize)
  2. No other LLVM project would accept this shit
  3. No QA - from our (PathScale) internal testing during a patch review process or proof that it was tested against llvm (I could give 2 shits about gcc-4.9 - Is there an LLVM build bot or automated testing setup?)
  4. It is already hard enough getting OpenMP into clang/llvm… let's not give the trolls any reason to point fingers or complain further

I’m happy there is an effort by Intel to get things “upstream”, but if it happens again I will use whatever community voice I have to freeze the commit access of the person who pushes it.


  1. No QA - from our (PathScale) internal testing during a patch review process or proof that it was tested against llvm (I could give 2 shits about gcc-4.9 - Is there an LLVM build bot or automated testing setup?)

Of course there are buildbots!


As for tests, UoH is working hard to add their OpenUH test suite to libiomp testing, which has proved difficult without an OpenMP-enabled compiler available in clang trunk [yet]. But AFAIK, they have made some real progress here, so stay tuned.

BTW, if you have OpenMP test suites that you can open-source, you are welcome to add them to the project and thus improve QA coverage!


Thank you for your reply

That build bot looks like it's only doing a build and nothing else. I can't
drill down into any specific tests, for example. I care less about automated
and public testing infrastructure than I do about the ability to review each
and every patch/commit BEFORE it's pushed, especially stuff which does not
meet the coding or commit guidelines of any LLVM project.

I think my 1st message is clear.