OpenMP in LLVM Multi-company Telecom Meeting Minutes Feb 12th

Next Meeting : Feb 26th

Opens:

  • Discussion on automatically detecting target pragmas in the code, without the user passing the command-line option -fopenmp-targets="…", and invoking the necessary host and target compilation.

A question was asked whether there is documentation on how to use the existing -fopenmp-targets option. Alexey said there is documentation explaining the option.

  • Francesco wanted to know the fate of the VecClone pass that Intel had proposed. Intel will look into the feedback on the original proposal and see what is needed to commit the functionality.

  • Deepak asked about the issue of declaration conflicts in system headers, where the declarations differ for the target while applications use the host header outside the offload region.

This should be addressed when scoped declare variant is implemented.

Development Activity:

  • libomptarget has switched from C++11 to C++14 for building the libraries. It now matches LLVM, which allows us to use LLVM ADT.

  • Support for multiple streams has been added so that multiple kernels can be executed concurrently. Each offload execution or data transfer is assigned to a stream in round-robin fashion.

At startup, 256 streams are created by default; the number can be controlled with an environment variable.
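The round-robin assignment can be pictured with the sketch below. It is not the libomptarget implementation: the class StreamPool, the function streamCountFromEnv, and the variable name EXAMPLE_NUM_STREAMS are hypothetical, and only the default of 256 and the round-robin policy come from the notes above.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>

// Hypothetical variable name; the minutes do not state the real one.
constexpr const char *kNumStreamsEnv = "EXAMPLE_NUM_STREAMS";
constexpr std::size_t kDefaultNumStreams = 256;

// Read the pool size from the environment, falling back to the default
// when the variable is unset, not a plain number, or zero.
std::size_t streamCountFromEnv() {
  const char *value = std::getenv(kNumStreamsEnv);
  if (!value)
    return kDefaultNumStreams;
  char *end = nullptr;
  unsigned long parsed = std::strtoul(value, &end, 10);
  if (end == value || *end != '\0' || parsed == 0)
    return kDefaultNumStreams;
  return static_cast<std::size_t>(parsed);
}

// Hands out stream indices 0, 1, ..., N-1, 0, 1, ... in round-robin order.
class StreamPool {
public:
  explicit StreamPool(std::size_t numStreams)
      : numStreams_(numStreams == 0 ? 1 : numStreams) {}

  // Each kernel launch or data transfer would take the next index.
  std::size_t nextStream() {
    return next_.fetch_add(1, std::memory_order_relaxed) % numStreams_;
  }

  std::size_t size() const { return numStreams_; }

private:
  std::size_t numStreams_;
  std::atomic<std::size_t> next_{0};
};
```

An atomic counter is used here so that several host threads could request a stream at the same time without a lock; whether the real runtime does this is not stated in the minutes.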

  • Sheila continued working on un-shackled threads to enable asynchronous offload.

  • Johannes has merged the OpenMPOpt transformation pass. De-duplication of some OpenMP calls has been implemented.

Attributes have also been added to the calls, which could enable many optimizations.

Implementation of “declare variant”

  • Parser part for declare variant has been posted for review.

  • A resolution for variant representation at the call site has been agreed upon. Both the original function and the variant function will be represented in the AST.

  • The next plan is to prototype support for dynamic declare variant, to provide feedback to OpenMP, where this issue is being discussed for 5.1.
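As a reminder of what the directive looks like in source, here is a small sketch using the OpenMP 5.0 declare variant syntax. The function names sum and sum_gpu are hypothetical and both bodies are deliberately identical; the point is only that a call to sum names the base function while the variant remains a separate declaration.

```cpp
// Hypothetical variant: would be selected when compiling for a GPU device.
int sum_gpu(const int *v, int n) {
  int total = 0;
  for (int i = 0; i < n; ++i)
    total += v[i];
  return total;
}

// Base function. With OpenMP enabled, calls to sum() in a matching context
// are redirected to sum_gpu(); otherwise the base function is called.
#pragma omp declare variant(sum_gpu) match(device = {kind(gpu)})
int sum(const int *v, int n) {
  int total = 0;
  for (int i = 0; i < n; ++i)
    total += v[i];
  return total;
}
```

On the host, or when built without -fopenmp, the call resolves to the base function, so the result is the same either way.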

User-defined mapper function status

  • The functional implementation has been accepted.

  • Alexey wants the changes to be split into multiple patches. Lingda will be working on splitting and committing the changes.

DeviceRTL redesign to support sharing code

  • Trunk clang compiles everything needed for AMD.

  • Can target-specific code be written in inline IR, given that it is not supported by Clang?

Best to type up a list of the builtins needed and add them to clang.

Roll Call :

Quick update:

I looked in TR8 and unfortunately we do not specify a name for "streams". We only expose the device "context" and then you go from there.
Suggestions on how to rename the environment variable are welcome (see https://reviews.llvm.org/D74145#1871282 for context).

I think QUEUES or DEVICE_QUEUES sounds good. Queue is a "neutral" name (i.e. it doesn't come from any particular architecture) and quite descriptive at the same time. Nvidia's documentation itself describes the stream as a "sequence of operations", so it can be called a queue.

OpenCL also calls them queues: https://www.khronos.org/registry/OpenCL/sdk/1.0/docs/man/xhtml/clCreateCommandQueue.html
OpenCL is supposed to be the architecture-agnostic programming standard, so that's another plus in favor of "queue".

LIBOMPTARGET_NUM_QUEUES
LIBOMPTARGET_NUM_DEVICE_QUEUES
LIBOMPTARGET_NUM_COMMAND_QUEUES

George

“Queue” sounds better, although in many cases it can be out-of-order which is quite different from “stream”.

Regards,
Shilei