We, the engineers working on Unified Runtime at Codeplay Software and Intel, are interested in this direction.
Unified Runtime’s primary design goal is to provide a useful library for implementing high-level offload languages, initially SYCL and OpenMP. The specification and implementation are open source and developed in the open. A key consideration for us is to enable interoperation between language runtimes when used in the same process.
With respect to the immediate request in the RFC, the renaming is a good idea: an interface for cross-language use should have a generic name. The presence of OpenMP, or omp, in interface names has caused confusion before.
> I can talk more about features, design, and future plans, but for now I want to get a feeling if there is any opposition to such a move/rename. Please feel free to ask questions.
We are keen to be involved in these discussions and feel we have a lot to contribute, especially with regard to the requirements for supporting SYCL. Those discussions can happen in future RFCs.
Over the last few months we’ve undertaken a detailed analysis of the differences between Unified Runtime and the libomptarget plugin interface, from two directions:
1. Implement a libomptarget plugin on top of Unified Runtime, focusing on Intel Level Zero support.
2. Implement a “proof of concept” Unified Runtime plugin on top of the libomptarget plugin interface, focusing on CUDA support.
For 1, we expect the libomptarget plugin interface to be fully functional, as we did not find any gaps in the functionality needed to support OpenMP on top of Unified Runtime. In practice, this means you can run OpenMP on both the existing libomptarget plugins and the plugins that were initially created for the DPC++ SYCL implementation. We have not yet done a meaningful comparison of aspects such as performance.
For 2, we found a number of gaps. We will gladly share our findings in full, but will avoid going into too much detail for now. These issues are mainly due to the larger footprint of SYCL compared to OpenMP offload. The big-ticket items are:
- Limited kernel scheduling, unable to launch multi-dimensional kernels
- Unable to change local/global work sizes at runtime
- Limited support for multiple devices at runtime
- No support for images
- Inflexible compilation flow & unable to link device programs at runtime
- Lack of information when an error occurs
- No way to programmatically query support for features or object introspection
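For readers less familiar with SYCL-style launches, here is a minimal sketch of what the first two items refer to: a kernel launch described by per-dimension global and local work sizes that are chosen at runtime. The `NDRange` struct and the helper functions are hypothetical names made up for illustration; they are not part of Unified Runtime or libomptarget, and the linearization order (dimension 0 varies fastest) is just one possible convention.

```cpp
#include <array>
#include <cstddef>

// Hypothetical launch descriptor, for illustration only
// (not a Unified Runtime or libomptarget type).
struct NDRange {
    std::array<std::size_t, 3> global; // total work-items per dimension
    std::array<std::size_t, 3> local;  // work-group size per dimension
};

// A range is valid here when every size is non-zero and each global size
// is a whole multiple of the corresponding local size.
bool is_valid(const NDRange& r) {
    for (std::size_t d = 0; d < 3; ++d) {
        if (r.global[d] == 0 || r.local[d] == 0) return false;
        if (r.global[d] % r.local[d] != 0) return false;
    }
    return true;
}

// Number of work-groups along each dimension (assumes is_valid(r)).
std::array<std::size_t, 3> group_counts(const NDRange& r) {
    return {r.global[0] / r.local[0],
            r.global[1] / r.local[1],
            r.global[2] / r.local[2]};
}

// Flatten a 3-D work-item id into a 1-D index, dimension 0 varying fastest.
// An interface that only accepts 1-D launches would need a mapping like this
// on the host and the inverse inside the kernel.
std::size_t linearize(const NDRange& r, const std::array<std::size_t, 3>& id) {
    return (id[2] * r.global[1] + id[1]) * r.global[0] + id[0];
}

// Inverse of linearize: recover the 3-D id from the flat index.
std::array<std::size_t, 3> delinearize(const NDRange& r, std::size_t linear) {
    const std::size_t x = linear % r.global[0];
    const std::size_t y = (linear / r.global[0]) % r.global[1];
    const std::size_t z = linear / (r.global[0] * r.global[1]);
    return {x, y, z};
}
```

An interface that takes the dimension count and both size arrays directly at launch time avoids this kind of manual flattening, which is why we list it as a requirement for SYCL.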
Conversely, the features below are not currently supported by Unified Runtime.
> it has many extensions for things like remote machines or a “virtual” GPU running on the host.
Both the libomptarget plugin interface and Unified Runtime support some of the same driver targets. This is duplicated work that could certainly be avoided going forward.
The Unified Runtime project uses the Apache License v2.0 with LLVM Exceptions. As such, if the community is open to it, we are willing to contribute Unified Runtime in full. We are also open to alternatives where we work towards convergence between Unified Runtime and libomptarget plugin interface in LLVM directly. The parallel RFCs being discussed about supporting SYCL will require an interface for mapping to heterogeneous driver APIs.