TL;DR:
We should extract device-specific code out of the OpenMP deviceRTL so that the common logic (>90%) can be reused for all devices.
We also need to improve the documentation, and we should consider bringing the code in line with the LLVM coding style.
Requested changes:
I would like to change the design of the OpenMP device runtime library (openmp/libomptarget/deviceRTLs) towards the following goals:
1) Allow reuse of common logic between different devices in a clean and extensible way.
2) Improve the documentation of the code, e.g., with Doxygen comments and inline code comments.
3) Follow the same coding style as LLVM core.
Disclaimer:
First, I do not want to claim that it is currently impossible to reuse the code for other devices, or that the code is not documented at all. What I
think is that we can improve both substantially if we choose to do so. Also, a change in coding style is easier now than later, so if we
decide to refactor, the style change can be included without adding too much churn.
Motivation:
Now, we can discuss whether we should make any of the proposed changes, but I think most of them have clear benefits. I am also not the first
to suggest them. Point 1 was raised with the initial drop of the device runtime [0], but it was rejected for time reasons. Point 2
was recently discussed as a pressing issue in multiple reviews. Point 3 is a general observation: writing and reviewing code for the
OpenMP subproject is unnecessarily hard for LLVM developers because of the different coding style.
Proposed structure:
To ease reuse by new devices, we should have a common core with the device-independent logic, e.g., in
openmp/libomptarget/deviceRTLs/common
including an interface that declares all device-specific methods needed by the core logic. The interface is then the
only thing implemented in the device subfolders, e.g.,
openmp/libomptarget/deviceRTLs/nvptx, openmp/libomptarget/deviceRTLs/amdgpu, ...
To get there, all device-specific code has to be extracted from the core logic. The prototypes below show that this
is fairly easy to do.
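To make the proposed split concrete, here is a minimal C++ sketch under stated assumptions. The `target_impl` namespace standing in for the device interface, the helpers in `common`, and the host stub are all hypothetical; they are not taken from the patches below. For brevity, the three "files" are collapsed into one translation unit, and a host stub replaces a real nvptx or amdgpu implementation so the common logic can be exercised on the CPU.

```cpp
#include <cassert>

// ---- common/target_impl.h (hypothetical) ----
// Device-specific hooks needed by the common core. Each device
// subfolder (nvptx/, amdgpu/, ...) would provide the definitions.
namespace target_impl {
unsigned getThreadIdInBlock();
unsigned getNumberOfThreadsInBlock();
unsigned getWarpSize();
} // namespace target_impl

// ---- common/support.cpp (hypothetical) ----
// Device-independent logic: it reaches the hardware only through
// the target_impl interface declared above.
namespace common {
unsigned getWarpId() {
  return target_impl::getThreadIdInBlock() / target_impl::getWarpSize();
}

unsigned getLaneId() {
  return target_impl::getThreadIdInBlock() % target_impl::getWarpSize();
}

bool isFirstLaneInWarp() { return getLaneId() == 0; }

unsigned getNumberOfWarpsInBlock() {
  unsigned WarpSize = target_impl::getWarpSize();
  assert(WarpSize != 0 && "target must report a non-zero warp size");
  // Round up so that a partially filled last warp is counted.
  return (target_impl::getNumberOfThreadsInBlock() + WarpSize - 1) / WarpSize;
}
} // namespace common

// ---- host/target_impl.cpp (hypothetical test target) ----
// A real nvptx/ or amdgpu/ implementation would query the hardware here.
// This stub simulates a single thread whose id and block size can be set.
namespace target_impl {
unsigned SimulatedThreadId = 0;
unsigned SimulatedBlockSize = 100;

unsigned getThreadIdInBlock() { return SimulatedThreadId; }
unsigned getNumberOfThreadsInBlock() { return SimulatedBlockSize; }
unsigned getWarpSize() { return 32; }
} // namespace target_impl
```

With this stub, setting `target_impl::SimulatedThreadId = 70` makes `common::getWarpId()` return 2 and `common::getLaneId()` return 6, and the 100-thread block spans 4 warps. Porting to a new device would then mean supplying only the `target_impl` definitions while the `common` code stays untouched. The identifiers also follow the LLVM naming conventions (camelBack function names, CamelCase variable names), in line with goal 3.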
Feasibility and prototypes:
To showcase the direction I would like to move in, I "redesigned" three files (out of ~20) with the above goals in mind. The patches
can be found here:
D64217 [OpenMP][NFCI] Cleanup the target state queue implementation (https://reviews.llvm.org/D64217)
D64218 (https://reviews.llvm.org/D64218)
D64219 [OpenMP][NFCI] Cleanup the target worksharing implementation (https://reviews.llvm.org/D64219)
Note that there is a vast design space even if we agree on the above three goals. As a consequence, I'd like us to use the patches to
discuss general design decisions, not specific ones, until we agree on a path forward.
Please let me know what you think,
Johannes
[0] D14254 [OpenMP] Initial implementation of OpenMP offloading library - libomptarget device RTLs. (https://reviews.llvm.org/D14254)