We are developing a tracing tool that relies on OMPT callbacks. It works quite well for tasks and parallel regions, but not for target regions:
0: Could not register callback 'ompt_callback_device_initialize'
0: Could not register callback 'ompt_callback_device_load'
0: Could not register callback 'ompt_callback_target'
After looking at the libomp and libomptarget code, it seems that all target-related data structures are implemented on the OMPT side, but the callbacks themselves are not implemented on the libomptarget side. We are considering implementing them and then submitting a patch, but I would like to know whether someone in the OpenMP community is already working on this. We don't want to reimplement something that is already available but not yet upstream.
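For readers hitting the same messages: in OpenMP 5.0, registering a callback through `ompt_set_callback` returns an `ompt_set_result_t` code, and a tool can use that code to report why a registration was refused. The following is only a minimal sketch of that check, not our tool's actual code. It mirrors the enum values locally so that it compiles without `omp-tools.h`; the helper names (`describe_set_result`, `callback_may_fire`, `report_registration`) are hypothetical, and treating every result below "sometimes" as a failure is an assumption about the tool's policy.

```c
#include <stdio.h>

/* Local mirror of the ompt_set_result_t values defined by OpenMP 5.0.
   A real tool would include <omp-tools.h> instead of redefining them. */
enum set_result {
    SET_ERROR            = 0,
    SET_NEVER            = 1,
    SET_IMPOSSIBLE       = 2,
    SET_SOMETIMES        = 3,
    SET_SOMETIMES_PAIRED = 4,
    SET_ALWAYS           = 5
};

/* Hypothetical helper: short label for a registration result code. */
const char *describe_set_result(int result) {
    switch (result) {
    case SET_ERROR:            return "error";
    case SET_NEVER:            return "never";
    case SET_IMPOSSIBLE:       return "impossible";
    case SET_SOMETIMES:        return "sometimes";
    case SET_SOMETIMES_PAIRED: return "sometimes paired";
    case SET_ALWAYS:           return "always";
    default:                   return "unknown";
    }
}

/* Hypothetical helper: 1 if the runtime may dispatch the callback at all. */
int callback_may_fire(int result) {
    return result == SET_SOMETIMES ||
           result == SET_SOMETIMES_PAIRED ||
           result == SET_ALWAYS;
}

/* Hypothetical helper: print a diagnostic when registration is refused.
   Returns 1 on success and 0 on failure. */
int report_registration(const char *name, int result) {
    if (callback_may_fire(result))
        return 1;
    fprintf(stderr, "Could not register callback '%s' (%s)\n",
            name, describe_set_result(result));
    return 0;
}
```

In a real tool, the `result` argument would be the value returned by `ompt_set_callback` for each event, for example `ompt_callback_target`. Printing the result code alongside the callback name makes it easier to tell a callback the runtime never dispatches from a genuine registration error.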
We have implemented a draft of the complete OpenMP 5.0 device support for libomptarget here: https://github.com/jmellorcrummey/llvm-openmp-5
The default branch is openmp5-gpu, which includes the device-independent support in libomptarget as well as device-dependent support for NVIDIA GPUs.
While we merged changes from LLVM OpenMP recently (over the summer), our version is likely not completely up to date with changes in LLVM OpenMP.
The only difference between what we have implemented and the OpenMP 5.0 standard is that we changed the implementation to provide begin/end pairs for the ompt_callback_target_submit and ompt_callback_target_data_op callbacks, as described in the attached document, which we proposed as a change for OpenMP 5.1. This suggested change, along with alternative ways of supporting mixed programming models (e.g. CUDA+OpenMP5, HIP+OpenMP5, SYCL+OpenMP5), will be discussed on today's OpenMP tools telecon, which starts 1.25 hours from now.
Besides the libomptarget support, the repository above also has fixes for handling OMPT frames for call stack introspection. We need to separate the libomptarget work from the rest and submit them upstream as separate pull requests.
openmp-changes-diffs.pdf (226 KB)
Thank you very much for the quick answer!
We are going to integrate your changes into our repository so that we can profile OpenMP target regions until the support is upstreamed.
Does anyone have a working LLVM runtime with complete OMPT support for offloading? We tried both
https://github.com/OpenMPToolsInterface/llvm-project/tree/openmp5-gpu/openmp and got some errors; see below. It seems some flag is off.
0: Could not register callback 'ompt_callback_device_initialize'
0: Could not register callback 'ompt_callback_device_finalize'
0: Could not register callback 'ompt_callback_device_load'
0: Could not register callback 'ompt_callback_device_unload'
0: Could not register callback 'ompt_callback_target'
0: Could not register callback 'ompt_callback_target_map'
0: Could not register callback 'ompt_callback_target_data_op'
0: Could not register callback 'ompt_callback_target_submit'