Dear all,
In our downstream LLVM, we want to provide some kind of “partial shared virtual memory” support for OpenMP offloading to our hardware accelerator, and I would appreciate some hints on how to integrate it with the OpenMP API and runtime.
A special aspect of our setup is that we offload from a 64-bit host to 32-bit accelerators. A further complication is that normal loads and stores issued by our 32-bit cores to host RAM are slow. However, we can transfer working data from host RAM to device scratchpad memory via DMA, and accesses to that device memory are much faster.
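To make this access pattern concrete, here is a minimal C sketch of tile-wise DMA staging: data is copied from (slow) host RAM into (fast) scratchpad memory, processed there, and written back. It is only an illustration under stated assumptions: SCRATCHPAD_BYTES is a hypothetical scratchpad size, the static array stands in for device scratchpad memory, and dma_copy() is a hypothetical helper that simply calls memcpy in place of a real (typically asynchronous) DMA transfer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical scratchpad size in bytes. */
#define SCRATCHPAD_BYTES 4096

/* Hypothetical stand-in for a DMA engine: a plain synchronous copy. */
static void dma_copy(void *dst, const void *src, size_t n) {
    memcpy(dst, src, n);
}

/* Scale an array that lives in host RAM, one scratchpad-sized tile at a time. */
static void scale_in_tiles(float *host_data, size_t n, float factor) {
    assert(host_data != NULL || n == 0);
    /* Stand-in for device scratchpad memory. */
    static float scratch[SCRATCHPAD_BYTES / sizeof(float)];
    const size_t tile = sizeof scratch / sizeof scratch[0];

    for (size_t start = 0; start < n; start += tile) {
        size_t count = (n - start < tile) ? n - start : tile;
        dma_copy(scratch, host_data + start, count * sizeof(float)); /* host RAM -> scratchpad */
        for (size_t i = 0; i < count; ++i)
            scratch[i] *= factor;                                    /* fast local accesses */
        dma_copy(host_data + start, scratch, count * sizeof(float)); /* scratchpad -> host RAM */
    }
}
```

The point of the pattern is that the inner loop only touches scratchpad memory; the slow path to host RAM is used exclusively for bulk transfers.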
Because we only have 32-bit pointers on the accelerators, we cannot provide true shared virtual memory. Instead, what we can roughly do is map the 4 GB address space of an accelerator onto a part of the host address space. The data in this part of host RAM is what we want to make accessible on the device via “partial shared virtual memory”.
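The window mapping itself could be sketched as plain offset arithmetic. This sketch assumes, hypothetically, that device address 0 corresponds to the start of the host window (a nonzero device base would only add a constant); svm_window, host_to_device() and device_to_host() are illustrative names, not existing LLVM or libomptarget APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of the host region backing a device's 32-bit address space. */
typedef struct {
    uintptr_t host_base; /* host virtual address where the window starts */
    uint64_t size;       /* window size in bytes, at most 4 GiB (1ull << 32) */
} svm_window;

/* Translate a host pointer into a 32-bit device address.
   Returns false if the pointer lies outside the window. */
static bool host_to_device(const svm_window *w, const void *host_ptr, uint32_t *dev_addr) {
    uintptr_t p = (uintptr_t)host_ptr;
    if (p < w->host_base || (uint64_t)(p - w->host_base) >= w->size)
        return false;
    *dev_addr = (uint32_t)(p - w->host_base); /* offset fits in 32 bits since size <= 4 GiB */
    return true;
}

/* Translate a 32-bit device address back into the corresponding host pointer. */
static void *device_to_host(const svm_window *w, uint32_t dev_addr) {
    assert(dev_addr < w->size);
    return (void *)(w->host_base + dev_addr);
}
```

A host pointer outside the window has no device equivalent, which is why only data placed inside that window could be shared in this scheme.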
At first glance, it seems that we could use the following functions to make part of host RAM accessible on our accelerators from the user's programming-model perspective:
void *llvm_omp_target_alloc_device(size_t size, int device_num);
void *llvm_omp_target_alloc_host(size_t size, int device_num);
void *llvm_omp_target_alloc_shared(size_t size, int device_num);
Would llvm_omp_target_alloc_host() be reasonable here? That is, could we provide a special implementation of it, so that data allocated this way in host RAM is mapped into the 32-bit address space of our device? As far as I understand, the host variant would be appropriate, since data allocated this way “cannot migrate to the device”, which matches the slow direct accesses described above for our hardware. However, I do not fully understand the difference between the host and shared variants, or what “migratable” means in this context.
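If llvm_omp_target_alloc_host() were given a special implementation, one conceivable approach is to carve every such allocation out of the device-visible host window, so that each returned pointer has a 32-bit device equivalent. The following is a minimal sketch of that idea as a bump allocator; window_arena and window_alloc() are hypothetical names, and the buffer passed in would, in a real setup, be the reserved host region that the device maps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bump allocator over the host region the device can address. */
typedef struct {
    unsigned char *base; /* start of the device-visible host window */
    size_t size;         /* window size in bytes (at most 4 GiB) */
    size_t used;         /* bytes handed out so far */
} window_arena;

/* Return a pointer inside the window aligned to `align` (a power of two),
   or NULL if the remaining space is too small. */
static void *window_alloc(window_arena *a, size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    uintptr_t cur = (uintptr_t)(a->base + a->used);
    uintptr_t aligned = (cur + (align - 1)) & ~(uintptr_t)(align - 1);
    size_t offset = (size_t)(aligned - (uintptr_t)a->base);
    if (offset > a->size || size > a->size - offset)
        return NULL; /* window exhausted: no device-addressable memory left */
    a->used = offset + size;
    return a->base + offset;
}
```

A bump allocator never reuses memory, so a real implementation would also need to support freeing; the property that matters here is only that every result stays inside the window.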
Furthermore, we would have to implement corresponding support in libomptarget and in our target-dependent offload RTL. Has anyone provided similar shared virtual memory support for another target, and could you give any hints on how this was roughly done?
Any hints would be greatly appreciated.