amdgcn / deviceRTL status update

Hello OpenMP,

The downstream aomp (roughly our LLVM fork, on GitHub) version of the amdgcn deviceRTL is now self-contained. Well, it depends on clang supporting the HIP language, but not on any runtime libraries. It took a little while to stamp out the last few references to amdgpu libraries, but they're all gone now.

There are a couple of just-opened patches that fill in some missing symbols for LLVM master's amdgcn deviceRTL, with another couple to follow. A crude version of malloc is available now, or we can wait for a cleverer version that another team is working on. My next downstream task is to improve the locks implementation.

One proviso: HIP compiler support is not yet quite sufficient to build everything from C++ source, so aomp carries some clang patches and some IR source files. I'll need to do some upstream compiler work shortly.

My goal here is a self-contained deviceRTL that compiles with clang master: just link it against OpenMP device code and run. That leaves two pieces remaining: the host plugin, which I think a colleague is improving, and the clang support needed to wire source code to the runtime.

I have a mashup of the Ubuntu-provided amdkfd driver + upstream deviceRTL + the missing symbols above + aomp host runtime + aomp clang that passes about half of our tests locally. That is arithmetic on a GPU with no closed-source components in the stack. I'd like to see a descendant of this shipping with Linux distros.

Thanks, all, for making this possible.