[RFC] Port RTL to OpenMP

A follow on to “does anyone know how to write this in opencl”, which sadly got no replies.

The current deviceRTL uses __shared__ memory and arch-specific parameters and intrinsics. It compiles as C++ if I comment out the __shared__ and __device__ attributes.

I’m proposing defining the shared variables with #pragma omp declare target and rewriting the uses of volatile to follow C++ semantics. I expect some clang work will be needed around address spaces and constructors.
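To make the proposal concrete, here is a minimal sketch of what one converted variable might look like. All names (ready_flag, payload, publish, try_consume) are hypothetical and not taken from the deviceRTL sources. It assumes std::atomic is an acceptable stand-in for the C++ semantics meant above, and that std::atomic is usable in device code, which may not hold for every target. Note that declare target on its own only places the variable on the device; getting it into team-shared memory is presumably part of the clang address-space work mentioned above.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical example; names are illustrative, not from the deviceRTL.
#pragma omp declare target

// CUDA-style original would be something like:
//   __device__ __shared__ volatile uint32_t ready_flag;
// Here the volatile is replaced by an atomic with explicit ordering.
static std::atomic<uint32_t> ready_flag{0};
static uint32_t payload = 0;

// Writer: store the payload, then publish it with a release store.
void publish(uint32_t value) {
  payload = value;
  ready_flag.store(1u, std::memory_order_release);
}

// Reader: acquire-load the flag; the payload is read only after the
// flag has been observed as set.
bool try_consume(uint32_t &out) {
  if (ready_flag.load(std::memory_order_acquire) == 0u)
    return false;
  out = payload;
  return true;
}

#pragma omp end declare target
```

Built without OpenMP offloading, the pragmas are ignored and this is plain host C++, which matches the observation that the runtime already compiles as C++ once the attributes are removed.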

It’s a fairly big and controversial piece of work, so I’m polling the list for opinions first. It’ll especially help anyone writing an OpenMP compiler who doesn’t already have CUDA or HIP implemented.

Anyone strongly for or against?

Jon

A follow on to "does anyone know how to write this in opencl", which sadly
got no replies.

You'd need to direct that question to the OpenCL people; I'm not one of
them, though.

The current deviceRTL uses __shared__ memory and arch-specific parameters
and intrinsics. It compiles as C++ if I comment out the __shared__ and
__device__ attributes.

I'm proposing defining the shared variables with #pragma omp declare target
and rewriting the uses of volatile to follow C++ semantics. I expect some
clang work will be needed around address spaces and constructors.

We need to give users a way to declare shared (global and dynamic)
memory anyway, so figuring this out is worth it for sure.

It's a fairly big and controversial piece of work, so I'm polling the list
for opinions first. It'll especially help anyone writing an OpenMP compiler
who doesn't already have CUDA or HIP implemented.

Anyone strongly for or against?

I'm in favor, with a caveat. An OpenMP implementation of the runtime was
on my agenda for the (far-off) future anyway. However, we have a lot
of ongoing projects, and I would postpone this *iff* we have a workaround
for the AMDGPU backend for now. If not, this is very reasonable.

Thanks,
  Johannes