Re: [llvm-dev] llvm OpenMP support for Apple Silicon

Elided the discussion starting:

Are there any plans to support OpenMP
offload to the integrated GPU in Apple Silicon processors?

Replying from Johannes’ sketch:
Given that description one would need two things:

  1. A Metal API plugin for libomptarget (next to the CUDA and AMDGCN
     plugin). That is actually pretty easy if their API is somewhat sane.
  2. A backend that can produce device code. I don’t think they upstreamed
     their Metal backend but one can probably use their clang as “backend”
     given that it can consume LLVM-IR.

I think this is in the realm of difficult but tractable. One scheme would be to add a target to LLVM that approximates the GPU (pointer sizes etc.), then write a heuristically derived pass that mangles the whole-program IR (after the device library has been spliced in) into IR that the closed-source toolchain tolerates. Alternatively, one could use an existing target that looks vaguely similar and do more work in the heuristic pass.
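To make the "mangling" idea a bit more concrete, here is a toy, text-level sketch in Python. It is only an illustration under loud assumptions: the triple below is a placeholder (the thread does not say what the closed-source toolchain expects), the list of attributes to drop is made up for the example, and a real implementation would be an LLVM pass over in-memory IR, not a line-by-line rewrite.

```python
import re

# Placeholder only: the triple the closed-source toolchain expects is not
# stated anywhere in this thread.
ASSUMED_TRIPLE = "hypothetical-gpu-unknown-unknown"

# Attributes we pretend the downstream compiler rejects. These two were
# picked for the example because they are optimization hints, so dropping
# them only makes the IR more conservative.
ASSUMED_UNSUPPORTED_ATTRS = ("mustprogress", "willreturn")


def retarget_ir(ir_text: str) -> str:
    """Rewrite textual LLVM IR, line by line, for a different consumer."""
    out = []
    for line in ir_text.splitlines():
        if line.startswith("target triple ="):
            # Swap in the triple of the (assumed) approximating target.
            line = f'target triple = "{ASSUMED_TRIPLE}"'
        elif line.startswith("attributes #"):
            # Strip each attribute, together with its leading whitespace,
            # from attribute-group definitions.
            for attr in ASSUMED_UNSUPPORTED_ATTRS:
                line = re.sub(rf"\s{re.escape(attr)}\b", "", line)
        out.append(line)
    return "\n".join(out) + "\n"
```

The heuristic part of the real pass would be discovering which constructs the downstream compiler rejects; the sketch just hard-codes two.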

Another option would be to revive a C backend for LLVM and use its output as the common interchange format. Essentially, we need ‘something’ that exists after the OpenMP-specific rewrites, and preferably after the device runtime has been spliced in, that can be fed to ‘something’ that creates binaries the GPU can run.

There’s some independent interest in passing IR through to the API plugin layer, and if Intel’s GPU compiler stays out of tree we might end up with machinery for feeding IR to a third-party toolchain to ‘finalize’ it.
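As a rough picture of what that 'finalize' machinery might look like, here is a minimal Python sketch. Everything in it is hypothetical: the function name, the file names, and the convention that the external tool takes the input and output paths as its last two arguments are assumptions for illustration, not the interface of any existing plugin or vendor compiler.

```python
import subprocess
import tempfile
from pathlib import Path


def finalize_ir(ir_text: str, tool_cmd: list[str]) -> bytes:
    """Hand textual IR to an external tool and return the bytes it produces.

    tool_cmd is the tool's argv prefix. The input and output paths are
    appended as the last two arguments (a convention assumed for this
    sketch only).
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "device.ll"
        dst = Path(tmp) / "device.bin"
        src.write_text(ir_text)
        # check=True raises CalledProcessError if the tool exits non-zero.
        subprocess.run([*tool_cmd, str(src), str(dst)], check=True)
        return dst.read_bytes()
```

The plugin layer would then load the returned bytes as the device image. Keeping the tool as an opaque command line is what would let the same path serve an out-of-tree Intel compiler and a closed-source Apple one.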

Interesting problem, not immediately obvious who would have the motivation to make it happen. Could be a side project for an Apple GPU enthusiast. Whole thing would be much easier if the backend was added to LLVM trunk, presumably by Apple.

Thanks for the interesting question!

Jon

The problem is that I might have to do almost all of this for the
Intel case... so why not start here.

~ Johannes