LLVM OpenMP support for Apple Silicon

Hi,

I’m interested in exploring Apple Silicon performance with computational fluid dynamics codes, in particular codes that use OpenMP offload to accelerate performance. Are there any plans to support OpenMP offload to the integrated GPU in Apple Silicon processors?

Thanks.

Matt

Hi Matt,

I'm not aware of any ongoing effort.

Though, I'm not an expert on Apple hardware (or software for that matter).
How would you program them right now?
Does LLVM have code generation capabilities for those GPUs?

Depending on the answers it might be reasonable to add OpenMP offloading support (or not).

~ Johannes

Johannes,

According to the Apple documentation, you access the GPU via their Metal API. Metal supports computation as well as graphics acceleration. Digging a little deeper, kernels that run on the GPU are written in Metal Shading Language, which is a variant of C++ designed for GPUs. The preparatory stuff to get the kernel running on the GPU is all done through a Metal Objective-C/C++ object which is tied to the particular device you want to use. With that object you get a reference to your kernel function(s), set up a pipeline (which converts the function to executable code), create a command queue, create data buffers and load data, create a command buffer and encode commands, specify thread count and group size, and commit the command buffer to execute it. Somewhat similar to how SYCL works, at least from what I’ve read. Since Apple uses LLVM/Clang for their system compiler, I assume it is also used to compile for the GPU. Anyway, the short answer to the question appears to be that OpenMP offload would need to be implemented using Apple’s Metal API.

Matt

Yes. Given that description one would need two things:
1) A Metal API plugin for libomptarget (next to the CUDA and AMDGCN plugins). That is actually pretty easy if their API is somewhat sane.
2) A backend that can produce device code. I don't think they upstreamed their Metal backend *but* one can probably use their clang as "backend" given that it can consume LLVM-IR.

If you are interested in developing a prototype, feel free to reach out. We have some projects in the pipeline
that will make 2) much easier, plus experience with the plugin system.

~ Johannes

Vulkan can be an alternative.

https://www.phoronix.com/scan.php?page=news_item&px=Vulkan-SDK-For-Apple
Ye