Not sure if my code is running on Intel Xe GPU

Hello all,

I am compiling the code below with clang 14.0.5. The only additional compiler option I am using is: /openmp

It apparently takes less time than the equivalent serial CPU version, but two things make me think that it is not being executed on my Intel Xe GPU:

  1. From the task manager, I do not see any activity on the GPU.
  2. If I call omp_get_num_devices() in the same function, it returns 0. Also, omp_is_initial_device() called there returns true.

On the other hand, the omp_is_initial_device() call that you can see in the code below returns false, indicating that the code is being executed on the GPU.

This is the code:

#pragma omp target teams distribute parallel for map(to:A[0:MAX_TEST * MAX_TEST],B[0:MAX_TEST * MAX_TEST]) map(tofrom: C, is_cpu, lBolChronoStarted, start, end)
for (int i = 0; i < MAX_TEST; i++)
  for (int j = 0; j < MAX_TEST; j++)
    for (int k = 0; k < MAX_TEST; k++)
    {
      if (!lBolChronoStarted)
      {
        start = clock();
        lBolChronoStarted = true;
        is_cpu = omp_is_initial_device();
      }
      C[i * MAX_TEST + j] += A[i * MAX_TEST + k] * B[k * MAX_TEST + j];
      if (lBolChronoStarted)
        if ((i == (MAX_TEST - 1)) && (j == (MAX_TEST - 1)) && (k == (MAX_TEST - 1)))
          end = clock();
    }

Hope someone can help me.

Thanks in advance and best regards.

I doubt the code actually runs on the GPU, because LLVM OpenMP doesn’t support Intel GPUs at all.

Thank you very much for your answer.

I have tried it with an NVIDIA GeForce GTX 1060 GPU and it did not work either. Is that because I need to add a special compilation parameter for that particular GPU, or for NVIDIA GPUs in general?

In this document: Offloading Design & Internals — Clang 15.0.0git documentation

It says that clang 15.0.0 supports offloading to X86_64. Does that mean Intel’s Iris Xe GPU?

I look forward to your help. Have a nice day.

I have tried it with an NVIDIA GeForce GTX 1060 GPU and it did not work either. Is that because I need to add a special compilation parameter for that particular GPU, or for NVIDIA GPUs in general?

Yes. -fopenmp -fopenmp-targets=<GPU target triple> is required when compiling an OpenMP program with target offloading. For an Nvidia device, that would be -fopenmp -fopenmp-targets=nvptx64. It also requires that the default SM version set when clang was built matches your GPU; otherwise you need one more argument: --offload-arch=sm_xx for the very recent trunk version, or -Xopenmp-target=nvptx64 -march=sm_xx for previous versions.

It says that clang 15.0.0 supports offloading to X86_64. Does that mean Intel’s Iris Xe GPU?

No. I understand the name here is pretty confusing. It is for “host offloading”, which means CPU.

Hello shiltian,

Thanks, you are really helping me. I need more help to achieve my objective of creating a DLL which offloads to NVIDIA GPUs.

I am using Windows 10 and MS Visual Studio C++ as the IDE, but with the LLVM 14.0.5 compiler. I have set only these compilation parameters: /openmp /openmp-target=nvptx64

And it does not compile, generating these errors:

|Error||could not open ‘x64\Release\dllmain.obj’: no such file or directory|
|Error||could not open ‘x64\Release\main.obj’: no such file or directory|
|Error||could not open ‘x64\Release\offloadable.obj’: no such file or directory|

I am compiling on a laptop which does not have an NVIDIA GPU, because I want to build the DLL in such a way that my end users do not need to have all the hardware that the DLL supports. The laptop on which I will test the DLL has an NVIDIA GeForce GTX 1060.

My questions are:

  • Any idea what causes the errors above?
  • Do I need to compile on the PC where the NVIDIA GPU is installed?

I look forward to your comments.

Thanks in advance and best regards.

Sorry, I had written the 2nd parameter incorrectly. I have tried now with:

/openmp /openmp-targets=nvptx64

And same error.

If I compile with /openmp, it works, but does not offload to GPU, apparently.

Best regards.

Oh, you are on Windows. We don’t support offloading on Windows yet.

OK. We also work with Ubuntu 20.04. Given the hardware mentioned above, what would the compilation parameters be?

Thanks in advance.

As Shilei said, we don’t currently support OpenMP offloading on Windows. A few changes need to happen before that can work. If you are able to set up a Linux machine with CUDA installed, it should work, provided you have a version of LLVM with OpenMP offloading. Here’s a simple test file and how to compile it.

#include <omp.h>

int main() {
  int IsDevice = 1;
#pragma omp target map(from : IsDevice)
  { IsDevice = omp_is_initial_device(); }
  return IsDevice;
}

And compile with

$ clang test.c -fopenmp -fopenmp-targets=nvptx64
$ ./a.out && echo "success"

Alternatively you can check using libomptarget information via LIBOMPTARGET_INFO=-1

$ env LIBOMPTARGET_INFO=-1 ./a.out

Or you can use Nvidia tools

$ nvprof ./a.out

If you want to target a specific architecture, you can use -Xopenmp-target=nvptx64 -march=sm_70 with older Clang. With newer Clang (version >= 15) you can just use --offload-arch=sm_70. Let me know if you have any other questions.
