Not sure if my code is running on Intel Xe GPU

Hello all,

I am compiling the code below with clang 14.0.5. The only additional compiler option I am using is: /openmp

It appears to take less time than the equivalent serial CPU version, but two things make me think that it is not being executed on my Intel Xe GPU:

  1. From the task manager, I do not see any activity on the GPU.
  2. If I call omp_get_num_devices() in the same function, it returns 0. Also, omp_is_initial_device() called there returns true.

On the other hand, the omp_is_initial_device() call that you can see in the code below returns false, indicating that the code is being executed on the GPU.

This is the code:

#pragma omp target teams distribute parallel for map(to:A[0:MAX_TEST * MAX_TEST],B[0:MAX_TEST * MAX_TEST]) map(tofrom: C, is_cpu, lBolChronoStarted, start, end)
for (int i = 0; i < MAX_TEST; i++)
    for (int j = 0; j < MAX_TEST; j++)
        for (int k = 0; k < MAX_TEST; k++)
        {
            if (!lBolChronoStarted)
            {
                start = clock();
                lBolChronoStarted = true;
                is_cpu = omp_is_initial_device();
            }
            C[i * MAX_TEST + j] += A[i * MAX_TEST + k] * B[k * MAX_TEST + j];
            if (lBolChronoStarted)
                if ((i == (MAX_TEST - 1)) && (j == (MAX_TEST - 1)) && (k == (MAX_TEST - 1)))
                    end = clock();
        }

Hope someone can help me.

Thanks in advance and best regards.

I really doubt the code runs on the GPU, because LLVM OpenMP doesn't support Intel GPUs at all.

Thank you very much for your answer.

I have tried it with an NVIDIA GeForce GTX 1060 GPU and it did not work either. Is that because I need to add a special compilation parameter for that particular GPU or for NVIDIA GPUs in general?

In this document: Offloading Design & Internals - Clang 16.0.0git documentation

It says clang 15.0.0 supports offloading to X86_64. Does that mean Intel's Iris Xe GPU?

I look forward to your help. Have a nice day.

I have tried it with an NVIDIA GeForce GTX 1060 GPU and it did not work either. Is that because I need to add a special compilation parameter for that particular GPU or for NVIDIA GPUs in general?

Yes. -fopenmp -fopenmp-targets=<GPU target triple> is required when compiling an OpenMP program with target offloading. For an Nvidia device, that is -fopenmp -fopenmp-targets=nvptx64. It also requires that the default SM version set when clang was built matches your GPU; otherwise you need one more argument: --offload-arch=sm_xx for the very recent trunk version, or -Xopenmp-target=nvptx64 -march=sm_xx for previous versions.

It says clang 15.0.0 supports offloading to X86_64. Does that mean Intel's Iris Xe GPU?

No. I understand the name here is pretty confusing. It is for "host offloading", which means CPU.

Hello shiltian,

Thanks, you are really helping me. I need more help to achieve my objective of creating a DLL which offloads to NVIDIA GPUs.

I am using Windows 10 and MS Visual Studio C++ as the IDE, but with the LLVM compiler, version 14.0.5. I have set only these compilation parameters: /openmp /openmp-target=nvptx64

And it does not compile, generating these errors:

|Error||could not open 'x64\Release\dllmain.obj': no such file or directory|
|Error||could not open 'x64\Release\main.obj': no such file or directory|
|Error||could not open 'x64\Release\offloadable.obj': no such file or directory|

I am compiling on a laptop which does not have an NVIDIA GPU, because I intend to build the DLL so that my end users do not need to have all the hardware that the DLL supports. The laptop on which I will test the DLL has an NVIDIA GeForce GTX 1060.

My questions are:

  • Any idea of the errors above?
  • Do I need to compile in the PC where the NVIDIA GPU is installed?

I look forward to your comments.

Thanks in advance and best regards.

Sorry, I had written the 2nd parameter incorrectly. I have tried now with:

/openmp /openmp-targets=nvptx64

And same error.

If I compile with /openmp, it works, but does not offload to GPU, apparently.

Best regards.

Oh, you are on Windows. We don't support offloading on Windows yet.

OK. We also work with Ubuntu 20.04. Given the hardware mentioned above, what would the compilation parameters be?

Thanks in advance.

As Shilei said, we don't currently support OpenMP offloading on Windows. There are a few changes that need to happen in order for that to work. If you are able to set up a Linux machine with CUDA installed, it should work, provided you have a version of LLVM with OpenMP offloading. Here's a simple test file and how to compile it.

#include <omp.h>

int main() {
  int IsDevice = 1;
#pragma omp target map(from : IsDevice)
  { IsDevice = omp_is_initial_device(); }
  return IsDevice;
}

And compile with

$ clang test.c -fopenmp -fopenmp-targets=nvptx64
$ ./a.out && echo "success"

Alternatively you can check using libomptarget information via LIBOMPTARGET_INFO=-1

$ env LIBOMPTARGET_INFO=-1 ./a.out

Or you can use Nvidia tools

$ nvprof ./a.out

If you want to specify a particular architecture, you can use -Xopenmp-target=nvptx64 -march=sm_70 for older Clang. With newer Clang (version >= 15) you can just use --offload-arch=sm_70. Let me know if you have any other questions.
