Not sure if my code is running on Intel Xe GPU

dtroncho · June 18, 2022, 6:17pm

Hello all,

I am compiling the code below with clang 14.0.5. The only additional compile option I am using is: /openmp

It appears to take less time to compute than the equivalent serial CPU version, but two things make me think that it is not being executed on my Intel Xe GPU:

1. In the Task Manager, I do not see any activity on the GPU.
2. If I call omp_get_num_devices() in the same function, it returns 0. Also, omp_is_initial_device() called there returns true.

On the other hand, the omp_is_initial_device() call that you can see in the code below returns false, indicating that the code is being executed on the GPU.

This is the code:

#pragma omp target teams distribute parallel for map(to:A[0:MAX_TEST * MAX_TEST],B[0:MAX_TEST * MAX_TEST]) map(tofrom: C, is_cpu, lBolChronoStarted, start, end)
for (int i = 0; i < MAX_TEST; i++)
    for (int j = 0; j < MAX_TEST; j++)
        for (int k = 0; k < MAX_TEST; k++)
        {
            if (!lBolChronoStarted)
            {
                start = clock();
                lBolChronoStarted = true;
                is_cpu = omp_is_initial_device();
            }
            C[i * MAX_TEST + j] += A[i * MAX_TEST + k] * B[k * MAX_TEST + j];
            if (lBolChronoStarted)
                if ((i == (MAX_TEST - 1)) && (j == (MAX_TEST - 1)) && (k == (MAX_TEST - 1)))
                    end = clock();
        }
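
For comparison, here is a sketch of the same multiplication with the timing and the device check taken out of the loop body, so that no shared scalar is written by many threads at once. It is only an illustration under some assumptions: the matrices are row-major arrays of double, C points to n * n elements (so it is mapped as an array section), and the names matmul_offload and n (standing in for MAX_TEST) are made up here. When the compiler is run without OpenMP support, the pragmas are ignored and the loop runs serially.

```c
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: C += A * B for n x n row-major matrices.
 * Returns nonzero if the target region ran on the host, 0 if on a device. */
int matmul_offload(const double *A, const double *B, double *C, int n)
{
    int on_host = 1; /* stays 1 when compiled without OpenMP */

    /* collapse(2) distributes the (i, j) pairs. Each pair owns exactly one
     * element of C, so the k loop accumulates in a private variable. */
#pragma omp target teams distribute parallel for collapse(2) \
    map(to: A[0:n * n], B[0:n * n]) map(tofrom: C[0:n * n])
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] += sum;
        }
    }

#ifdef _OPENMP
    /* Separate, tiny target region used only for the device check. */
#pragma omp target map(from: on_host)
    { on_host = omp_is_initial_device(); }
#endif
    return on_host;
}
```

The elapsed time can then be measured on the host around the call (for example with omp_get_wtime()), which avoids calling clock() inside the offloaded region.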

Hope someone can help me.

Thanks in advance and best regards.

shiltian · June 18, 2022, 9:00pm

I doubt the code actually runs on the GPU, because LLVM OpenMP doesn’t support Intel GPUs at all.

dtroncho · June 19, 2022, 7:40am

Thank you very much for your answer.

I have tried it with an NVIDIA GeForce GTX 1060 GPU and it did not work either. Is that because I need to add a special compilation parameter for that particular GPU, or for NVIDIA GPUs in general?

In this document: Offloading Design & Internals — Clang 16.0.0git documentation

It says that clang 15.0.0 supports offloading to X86_64. Does that mean Intel’s Iris Xe GPU?

I look forward to your help. Have a nice day.

shiltian · June 19, 2022, 12:57pm

> I have tried it with an NVIDIA GeForce GTX 1060 GPU and it did not work either. Is that because I need to add a special compilation parameter for that particular GPU, or for NVIDIA GPUs in general?

Yes. -fopenmp -fopenmp-targets=<GPU target triple> is required when compiling an OpenMP program with target offloading. For an Nvidia device, that would be -fopenmp -fopenmp-targets=nvptx64. This also assumes that, when clang was built, the default SM version was set to match your GPU; otherwise you need one more argument: --offload-arch=sm_xx for the very recent trunk version, or -Xopenmp-target=nvptx64 -march=sm_xx for previous versions.
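
Once the program builds with those flags, one quick way to see whether the runtime found a GPU at all is to ask it for the device count. The following is a minimal sketch, not an official test: the helper names visible_offload_devices and report_offload_devices are invented for illustration, and the _OPENMP guard only exists so the file still compiles when OpenMP is switched off.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: number of offload devices the OpenMP runtime sees.
 * Returns 0 when built without OpenMP or when no device is available. */
int visible_offload_devices(void)
{
#ifdef _OPENMP
    return omp_get_num_devices();
#else
    return 0;
#endif
}

/* Prints a one-line summary and returns the device count. */
int report_offload_devices(void)
{
    int n = visible_offload_devices();
    if (n > 0)
        printf("%d offload device(s) visible\n", n);
    else
        printf("no offload device visible; target regions fall back to the host by default\n");
    return n;
}
```

Called from main in a program built with -fopenmp -fopenmp-targets=nvptx64, a result of 0 would suggest that the toolchain, the driver, or the flags are not set up for offloading, which matches the omp_get_num_devices() == 0 observation earlier in this thread.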

> It says that clang 15.0.0 supports offloading to X86_64. Does that mean Intel’s Iris Xe GPU?

No. I understand the name here is pretty confusing. It refers to “host offloading”, which means the target region runs on the CPU.

dtroncho · June 20, 2022, 5:30pm

Hello shiltian,

Thanks, you are really helping me. I need more help to achieve my objective of creating a DLL which offloads to NVIDIA GPUs.

I am using Windows 10 and MS Visual Studio C++ as the IDE, but with the LLVM compiler, version 14.0.5. I have set only these compilation parameters: /openmp /openmp-target=nvptx64

And it does not compile, generating these errors:

|Error||could not open ‘x64\Release\dllmain.obj’: no such file or directory|
|Error||could not open ‘x64\Release\main.obj’: no such file or directory|
|Error||could not open ‘x64\Release\offloadable.obj’: no such file or directory|

I am compiling on a laptop which does not have an NVIDIA GPU, because I want to ship a DLL that my end users can run without having every piece of hardware the DLL supports. The laptop on which I will test the DLL has an NVIDIA GeForce GTX 1060.

My questions are:

1. Any idea what causes the errors above?
2. Do I need to compile on the PC where the NVIDIA GPU is installed?

I look forward to your comments.

Thanks in advance and best regards.

dtroncho · June 20, 2022, 5:38pm

Sorry, I had written the 2nd parameter incorrectly. I have tried now with:

/openmp /openmp-targets=nvptx64

And I get the same error.

If I compile with /openmp alone, it works, but apparently it does not offload to the GPU.

Best regards.

shiltian · June 20, 2022, 6:05pm

Oh, you are on Windows. We don’t support offloading on Windows yet.

dtroncho · June 20, 2022, 7:03pm

OK. We also work with Ubuntu 20.04. Given the hardware mentioned above, what would the compilation parameters be?

Thanks in advance.

jhuber6 · June 21, 2022, 3:14pm

As Shilei said, we don’t currently support OpenMP offloading on Windows. There are a few changes that need to happen in order for that to work. If you are able to set up a Linux machine with CUDA installed, it should work as long as your LLVM build includes OpenMP offloading. Here’s a simple test file and how to compile it.

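#include <omp.h>

int main() {
  // Assume host execution until the target region reports otherwise.
  int IsDevice = 1;
#pragma omp target map(from : IsDevice)
  { IsDevice = omp_is_initial_device(); } // nonzero on the host, 0 on a device
  return IsDevice; // exit code 0 means the region ran on a device
}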

And compile with

$ clang test.c -fopenmp -fopenmp-targets=nvptx64
$ ./a.out && echo "success"

Alternatively you can check using libomptarget information via LIBOMPTARGET_INFO=-1

$ env LIBOMPTARGET_INFO=-1 ./a.out

Or you can use Nvidia tools

$ nvprof ./a.out

If you want to target a specific architecture, you can use -Xopenmp-target=nvptx64 -march=sm_70 with older Clang. With newer Clang (version >= 15) you can just use --offload-arch=sm_70. Let me know if you have any other questions.
