OpenMP offloaded target region executed in both host and target-device


I’m working on a project which requires OpenMP offloading to Nvidia GPUs using Clang.

System specification

OS - Ubuntu 16.04 LTS
Clang version - 4.0
Processor - Intel(R) Core™ i7-4700MQ CPU
CUDA version - 9.0
Nvidia GPU - GeForce 740M (compute capability 3.5, sm_35)

The problem is that when I execute a sample program to test OpenMP offloading to Nvidia GPUs, part of the target region runs on the GPU and then the same target region starts executing on the host.

Please find the sample program attached. It is a small C program that multiplies two matrices.
I believe the target region is executed on both the host and the target device because of the abnormal output from the print call inside the target region. (My processor has 4 cores, each capable of running 2 hardware threads.)

Please find the image of the command line output attached.

The program was compiled with:

clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda

I cannot figure out whether the runtime believes that the GPU execution did not complete successfully and therefore executes the target region on the host again.

Thank you!

2mm.c (1.25 KB)

It seems to me that your program crashes on the GPU and then the runtime tries to execute the same code on the CPU, though this behavior seems wrong to me.
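One way to confirm such a fallback, rather than inferring it from print output, is to ask the runtime where the region actually ran. The following is only a minimal sketch: the helper name ran_on_host is hypothetical, and it assumes the compiler and runtime provide omp_is_initial_device() (part of the OpenMP API since 4.0). Without -fopenmp the pragma is ignored and the function simply reports host execution.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: returns 1 if the target region ran on the host,
   0 if it ran on an offload device. */
int ran_on_host(void)
{
    int on_host = 1; /* default when OpenMP is disabled */

    /* map(tofrom:) copies the flag to the device and back again */
    #pragma omp target map(tofrom: on_host)
    {
#ifdef _OPENMP
        /* nonzero when the region is executing on the host device */
        on_host = (omp_is_initial_device() != 0);
#endif
    }
    return on_host;
}
```

Calling ran_on_host() once before the real kernel and printing the result should show whether offloading works at all on a given setup.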

The problem is in your code. When you try to map the A, B and E arrays, you're doing it the wrong way. Instead of mapping the arrays, you map only the pointers to them, so no memory is allocated for the array data on the GPU.
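To illustrate the difference, here is a minimal sketch of mapping the array data itself with array sections. It is not the attached 2mm.c: the function name matmul_offload, the row-major n x n layout and the use of double are all assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical helper, not the poster's code: computes E = A * B for
   n x n row-major matrices held in pointer-addressed buffers. */
void matmul_offload(const double *A, const double *B, double *E, int n)
{
    /* The array sections A[0:n*n] etc. tell the runtime how many elements
       to allocate and copy. Writing map(to: A) for a pointer would map
       only the pointer, not the data it points to. */
    #pragma omp target map(to: A[0:n*n], B[0:n*n]) map(from: E[0:n*n])
    {
        #pragma omp parallel for
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                double sum = 0.0;
                for (int k = 0; k < n; k++)
                    sum += A[i * n + k] * B[k * n + j];
                E[i * n + j] = sum;
            }
        }
    }
}
```

The section syntax is [start:length], so A[0:n*n] covers the whole matrix. If the matrices were declared as fixed-size arrays visible at the pragma, naming the array alone would be enough, since the compiler then knows the size.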


Thank you Alexey for pointing that out. I was able to successfully offload the program to GPU after correcting that mistake.