I’m working on a project which requires OpenMP offloading to Nvidia GPUs using Clang.
OS - Ubuntu 16.04 LTS
Clang version - 4.00
Processor - Intel(R) Core™ i7-4700MQ CPU
CUDA version - 9.0
Nvidia GPU - GeForce 740M (sm_capability - 35)
The problem is that when I run a sample program to test OpenMP offloading to Nvidia GPUs, part of the target region appears to run on the GPU, and then the same target region starts executing on the host.
Please find the sample program attached herewith. It is a small C program that multiplies two matrices.
I believe the target region is being executed on both the host and the target device because of the abnormal output from the print call inside the target region. (My processor has 4 cores, each capable of running 2 hardware threads.)
Please find the image of the command line output attached herewith.
The program was compiled with:
clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda
I cannot figure out whether the runtime believes that the GPU execution did not complete successfully, and therefore executes the target region again on the host.
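One way to narrow this down might be to record, inside the target region itself, whether it is running on the host or on a device. The following is only a minimal sketch and is not taken from the attached 2mm.c: the function name matmul_on_target and the size N are hypothetical, and it assumes the toolchain supports the OpenMP 4.x routine omp_is_initial_device(). When built without -fopenmp the pragmas are ignored and a fallback stub reports host execution.

```c
#include <assert.h>
#include <stdio.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Fallback stub (assumption for this sketch): without -fopenmp
 * there is no device, so everything runs on the host. */
static int omp_is_initial_device(void) { return 1; }
#endif

#define N 4 /* hypothetical matrix dimension, for illustration only */

/* Computes c = a * b for row-major N x N matrices inside a target region.
 * Returns 1 if the region executed on the host, 0 if it ran on a device. */
int matmul_on_target(const double *a, const double *b, double *c)
{
    int on_host = 1;

    assert(a != NULL && b != NULL && c != NULL);

    #pragma omp target map(to: a[0:N*N], b[0:N*N]) \
                       map(from: c[0:N*N]) map(from: on_host)
    {
        /* Recorded once by the thread that starts the region. */
        on_host = omp_is_initial_device();

        #pragma omp parallel for collapse(2)
        for (int i = 0; i < N; i++) {
            for (int j = 0; j < N; j++) {
                double sum = 0.0;
                for (int k = 0; k < N; k++)
                    sum += a[i * N + k] * b[k * N + j];
                c[i * N + j] = sum;
            }
        }
    }

    return on_host;
}
```

Calling matmul_on_target from main and printing its return value on the host, after the region has finished, should show whether the runtime fell back to host execution, without relying on prints issued from inside the region.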
2mm.c (1.25 KB)