GPU kernels are called with the wrong number of arguments, deliberate?

The code appended compiles to a single kernel. It uses the data mapping logic. The kernel that expects one argument gets invoked by cuda’s rtl with two arguments. That looks wrong to me.

Is this a deliberate feature of target mapper, and if not, can anyone point me to roughly where to start looking?

Thanks,

Jon

Trailing test ompiled with:
~/llvm/bin/clang++ -O2 -target x86_64-pc-linux-gnu -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_50 test.cpp -o test -L/usr/local/cuda/targets/x86_64-linux/lib -lcudart -save-temps

This errors because __dummy.omp_offloading.entry isn’t marked weak, but ignoring that error for now because we still get the temp files out,

The device kernel has one argument:
llvm-dis test-openmp-nvptx64-nvidia-cuda.bc -o - | grep __omp_offloading

define weak void @__omp_offloading_802_9c0dba_main_l23([10 x i32*]* nonnull align 8 dereferenceable(80) %0) % ← one argument

It is passed to tgt_target_mapper,
llvm-dis test-host-x86_64-pc-linux-gnu.bc -o - | grep __omp_offloading

call i32 @__tgt_target_mapper(i64 -1, i8* @.__omp_offloading_802_9c0dba_main_l23.region_id, i32 2, ← arg num here …)

// Smallest example from aomp’s test suite that hits this case
#include <stdio.h>
#include <omp.h>
#include <stdint.h>

#define N 640
#define C 64
#define P 10

int A[N];
int *p[P];

int main()
{
int i;
for(i=0; i<N; i++) A[i] = i;
for(i=0; i<P; i++) p[i] = &A[i*C];

#pragma omp target enter data map(to: A) map(alloc: p)
for(i=0; i<P; i++) {
#pragma omp target enter data map(alloc: p[i][0:C])
}

#pragma omp target map(alloc: A, p)
{
int i, j;
for(i=0; i<P; i++) {
for(j=0; j<C; j++) {
p[i][j]++;
}
}
}

#pragma omp target update from( A)

int error = 0;
for(i=0; i<N; i++) {
if (A[i] != i+1) printf("%4d: got %d, expected %d, error %d\n", i, A[i], i+1, ++error);
}
printf(“completed TEST ARRAY with %d errors\n”, error);

return (error == 0)?0:1;
}

It certainly looks weird.

Here is the simplified version in which I cannot tell how we would determine in `libomptarget` how many arguments to pass on, the calls look all the same to me:

https://clang.godbolt.org/z/TdroxE