Struct passing in CUDA (targeting NVPTX)

Hello,

When I use clang to compile a CUDA kernel for the NVPTX target, every struct argument of a CUDA function is split up: the resulting LLVM function has a separate argument for each field of the struct. For example,

struct s {
    int a;
    int b;
};

__attribute__((global)) void foo(struct s arg) {
}

compiled with:

clang -emit-llvm -c -target nvptx-- -x cuda -Xclang -fcuda-is-device

results in:

%struct.s = type { i32, i32 }

; Function Attrs: nounwind
define void @_Z3foo1s(i32 %arg.coerce0, i32 %arg.coerce1) #0 {
entry:
  %arg = alloca %struct.s, align 4
  %0 = getelementptr %struct.s* %arg, i32 0, i32 0
  store i32 %arg.coerce0, i32* %0
  %1 = getelementptr %struct.s* %arg, i32 0, i32 1
  store i32 %arg.coerce1, i32* %1
  ret void
}
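To make the flattening concrete, here is a rough plain-C model of what the IR above does: the function receives one scalar per field (mirroring %arg.coerce0 and %arg.coerce1) and rebuilds the struct in a local variable, which corresponds to the alloca/getelementptr/store sequence. The name foo_coerced is hypothetical and used only for illustration; this is a sketch of the behaviour, not code that clang emits.

```c
/* Same struct as in the kernel above. */
struct s {
    int a;
    int b;
};

/* Hypothetical plain-C model of @_Z3foo1s as emitted above: one scalar
   parameter per struct field instead of a single struct parameter. */
struct s foo_coerced(int arg_coerce0, int arg_coerce1) {
    struct s arg;        /* corresponds to: %arg = alloca %struct.s */
    arg.a = arg_coerce0; /* store into field 0 (getelementptr ..., i32 0, i32 0) */
    arg.b = arg_coerce1; /* store into field 1 (getelementptr ..., i32 0, i32 1) */
    return arg;          /* returned only so the rebuilt struct can be inspected */
}
```

On the caller side the coercion means passing each field separately, e.g. foo_coerced(x.a, x.b) for some struct s x, rather than passing x itself; the struct only reappears inside the callee as a local.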

Without going into details, this code is inconvenient for a static analysis I'm developing, which is meant to operate on the LLVM bitcode. Instead, I would like to obtain:

%struct.s = type { i32, i32 }

; Function Attrs: nounwind
define void @_Z3foo1s(%struct.s %s) #0 {
entry:
  ret void
}

Is this possible, perhaps by modifying clang?

From what I have read about how LLVM handles structs, the expansion seems to be a property of the NVPTX ABI. However, looking at the NVPTX parts of clang/lib/CodeGen/TargetInfo.cpp and clang/lib/Basic/Targets.cpp, I don't see anything that obviously causes multiple arguments to be generated.

I don't mind breaking the ABI; I'm only interested in obtaining LLVM bitcode, not in generating code for the target.

Thanks,

Jeroen