Recently CUDA introduced a new memory space specifier __grid_constant__ (CUDA C++ Programming Guide). This RFC discusses how to support it with HIP.
The specifier is for const qualified non-reference type kernel parameters only. Its main purpose is to avoid stack copy of the kernel parameter when its address is taken.
HIP kernel arguments are passed through a buffer in constant address space. HIP runtime allocates the buffer in GPU memory and stores the kernel arguments in it. The compiler backend accesses the kernel arguments through a target specific LLVM intrinsic which returns the address of the buffer. Since it is undefined behaviour to change the value of a const qualified variable, the compiler backend can assume the buffer always contains the value of the const qualified kernel argument and use its address in place of the address of the stack copy and eliminate the stack copy. This could be done for any const- qualified kernel arguments. Then __grid_constant__ can be defined as empty.
A warning can be emitted to indicate that __grid_constant__ is ignored.
For codegen purposes, I don’t really see why this was added. For arguments lowered with byref, we get this behavior as-is after optimizations as long as you didn’t modify the argument. I guess it’s a way to enforce the program never does that? I don’t see why we would need a warning that it was ignored.
Agree. We already have such optimization for byref kernel args. We will not lose anything by ignoring this attribute if we handle all const kernel arguments. A warning is unnecessary.
We can’t always guarantee that we do not modify the data via a taken pointer and it’s a fairly common source of pain. The code that decides whether we need a copy is rather naive and only handles a limited number of “safe” cases. It currently can’t deal with phis or passing the pointer to another function as a byval, which is the current calling convention for the aggregates. I have a pending patch to change the convention specifically for this reason.
I didn’t look at what exactly __grid_constant__ does in NVCC, but, considering that it only applies to const-qualified arguments of __global__ functions, I suspect its impact would be somewhat limited, as the issue with local copies of aggregate arguments affects all functions.
On second thoughts, I kind of understand the difference between __grid_constant__ and const. A program can cast away the constness of a const parameter and modify its value, and the modification usually works as a const parameter is on stack, e.g. Compiler Explorer . However, a __grid_constant__ parameter is guaranteed to be immutable and modifying it will not have an effect.
I am just not sure whether it is necessary to introduce __grid_constant__ explicitly instead of making all const kernel parameters implicitly __grid_constant__. In other words, I am not sure how commonly users modify a const kernel parameter and expect it to be truly modified. If it is a common use case, then probably we do need to introduce __grid_constant__.
Another consideration is representing __grid_constant__ in LLVM IR. Since it is different from const, even if we do not need it in the source code, we may still need a new LLVM function parameter attribute to represent it.
We don’t need a new IR attribute because this is already how we treat kernel arguments. Is the difference just what taking the address of a kernel argument gives you? We also should move to more consistently using byref for all types.
If you use -O3 with clang (Compiler Explorer), clang actually passes the original pointer to known_funcs. However, there are still useless stores to stack, which may be eliminated.
hip-clang seems to be able to eliminate the useless stores to stack (Compiler Explorer)