TL;DR
D130224 : Add the ability to attach attribute((maybe_undef)) on function arguments. Clang will then remove noundef attribute and introduces a freeze instruction on the argument.
Motivation
-
Languages like CUDA, HIP etc. have APIs which accept uninitialized function arguments. In these languages, some functions have multi-threaded semantics, and it is enough for only one or some threads to provide defined arguments. Depending on semantics, undef arguments in some threads don’t produce undefined results in the function call.
-
With D105169, we are forced to assume very strict constraints for the languages. Noundef attribute has been added by default to all function arguments and return values.
CUDA/HIP have intrinsics like “__shfl_sync(unsigned mask, T var, int srcLane, int width=warpSize)”, Where the exchange of variable occurs simultaneously for all active threads within the warp. So, in the cuda/hip kernel, variable var in shuffl_sync may not be initialised. Clang adds noundef attribute to the argument and LLVM IR treats it as undef which is a contradiction.
For the below source kernel, the optimisation by simplifyCFG pass caused issue with kernel execution on AMDGPU. This optimisation removes undef introducing predecessor because clang has by default marked the shf_sync arguments with noundef attribute. The resulting IR doesn’t exactly match logically with initial cuda/hip kernel.
Source kernel:
Note: variable t is uninitialized initially and gets initialized when lane is 0.
void kernel{
double t, measure_row;
int lane = hypre_cuda_get_lane_id<1>();
…
if (lane == 0) {t = read_only_load(measure_diag + row);}
measure_row = __shfl_sync(HYPRE_WARP_FULL_MASK, t, 0);
…
}
LLVM IR:
define void @kernel(i32 noundef %arg17) {
bb1:
%i1 = icmp eq i32 %arg17, 0
br i1 %i1, label %bb2, label %bb3
bb2: ; preds = %bb1
%i2 = call noundef double @read_only_load()
br label %bb3
bb3: ; preds = %bb2, %bb1
%i3 = phi double [ %i2, %bb2 ], [ undef, %bb1 ]
%i4 = call noundef double @__shfl_sync(double noundef %i3)
ret void
}
declare double @read_only_load()
declare double @__shfl_sync(double noundef) convergent
IR Dump After SimplifyCFGPass:
define void @kernel(i32 noundef %arg17) {
bb1:
%i1 = icmp eq i32 %arg17, 0
call void @llvm.assume(i1 %i1)
%i2 = call noundef double @read_only_load()
%i4 = call noundef double @__shfl_sync(double noundef %i2)
ret void
}
Solution
We went through multiple solutions to address the issue:
→ D124158 : Skip adding noundef attribute to arguments when function has convergent attribute
→ D125378 : Introduce shuffle attribute to be used for __shfl_sync like cross-lane APIs
→ D128907 : Disable noundef attribute for languages which allow uninitialized function arguments
After considering the feedback, we are proposing below solution to solve this issue:
D130224 : Introduce maybe_undef attribute for function arguments which accepts undef values.
Now, for __shfl_sync like cross-lane apis, “maybe_undef” attribute can be added to parameter which is known to be left uninitialized in the kernel. Clang can identify the function argument and skip adding noundef attribute to it. Freeze llvm ir instruction will be introduced at clang codegen for this function parameter to stop propagation of undef.
Implementation
-
“maybe_undef” attribute will be enabled in clang. It can be applied to function argument.
-
In CodeGenModule::ConstructAttributeList, while constructing IR attribute list for function or call, when “maybe_undef” attribute is found on function argument, noundef attribute will be skipped.
-
In CodeGenFunction::EmitCall, while translating the arguments necessary to match IR lowering, freeze instruction is introduced when “maybe_undef” attribute is found on the argument.
Thanks,
Chaitanya.