[llvm-dev] [RFC] : [D130224] Introduce maybe_undef attribute for function arguments which accepts undef values

TL;DR

D130224 : Add the ability to attach attribute((maybe_undef)) on function arguments. Clang will then remove noundef attribute and introduces a freeze instruction on the argument.

Motivation

  • Languages like CUDA, HIP etc. have APIs which accept uninitialized function arguments. In these languages, some functions have multi-threaded semantics, and it is enough for only one or some threads to provide defined arguments. Depending on semantics, undef arguments in some threads don’t produce undefined results in the function call.

  • With D105169, we are forced to assume very strict constraints for the languages. Noundef attribute has been added by default to all function arguments and return values.

CUDA/HIP have intrinsics like “__shfl_sync(unsigned mask, T var, int srcLane, int width=warpSize)”, Where the exchange of variable occurs simultaneously for all active threads within the warp. So, in the cuda/hip kernel, variable var in shuffl_sync may not be initialised. Clang adds noundef attribute to the argument and LLVM IR treats it as undef which is a contradiction.

For the below source kernel, the optimisation by simplifyCFG pass caused issue with kernel execution on AMDGPU. This optimisation removes undef introducing predecessor because clang has by default marked the shf_sync arguments with noundef attribute. The resulting IR doesn’t exactly match logically with initial cuda/hip kernel.

Source kernel:
Note: variable t is uninitialized initially and gets initialized when lane is 0.

void kernel{
double t, measure_row;
int lane = hypre_cuda_get_lane_id<1>();

if (lane == 0) {t = read_only_load(measure_diag + row);}
measure_row = __shfl_sync(HYPRE_WARP_FULL_MASK, t, 0);

}

LLVM IR:
define void @kernel(i32 noundef %arg17) {
bb1:
%i1 = icmp eq i32 %arg17, 0
br i1 %i1, label %bb2, label %bb3

bb2: ; preds = %bb1
%i2 = call noundef double @read_only_load()
br label %bb3

bb3: ; preds = %bb2, %bb1
%i3 = phi double [ %i2, %bb2 ], [ undef, %bb1 ]
%i4 = call noundef double @__shfl_sync(double noundef %i3)
ret void
}
declare double @read_only_load()
declare double @__shfl_sync(double noundef) convergent

IR Dump After SimplifyCFGPass:
define void @kernel(i32 noundef %arg17) {
bb1:
%i1 = icmp eq i32 %arg17, 0
call void @llvm.assume(i1 %i1)
%i2 = call noundef double @read_only_load()
%i4 = call noundef double @__shfl_sync(double noundef %i2)
ret void
}

Solution

We went through multiple solutions to address the issue:

D124158 : Skip adding noundef attribute to arguments when function has convergent attribute
D125378 : Introduce shuffle attribute to be used for __shfl_sync like cross-lane APIs
D128907 : Disable noundef attribute for languages which allow uninitialized function arguments

After considering the feedback, we are proposing below solution to solve this issue:
D130224 : Introduce maybe_undef attribute for function arguments which accepts undef values.

Now, for __shfl_sync like cross-lane apis, “maybe_undef” attribute can be added to parameter which is known to be left uninitialized in the kernel. Clang can identify the function argument and skip adding noundef attribute to it. Freeze llvm ir instruction will be introduced at clang codegen for this function parameter to stop propagation of undef.

Implementation

  • “maybe_undef” attribute will be enabled in clang. It can be applied to function argument.

  • In CodeGenModule::ConstructAttributeList, while constructing IR attribute list for function or call, when “maybe_undef” attribute is found on function argument, noundef attribute will be skipped.

  • In CodeGenFunction::EmitCall, while translating the arguments necessary to match IR lowering, freeze instruction is introduced when “maybe_undef” attribute is found on the argument.

Thanks,
Chaitanya.

Do you really need to remove the noundef attribute and freeze the argument? It sounds like you only need either of them. I don’t see the need for adding freeze.

One thing that I find confusing in these patches is why the convergent is relied upon. I understand it’s a proxy to being a Cuda function, but it’s not a perfect proxy. Doesn’t clang have a cuda or similar language identifier that can pinpoint exactly the languages where the current semantics of noundef is not appropriate? (don’t know enough of clang, just asking as the proposed patches make the assumption that convergent == cuda).

I haven’t looked at those particular patches in detail yet. My reason for suggesting the use of freeze is here: LLVM GPU Working Group Meeting – Friday, July 15, 2022 - #3 by nhaehnle

Though I agree that we only need one of no noundef or freeze, both seems overkill.

Thanks for feedback @nlopes @nhaehnle. I have updated the review D130224 to only insert freeze instruction and not remove noundef attribute.

I have initially assumed fix needs to be for all cross-lane apis and used convergent attribute to identify such APIs. But from the feedback from the intial patches found out that’s not the case. So finally we decided to come up with maybe_undef attribute to fix the issue with shfl_sync.

My suggestion was that precisely the other way around, so you would restore the previous behavior.
Is there any benefit of adding freeze in this case vs dropping noundef?

Did you look at the link I posted? (here)