Problem Description
In common loop structures, if the initial value of the induction variable is not a constant, the loop exit comparison can end up with an extra masking (and) instruction. For example:
void func(int result[], int start) {
  for (int i = start; i < 100; i++)
    result[i] += 1;
}
The IR corresponding to this loop is as follows. The and instruction on line 5 of the listing is the masking issue.
for.body:
  %indvars.iv = phi i64 [ %0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
  ...
  %indvars.iv.next = add nsw i64 %indvars.iv, 1
  %2 = and i64 %indvars.iv.next, 4294967295
  %exitcond.not = icmp eq i64 %2, 100
  br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
The ideal IR would compare the induction variable directly, with no and before the icmp:
for.body:
  %indvars.iv = phi i64 [ %0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
  ...
  %indvars.iv.next = add nsw i64 %indvars.iv, 1
  %exitcond.not = icmp eq i64 %indvars.iv.next, 100
  br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
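To see why dropping the mask looks safe for this loop, here is a minimal C++ sketch (not LLVM code). It assumes the i64 induction variable is the sign extension of the original int counter, so its value always fits in 32 bits; the helper names exit_masked, exit_ideal and exits_agree are hypothetical. Under that assumption the two exit tests agree.

```cpp
#include <cassert>
#include <cstdint>

// Models `%2 = and i64 %indvars.iv.next, 4294967295` followed by
// `icmp eq i64 %2, 100`.
bool exit_masked(std::int64_t iv_next) {
    return (static_cast<std::uint64_t>(iv_next) & 0xFFFFFFFFull) == 100u;
}

// Models the ideal form: `icmp eq i64 %indvars.iv.next, 100`.
bool exit_ideal(std::int64_t iv_next) {
    return iv_next == 100;
}

// Compares both exit tests for a 32-bit counter widened to 64 bits.
bool exits_agree(std::int32_t counter) {
    std::int64_t wide = counter;  // sign extension (the assumption stated above)
    return exit_masked(wide) == exit_ideal(wide);
}

// Spot-checks negative values, values around 100, and the int32 limits.
void self_check() {
    const std::int32_t samples[] = {INT32_MIN, -5, 0, 99, 100, 101, INT32_MAX};
    for (std::int32_t c : samples)
        assert(exits_agree(c));
}
```

The assumption matters: for a 64-bit value outside the 32-bit range, such as 2^32 + 100, the masked test fires while the unmasked one does not, so the mask can only be removed when the value range of the IV is known.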
Based on our observations, the cause is that IndVarSimplifyPass generates an unnecessary trunc instruction, which InstCombinePass then converts into a mask instruction. We explain the details below.
1. IndVarSimplifyPass generates a Trunc instruction
As shown in Figure 1, when IndVarSimplifyPass selects an induction variable (IV) to serve as the LoopCounter, linearFunctionTestReplace() rewrites the loop exit comparison using this IV. The IV’s type is i64, while the second operand of the icmp (the constant 0) remains of type i32. This type mismatch leads to the generation of a trunc instruction that truncates the IV to i32. The trunc is unnecessary because the comparison could instead be performed in the original i64 type, by extending the constant, without any loss of information.
We have observed that LLVM’s existing design introduces this unnecessary trunc when the initial value of the IV is not a constant:
- Taking the initial program as an example, as shown in Figure 2, IndVarSimplifyPass does generate a trunc instruction.
- If the initial value of the IV is a constant, as shown in Figure 3, the trunc instruction is not generated.
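For anyone who wants to reproduce the comparison, the pair of functions below differs only in how the induction variable is initialised. func is the original example; func_const_start is a hypothetical name for the constant-start variant, and the start value 0 is an assumption (the figures may use a different constant). Compiling the file with optimisation enabled, for example clang -O1 -S -emit-llvm, should let you compare the two exit conditions, although the exact IR will vary with the LLVM version, target and flags.

```cpp
#include <cassert>

// Original example: the start value is only known at run time.
void func(int result[], int start) {
    for (int i = start; i < 100; i++)
        result[i] += 1;
}

// Hypothetical variant: the same loop, but the start value is the constant 0.
void func_const_start(int result[]) {
    for (int i = 0; i < 100; i++)
        result[i] += 1;
}

// Sanity check: both variants touch the same elements when start == 0.
void check_variants_agree_from_zero() {
    int a[100] = {};
    int b[100] = {};
    func(a, 0);
    func_const_start(b);
    for (int i = 0; i < 100; i++) {
        assert(a[i] == 1);
        assert(a[i] == b[i]);
    }
}
```

The two functions are semantically identical for start == 0, so any difference in the generated exit comparison comes from how the optimizer treats the non-constant start, not from the source loop.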
2. InstCombinePass optimizes the Trunc into a masking instruction
Still taking the initial program as an example, as shown in Figure 4, InstCombinePass optimizes the trunc instruction into a masking instruction.
By debugging with GDB, we found that the InstCombinePass function foldICmpTruncConstant() (shown below) converts trunc i64 %IV to i32 into and i64 %IV, 0xFFFFFFFF, that is, and i64 %IV, 4294967295, which results in the masking issue we observe.
// Canonicalize to a mask and wider compare if the wide type is suitable:
// (trunc X to i8) == C --> (X & 0xff) == (zext C)
if (!SrcTy->isVectorTy() && shouldChangeType(DstBits, SrcBits)) {
  Constant *Mask =
      ConstantInt::get(SrcTy, APInt::getLowBitsSet(SrcBits, DstBits));
  Value *And = Builder.CreateAnd(X, Mask);
  Constant *WideC = ConstantInt::get(SrcTy, C.zext(SrcBits));
  return new ICmpInst(Pred, And, WideC);
}
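The canonicalization in the snippet above can be checked in isolation. The sketch below is plain C++, not LLVM API code; cmp_trunc and cmp_masked are hypothetical helpers that model the two sides of the rewrite for a 64-bit source type and a 32-bit destination type. Unlike the question of whether the trunc was needed in the first place, this equivalence holds for every 64-bit input.

```cpp
#include <cassert>
#include <cstdint>

// (trunc X to i32) == C, modelled with unsigned types so that the
// narrowing conversion is well defined (it keeps the low 32 bits).
bool cmp_trunc(std::uint64_t x, std::uint32_t c) {
    return static_cast<std::uint32_t>(x) == c;
}

// (X & 0xFFFFFFFF) == (zext C), the form the fold canonicalizes to.
bool cmp_masked(std::uint64_t x, std::uint32_t c) {
    const std::uint64_t mask = 0xFFFFFFFFull;  // low 32 of 64 bits set
    const std::uint64_t wide_c = c;            // zero extension of C
    return (x & mask) == wide_c;
}

// Spot-checks that both forms agree, including inputs with high bits set.
void self_check() {
    const std::uint64_t samples[] = {0, 100, 0xFFFFFFFFull,
                                     (1ull << 32) + 100, ~0ull};
    for (std::uint64_t x : samples) {
        assert(cmp_trunc(x, 100u) == cmp_masked(x, 100u));
        assert(cmp_trunc(x, 0xFFFFFFFFu) == cmp_masked(x, 0xFFFFFFFFu));
    }
}
```

So the fold itself is a faithful rewrite of the trunc; the extra and only appears in the final IR because the trunc was emitted earlier.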
Everyone is welcome to discuss!