[RFC] The masking issue caused by loop exit comparison

Problem Description

In common loop structures, if the initial value of the induction variable is not a constant, the loop exit comparison ends up with an extra masking (and) instruction. For example:

void func(int result[], int start){
  for (int i = start; i < 100 ;i++)
    result[i] += 1;
}

The IR for this loop is shown below. The and instruction on line 5 is the mask in question.

for.body:                                         
  %indvars.iv = phi i64 [ %0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
  ...
  %indvars.iv.next = add nsw i64 %indvars.iv, 1
  %2 = and i64 %indvars.iv.next, 4294967295
  %exitcond.not = icmp eq i64 %2, 100
  br i1 %exitcond.not, label %for.cond.cleanup, label %for.body

And the ideal IR should look like this:

for.body:                                         
  %indvars.iv = phi i64 [ %0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
  ...
  %indvars.iv.next = add nsw i64 %indvars.iv, 1
  %exitcond.not = icmp eq i64 %indvars.iv.next, 100
  br i1 %exitcond.not, label %for.cond.cleanup, label %for.body

Based on our observations, the cause is that IndVarSimplifyPass generates an unnecessary trunc instruction, which InstCombinePass then converts into a mask instruction. We explain both steps in detail below.

1. IndVarSimplifyPass generates a Trunc instruction

As shown in Figure 1, when IndVarSimplifyPass selects an induction variable (IV) to serve as the LoopCounter, linearFunctionTestReplace() rewrites the loop exit comparison using this IV. The IV’s type is i64, while the second operand of the icmp (the constant 0) is still of type i32. This type mismatch leads to a trunc instruction that truncates the IV to i32. The trunc is unnecessary because the comparison could be performed in the original i64 type without any loss of information.

However, we have observed that when the initial value of the IV is not a constant, LLVM’s existing design introduces this unnecessary trunc.

  • Taking the initial program as an example, as shown in Figure 2, IndVarSimplifyPass does generate a trunc instruction.

  • If the initial value of the IV is a constant, as shown in Figure 3, the trunc instruction is not generated.
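The claim that the trunc loses no information can be checked with a small C sketch. It assumes the widened IV is the sign extension of a 32-bit int, so it always lies in the int32_t range. The helper names are hypothetical and are not LLVM code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Exit test with the trunc: compare only the low 32 bits of the IV. */
static int exit_with_trunc(int64_t iv, int32_t bound) {
    return (uint32_t)iv == (uint32_t)bound;
}

/* Exit test without the trunc: compare the full 64-bit IV. */
static int exit_wide(int64_t iv, int32_t bound) {
    return iv == (int64_t)bound;
}

/* Counts the sampled IV values on which the two exit tests disagree.
   Assumption: every IV value fits in int32_t (a sign-extended int). */
static long count_mismatches(int32_t bound) {
    const int64_t samples[] = {INT32_MIN, -5, -1, 0, 1, 99, 100, 101, INT32_MAX};
    long mismatches = 0;
    for (size_t k = 0; k < sizeof samples / sizeof samples[0]; k++) {
        assert(samples[k] >= INT32_MIN && samples[k] <= INT32_MAX);
        if (exit_with_trunc(samples[k], bound) != exit_wide(samples[k], bound))
            mismatches++;
    }
    return mismatches;
}
```

Within that range the two tests agree on every sample. They only diverge for a value such as 100 + 2^32, which a sign-extended int can never take, and that is why the range assumption matters.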

2. InstCombinePass optimizes the Trunc into a masking instruction

Still taking the initial program as an example, as shown in Figure 4, InstCombinePass turns the trunc instruction into a masking instruction.

By debugging with GDB, we found that InstCombinePass::foldICmpTruncConstant() (shown below) rewrites the comparison on trunc i64 %IV to i32 into a comparison on and i64 %IV, 0xFFFFFFFF, that is, and i64 %IV, 4294967295. This is the masking instruction we observe.

    // Canonicalize to a mask and wider compare if the wide type is suitable:
    // (trunc X to i8) == C --> (X & 0xff) == (zext C)
    if (!SrcTy->isVectorTy() && shouldChangeType(DstBits, SrcBits)) {
      Constant *Mask =
          ConstantInt::get(SrcTy, APInt::getLowBitsSet(SrcBits, DstBits));
      Value *And = Builder.CreateAnd(X, Mask);
      Constant *WideC = ConstantInt::get(SrcTy, C.zext(SrcBits));
      return new ICmpInst(Pred, And, WideC);
    }
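The fold above can be mirrored in plain C to see that the two forms always agree. This is only a sketch: low_bits_mask is a hypothetical stand-in for APInt::getLowBitsSet(64, DstBits), and the widths are fixed to i64 and i32 as in the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical plain-C stand-in for APInt::getLowBitsSet(64, dst_bits). */
static uint64_t low_bits_mask(unsigned dst_bits) {
    assert(dst_bits <= 64);
    if (dst_bits == 64)
        return UINT64_MAX;              /* avoid an undefined 64-bit shift */
    return (UINT64_C(1) << dst_bits) - 1;
}

/* Before the fold: (trunc X to i32) == C */
static int cmp_truncated(uint64_t x, uint32_t c) {
    return (uint32_t)x == c;
}

/* After the fold: (X & 0xFFFFFFFF) == (zext C) */
static int cmp_masked(uint64_t x, uint32_t c) {
    return (x & low_bits_mask(32)) == (uint64_t)c;
}
```

For DstBits = 32 the mask is 4294967295, the constant that appears in the IR. Both comparisons inspect only the low 32 bits of X, so they return the same result for any X.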

Everyone is welcome to discuss! :laughing:

There are two parts of SimplifyIndVar that may widen the induction counter:

  1. WidenIV should only widen the induction variable to match its widest use. I don’t see a 64 bit use of i.

  2. linearFunctionTestReplace uses ScalarEvolution to rewrite the comparison according to ScalarEvolution’s getBackedgeTakenCount(). I assume the i64 comes from that analysis (why?). There is also SimplifyIndvar::eliminateTrunc() to handle this kind of situation. Why doesn’t it apply here?

Note that another analysis could remove the trunc again, either because add nsw i64 %indvars.iv would eventually evaluate to poison if and i64 %indvars.iv.next, 4294967295 were anything other than a no-op, or because 4294967295 falls outside the range of values that %indvars.iv can take.

Good evening!
In fact, it is AllowIVWidening in IndVarSimplifyPass that extends the IV type to i64 in order to eliminate zext/sext instructions. When I disabled IV widening with -indvars-widen-indvars=false, the masking issue disappeared, but I don’t think this is a good way to address the problem.
Below are the options I used for testing, which should be general enough:

clang -g -S -emit-llvm -O3 -static -ffast-math -march=rocketlake -mllvm -vectorize-loops=false -mllvm -vectorize-slp=false -mllvm -enable-loop-distribute test.c -o test.ll

I still need to confirm some other details. :grinning_face:

WidenIV should only widen the induction variable to match its
widest use. I don’t see a 64 bit use of i.

Presumably, that comes from the gep for the result[i] access, which tends to
widen the index from 32 bits to 64 bits.
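The 64-bit use behind result[i] can be made explicit in C. The sketch below uses a hypothetical helper name and assumes a 64-bit target. It spells out the sign extension that the array index undergoes before the address computation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the address computation behind result[i], with the
   32-bit index explicitly sign-extended to 64 bits. */
static int *element_address(int *result, int i) {
    assert(result != NULL);
    int64_t wide_index = (int64_t)i;   /* corresponds to sext i32 -> i64 */
    return result + wide_index;        /* corresponds to the getelementptr */
}
```

Since the loop body needs the 64-bit index on every iteration, widening the IV itself to i64 removes the per-iteration sext, which is the 64-bit use WidenIV reacts to.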


Hi! I tried calling simplifyAndExtend() after linearFunctionTestReplace(). simplifyAndExtend() does call eliminateTrunc(), but it did not remove the trunc instruction, so the masking issue persists.

Hi! I am currently working on the masking issue and expect to submit a PR next week. :slight_smile:
