Shrinking induction variable integers

Hi,

I'm trying to solve a problem with loop induction variables being larger than they need to be. It looks like IndVarSimplify or LoopStrengthReduce is supposed to do what I want, but that isn't happening.

In a function like the one below, the local pointers are 32 bits but size_t is 64 bits. 64-bit integers need an extra register (i64 is still a legal type), and i64 add isn't a legal operation on the target, so it should be avoided. The loop induction variable should be replaced with a cheaper 32-bit integer, since both the bound and the pointer are 32 bits. Instead, the bound is extended to i64, the induction variable and bounds check stay i64, and the induction variable then has to be truncated to the pointer size for the address computation. LSR does nothing after concluding that there are no "interesting" IV users.

How / where should I go about fixing this? I don't really understand the difference between indvars and LSR.

void matrixSum(local double* partialSums, local double* finalSum, int num)
{
     double sum = 0.0;

     for (size_t i = 0; i < num; ++i) // size_t is i64
     {
         sum += partialSums[i];
     }

     finalSum[0] = sum;
}

define void @matrixSum(double addrspace(3)* nocapture readonly %partialSums, double addrspace(3)* nocapture %finalSum, i32 %num) #0 {
entry:
   %conv = sext i32 %num to i64
   %cmp6 = icmp eq i32 %num, 0
   br i1 %cmp6, label %for.end, label %for.body

for.body: ; preds = %entry, %for.body
   %i.08 = phi i64 [ %inc, %for.body ], [ 0, %entry ]
   %sum.07 = phi double [ %add, %for.body ], [ 0.000000e+00, %entry ]
   %idxprom = trunc i64 %i.08 to i32
   %arrayidx = getelementptr inbounds double addrspace(3)* %partialSums, i32 %idxprom
   %0 = load double addrspace(3)* %arrayidx, align 8, !tbaa !2
   %add = fadd double %sum.07, %0
   %inc = add i64 %i.08, 1
   %cmp = icmp ult i64 %inc, %conv
   br i1 %cmp, label %for.body, label %for.end

for.end: ; preds = %for.body, %entry
   %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add, %for.body ]
   store double %sum.0.lcssa, double addrspace(3)* %finalSum, align 8, !tbaa !2
   ret void
}
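A possible source-level workaround, assuming the kernel source can be changed, is to declare the induction variable as a signed 32-bit `int`. Signed overflow is undefined in C and OpenCL C, so the compiler may assume `i` never wraps, and the exit test starts out as an i32 compare. The sketch below is plain C (the OpenCL `local` qualifiers are dropped) and `matrixSum32` is a hypothetical name. It is not exactly equivalent to the original: for negative `num` it runs zero iterations, whereas the `size_t` loop converts `num` to a huge unsigned bound. On a target where i64 is a legal type, indvars may still widen this IV back to i64.

```c
/* Hypothetical plain-C variant of matrixSum (OpenCL `local` qualifiers
   dropped) that uses a signed 32-bit induction variable instead of size_t. */
void matrixSum32(const double *partialSums, double *finalSum, int num)
{
    double sum = 0.0;

    /* Signed overflow is undefined, so the compiler may assume i never
       wraps; the bound and the IV are both 32 bits from the start. */
    for (int i = 0; i < num; ++i)
    {
        sum += partialSums[i];
    }

    finalSum[0] = sum;
}
```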

Interesting problem. I don’t have a solution, but I can make a few observations.

The IR coming out of your frontend has already promoted the compare of %inc against %num to i64 (icmp ult i64 %inc, %conv, where %conv is sext %num). Demoting it requires reasoning that %inc does not overflow within the loop, i.e. that its value always fits in 32 bits. The indvars pass could in theory do this; it already uses SCEV to determine that sext(trunc(%inc)) == %inc. But demoting compares isn’t something indvars currently does. Instead, indvars tries to promote induction variables so it can hoist sext/zext out of the loop, promoting compares in the process, and it does this as long as i64 is a legal type. So even if your frontend generated an i32 IV, you may need to teach indvars *not* to promote it by checking the cost model in addition to isLegalInteger.
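To make the overflow condition concrete, here is a minimal C model of the loop's exit test, assuming it matches the IR above (icmp ult i64 %inc, %conv with %conv = sext i32 %num to i64). `exit_test_i64` is the test as emitted; `exit_test_i32` is a hypothetical demoted form that compares truncated values. All function names are invented for this sketch. When `num` is non-negative, %inc never exceeds `num`, so it fits in 32 bits and the two tests agree. When `num` is negative, %conv is a huge unsigned bound, the i64 loop keeps going past 2^32, and the demoted test exits early.

```c
#include <stdbool.h>
#include <stdint.h>

/* Exit test as emitted: icmp ult i64 %inc, sext(%num). */
bool exit_test_i64(uint64_t inc, int32_t num)
{
    return inc < (uint64_t)(int64_t)num; /* int64_t cast models sext */
}

/* Hypothetical demoted exit test: icmp ult i32 trunc(%inc), %num. */
bool exit_test_i32(uint64_t inc, int32_t num)
{
    return (uint32_t)inc < (uint32_t)num; /* uint32_t casts model trunc */
}

/* Returns true if both exit tests agree for every %inc in [lo, hi].
   hi must be less than UINT64_MAX so the loop terminates. */
bool tests_agree_on_range(int32_t num, uint64_t lo, uint64_t hi)
{
    for (uint64_t inc = lo; inc <= hi; ++inc)
        if (exit_test_i64(inc, num) != exit_test_i32(inc, num))
            return false;
    return true;
}
```

In compiler terms, the first case is what SCEV would need to establish (a non-negative trip count bounded by an i32 value) before the compare could safely be demoted.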

It looks to me like the trunc is an interesting user of %i.08, meaning its SCEV expression is an add recurrence (it evolves from a loop-invariant base, adding a stride on each iteration). So I don’t know why LSR claims there are no interesting users. However, I’m not surprised LSR doesn’t do anything. It mainly tries to reduce the number of registers live in the loop, so there isn’t much it can do here.
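SCEV writes such a recurrence as {Start,+,Step}: its value on iteration k is Start + k*Step. Here %i.08 is {0,+,1} in i64 and %idxprom is its truncation. The minimal C model below uses hypothetical names (it is not the SCEV API) to show why the trunc user could in principle be fed by an i32 recurrence: truncation commutes with wrapping add and multiply, so trunc({Start,+,Step}) in i64 equals {trunc(Start),+,trunc(Step)} in i32 on every iteration. The exit compare is the part that still needs the full i64 value.

```c
#include <stdint.h>

/* Hypothetical model of an affine add recurrence {start,+,step}. */
typedef struct {
    uint64_t start;
    uint64_t step;
} AddRec64;

typedef struct {
    uint32_t start;
    uint32_t step;
} AddRec32;

/* Value on iteration k; unsigned arithmetic wraps modulo 2^64,
   like i64 add/mul without nsw/nuw flags. */
uint64_t eval64(AddRec64 r, uint64_t k)
{
    return r.start + k * r.step;
}

/* Value on iteration k, wrapping modulo 2^32 like i32 add/mul. */
uint32_t eval32(AddRec32 r, uint32_t k)
{
    return (uint32_t)(r.start + k * r.step);
}

/* Narrow a recurrence by truncating its start and step. */
AddRec32 trunc_addrec(AddRec64 r)
{
    return (AddRec32){ (uint32_t)r.start, (uint32_t)r.step };
}
```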

A new target-sensitive IV optimization based on SCEV could go in either indvars or LSR. Both passes already contain miscellaneous loop-exit optimizations that don’t really fit with the main pass. It’s mainly a question of whether you want to do it before or after vectorization.
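As a rough illustration of checking the cost model in addition to isLegalInteger, here is a hypothetical widening predicate. The struct, its fields, and `should_widen_iv` are invented for this sketch and are not LLVM's TargetTransformInfo or DataLayout API. The point is only that "i64 is a legal integer type" and "i64 arithmetic is cheap" are separate questions, and the target in this thread answers yes to the first and no to the second.

```c
#include <stdbool.h>

/* Hypothetical per-target facts; not a real LLVM API. */
typedef struct {
    bool i64_legal_type;   /* i64 values can live in registers (e.g. a pair) */
    unsigned i32_add_cost; /* relative cost of one i32 add */
    unsigned i64_add_cost; /* relative cost of one i64 add, expanded if not legal */
} TargetCosts;

/* Widen the IV to i64 only if i64 is a legal type AND an i64 add costs no
   more than an i32 add. A legality-only check would say yes for the
   target described in this thread. */
bool should_widen_iv(const TargetCosts *t)
{
    return t->i64_legal_type && t->i64_add_cost <= t->i32_add_cost;
}
```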

-Andy