Aggressive conversion of sext to zext blocks IndVarSimplify

Consider the following function

void foo(unsigned char *a,                                                       
         unsigned *b,                                                            
         int n) {                                                                
  int j = a[0] << 8;
  int i = n-1;
  for (; i >= 3; i -= 4) {
    j = (j >> 8) | (a[i] << 8);
    b[j]++;
    j = (j >> 8) | (a[i-1] << 8);
    b[j]++;
    j = (j >> 8) | (a[i-2] << 8);
    b[j]++;
    j = (j >> 8) | (a[i-3] << 8);
    b[j]++;
  }
}

The induction variable i is a signed 32-bit integer that is only used to index an array. On a 64-bit target, we would like IndVarSimplify to promote the induction variable to an i64 to remove the extends needed before the array indexing. This should be possible by relying on signed integer overflow being undefined behavior.

For this to work, IndVarSimplify needs to see nsw flags on the arithmetic using i. And the extends need to be sext.

Unfortunately, before IndVarSimplify runs, IPSCCP and CorrelatedValuePropagation both do analysis of the range of the induction variable and determine that the sexts can be replaced with zexts.

I thought it might be possible to use the same range analysis being done by IPSCCP to CorrelatedValuePropagation to also add nuw markers to the sub instructions that decrease i. That might let IndVarSimplify work by using zext and nuw

But that doesn’t work because InstCombine will convert the sub instructions into add instructions with negated immediates. That conversion requires the nuw flag to be dropped.

Anyone have any ideas for a path forward here?

I’m not sure why you believe that to be related to the zext vs sext.
This appears to be a phase ordering problem, rerunning -indvars promotes i: Compiler Explorer
… and it is CVP that enables the IndVars to perform the promotion here: Compiler Explorer

Hi Craig,

Really interesting case. This is yet-another case of LLVM optimizations proving things and canonicalizing them without capturing that information (similar to canonicalizing add to or). What we need here is something like “anyext” (in codegen) that says “extend this value, it doesn’t matter what you use because the sign bit is known to be zero”. If we had that, we could directly turn the anyext IR instruction into the ANYEXT sorts of nodes as isel time.

In terms of LLVM IR, we could do this in one of two ways:

  1. Introduce an anyext IR instruction as a complement to zext/sext
  2. Merge zext/sext together into one “ext” instruction and have an enum that indicates the extension policy (zero, sign, any).

-Chris

Thanks Chris and Roman.

Looks like I missed that IndVarSimplify is between IPSCCP and CorrelatedValuePropagation. So maybe it would have worked if IPSCCP had done a more complete job like CVP eventually does. If running IndVarSimplify again works, I wonder why LTO didn’t pick it up. I’ll go investigate this some more and report back.

Rerunning IndVarSimplify is only partially working. It’s promoting the induction variable, but it truncated it, then did some i32 arithmetic, and then zero extended it. I don’t want those extra truncates and zero extends.

For example from Compiler Explorer

%sub8 = add nsw i32 %indvars1, -1
%4 = zext i32 %sub8 to i64

Yeah, widening these things (guided by the target data set of native integer widths) seems like a good thing to do. That could go in instcombine for example.

We’ve got (zext (add (trunc indvar, C))). It can’t be widened without an nuw flag on the add. This is why the conversion to zext is a problem. We have nsw but not nuw.

But you could widen it with an ‘anyext’ right? The semantics of ‘anyext’ are “I know the top bit is zero” which is what you need to know here.

We should just have no-wrap flags for trunc and ext, i think:

  • trunc nsw == at most, only sign bits are lost during truncation, no non-sign bits are lost.
  • trunc nuw == no set bits are lost
  • zext nsw == arithmetic value does not change, we are actually padding with the sign bit, "this zext is an sext".
  • zext nuw (implicit) == numeric value does not change, not really useful.
  • sext nsw (implicit) == arithmetic value does not change, not really useful.
  • sext nuw == numeric value does not change, we are actually padding with the zero bit, "this sext is an zext".