65bit integer math

I have a test case(attached as fc_long.ll) that when run through the optimizer produces 65bit integer math(fc_long-opt.ll).

Now I understand that llvm can have any length integer, but I consider turning a 64bit mul into multiple 65 bit instructions to be a ‘bad’ optimization. This eventually expands to a 128bit multiply call(__multi3) which I have absolutely no interest in supporting. So I’m wondering what optimization might be the culprit here so I can disable it in this situation.

Micah

fc_long.ll (3.22 KB)

fc_long-opt.ll (2.22 KB)

I'm pretty sure this is the fault of one of the loop passes (I forget
which one off the top of my head). It's worth noting that the
optimizer manages to eliminate the loop; computing the exit value of
one of the loop variables uses the wide multiply.

-Eli

It's -indvars. I ran mem2reg, instcombine, and simplifycfg on your
input and got the attached file. I thought I could add "nuw nsw" to
the arithmetic inside the loop to say that i65 is unnecessary, but
that doesn't cause indvars to leave the arithmetic at i64. Is that a
bug in indvars, or in SCEV, or is this just a natural limitation of
SCEV?

fc_long.ll (1.56 KB)

It's a limitation of SCEV; the relevant function is
SCEVAddRecExpr::evaluateAtIteration in ScalarEvolution.cpp.

-Eli

I haven't looked at your attached .ll, but I have added code to the loop analyser to make it do this. Given the choice between not analysing loop evolutions and growing arithmetic by an extra bit, it will expand the arithmetic. It's in SCEV, which is used by pretty much every loop optimizer.

Nick

Villmow, Micah wrote:

It's SCEV. I have this very bit of code disabled in our tree for exactly the
reasons Micah gives. Check out BinomialCoefficient is ScalarEvolution.cpp.

Here's the big hack we put in, which is just the code that used to be there
in 2.3:

  // We need at least W + T bits for the multiplication step
  // FIXME: A temporary hack; we round up the bitwidths
  // to the nearest power of 2 to be nice to the code generator.
  unsigned CalculationBits = 1U << Log2_32_Ceil(W + T);

  // Cray: Truncate to 64 bits. It's what we did with LLVM 2.3 and
  // while it can create miscompilations in some cases if the
  // computation happens to overflow, it's no worse than what we did
  // before.
  if (CalculationBits > 64) {
    //return new SCEVCouldNotCompute();
    CalculationBits = 64;
    DEBUG(DOUT << "*** [dag] SEVERE WARNING: SCEV calculation truncated to 64
                   bits, check for miscompilation! ***\n");
  }

I have never found a test where this actually causes a problem. And we have
tens of thousands of tests.

It is very bad form to create wide arithmetic where none was there before.
This is one of the downfalls of SCEV but I consider it to be minor compared
to the benefits.

                                     -Dave

David,
Thanks for helping me figure out exactly where to put the check. I was
trying to do modifications in other parts of that function and it wasn't
working very well. I've attached a patch that parameterizes this and
allows it to be set from the commandline instead of hardcoding it.

Would it be possible to get this patch pushed into the tree?

Thanks,
Micah

ScalarEvolution.diff (1.09 KB)

LLVM's ScalarEvolution's handling of non-linear recurrences has much room
for improvement. The problem Micah describes sounds like the one
in lib/Analysis/README.txt.

Dan

What does nsw nuw do?

Micah

"no [un]signed wrapping": http://llvm.org/docs/LangRef.html#i_add

Villmow, Micah wrote:

What does nsw nuw do?

It means it doesn't overflow whether considered signed or unsigned. For
example, in i8 arithmetic, 255+1 overflows in the unsigned sense, while
127+1 overflows in the signed sense. nsw nuw means neither of these
happens.

Ciao,

Duncan.