question about llvm.powi and reassociation

Hello, all. To get my feet wet and hopefully make a small contribution, I was looking for something small to start with. I settled on one of the suggestions from the CodeGen readme:

Reassociate should turn things like:

int factorial(int X) {
return X*X*X*X*X*X*X*X;
}

into llvm.powi calls, allowing the code generator to
produce balanced multiplication trees.

I started getting familiar with the relevant parts of the code and see two problems with this as things currently stand:

  • llvm.powi, in both the documentation and the code, is a floating-point-only intrinsic.

  • the reassociate pass avoids doing anything to floating point operands regardless of the state of the “enable-unsafe-fp-math” flag.

Both of these are adjustable, but as I’m trying to ramp up I figured it would be good to ask to make sure I’m understanding what I’m seeing correctly. If I am, I feel like perhaps I’ve chosen poorly for a starting project.

- Kyle

> Hello, all. To get my feet wet and hopefully make a small contribution, I was looking for something small to start with. I settled on one of the suggestions from the CodeGen readme:
>
> > Reassociate should turn things like:
> >
> > int factorial(int X) {
> > return X*X*X*X*X*X*X*X;
> > }
> >
> > into llvm.powi calls, allowing the code generator to
> > produce balanced multiplication trees.
>
> I started getting familiar with the relevant parts of the code and see two problems with this as things currently stand:
>
> - llvm.powi, in both the documentation and the code, is a floating-point-only intrinsic.
>
> - the reassociate pass avoids doing anything to floating point operands regardless of the state of the "enable-unsafe-fp-math" flag.

Another problem is that the code generator doesn't currently do
tree balancing, with fpowi or otherwise. Doing tree balancing
profitably would require heuristics for weighing the benefit of
the ILP and reduced operation count against the cost of the
additional register pressure. It's doable, but this is quite a
bit more involved than just forming fpowi calls.
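
To make the operation-count point concrete, here is a minimal sketch in C of the two shapes for the README example. The function names are made up for illustration, and I use unsigned instead of the README's int only so that wraparound is well defined. This is the source-level idea only, not what Reassociate or the code generator does today.

```c
/* Linear chain, as written in the README entry: 7 multiplies,
 * each one depending on the result of the previous one. */
static unsigned pow8_chain(unsigned x) {
    return x * x * x * x * x * x * x * x;
}

/* Balanced form: 3 multiplies by repeated squaring.
 * Both temporaries are live values, which is where the extra
 * register pressure comes from in larger expressions. */
static unsigned pow8_balanced(unsigned x) {
    unsigned x2 = x * x;   /* x^2 */
    unsigned x4 = x2 * x2; /* x^4 */
    return x4 * x4;        /* x^8 */
}
```

For integers the two forms give the same result. For floating point they can round differently, which is why the transformation would have to be gated on an unsafe-math style flag.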

> Both of these are adjustable, but as I'm trying to ramp up I figured it would be good to ask to make sure I'm understanding what I'm seeing correctly. If I am, I feel like perhaps I've chosen poorly for a starting project.

Because this one is more involved than it seems from the README
entry, I suggest finding a different starting project.

Dan

This one looks interesting from README.txt:

"viterbi speeds up *significantly* if the various "history" related copy
loops are turned into memcpy calls at the source level. We need a "loops to
memcpy" pass."

I think the loops it refers to are these:

    for (j=0; j<MAX_history; ++j) {
      history_new[i][j+1] = history[2*i][j];
    }

ScalarEvolution in LLVM will tell you an expression for the indices:
{((144 * (%tmp226608 /u 2)) + %history_new),+,144}
{%history_new,+,144}

Thus you'll only need to figure out whether the indices overlap or not,
and in which direction to copy (using memcpy/memmove as appropriate).
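
As a rough source-level illustration of that decision, here is a sketch in C. The helper name copy_row and the int element type are assumptions made for the example; the thread does not give the real types. A pass would make this choice at compile time from the ScalarEvolution expressions rather than at run time. The sketch only uses a library call where it behaves the same as the forward loop, and keeps the loop otherwise.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: performs dst[j + 1] = src[j] for j in [0, n). */
static void copy_row(int *dst, const int *src, size_t n) {
    int *out = dst + 1;
    size_t bytes = n * sizeof *src;
    /* Compare addresses as integers, since the two pointers may
     * refer to different objects. */
    uintptr_t o = (uintptr_t)out, s = (uintptr_t)src;

    if (o + bytes <= s || s + bytes <= o) {
        /* Ranges are disjoint: memcpy is safe. */
        memcpy(out, src, bytes);
    } else if (o <= s) {
        /* Overlap with the destination at the lower address: the forward
         * loop reads each element before overwriting it, so memmove
         * produces the same result. */
        memmove(out, src, bytes);
    } else {
        /* Overlap with the source at the lower address: the forward loop
         * re-reads values it has just written, which memmove would not
         * reproduce, so keep the loop. */
        for (size_t j = 0; j < n; ++j)
            out[j] = src[j];
    }
}
```

The last branch is the case to watch for: replacing it with memmove would change the program's behavior, so a pass would have to leave such a loop alone.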

Just a suggestion,
--Edwin