LLVM's loop strength reduction module

Hi,

My name is Venugopal Raghavan and I work at AMD. I have been trying to understand the code in the file LoopStrengthReduce.cpp, but I am making very slow progress.

Is there any additional documentation available that would help me understand the code, like a PPT presentation or a design document or maybe a paper? I did not find anything on the Internet.

There are comments interspersed in the code which are helpful but don’t seem sufficient for me to get a good understanding of the code.

Thanks.

Regards,

Venu.

AFAIK, no official doc.
You can probably get better help if you ask specific questions (which part of the code you don’t understand).

Hi Raghavan,

I concur, there are no specific docs.
What do you want to know specifically?

Cheers,
-Quentin

Hi,

Sorry I took a long time to reply as it took me some time to get some understanding of the code even to ask some specific questions (I have a test case in which LSR does not kick in and wanted to understand the code to figure out why it was not kicking in).

Here are some specific questions I have:

  1. It appears that LSR works only for the inner-most loop. Is this correct? Can you tell me why this is so? I believe SCEV works for nested loops, right?

  2. Can LSR work for those LSR uses in a nested loop whose associated formulae themselves do not span multiple loops, i.e. the formulae do not have references to loops other than the one that is being processed currently? Of course, we first need to get rid of the checks which currently restrict LSR only to the innermost loop.

  3. Why do we compute “chains” upfront in the CollectChains() function and then generate all the LSR uses with the associated formulae, prune the formulae, compute the solution and then finally refer to the chains computed in the first step to implement the solution? Can the chains somehow drive the LSR use and formulae generation process to restrict the latter to only those that are “interesting” for the chains computed?

  4. I may have misunderstood the code, but it seems that the function SolveRecurse() succeeds in computing a solution only if every LSR use has at least one formula associated with it. For example, if all the formulae associated with an LSR use get filtered away, it appears that the SolveRecurse() function would not compute a solution and instead say “No Satisfactory Solution”. In my example, an LSR use has all the formulae associated with it filtered away as they are all “loser” formulae and I do not get a solution from the SolveRecurse() function. Is it correct for the SolveRecurse() function to ignore LSR uses without any associated formulae and come up with a solution that involves the other LSR uses? Maybe this will create an inefficient solution, but would it create an incorrect solution? Right now, SolveRecurse() seems to take an all-or-nothing approach. My apologies if I have completely misunderstood the concepts of LSR use and formulae, but I am just trying to understand all this from the code and that is proving very difficult to do.

  5. Given the current implementation of the SolveRecurse() function, why not check whether there are LSR uses with no associated formulae and, if so, not even execute SolveRecurse()? This would avoid some compile time overhead, wouldn’t it? SolveRecurse() does not need to be called if an LSR use has an empty formulae set associated with it because even if you called it you would not get a solution from it, right?

Thanks.

Regards,

Venugopal Raghavan

Hi,

In connection with question (4) in my previous email, I filtered out LSRUses with no formulae associated with them at the beginning of the Solve() function and then ran this change on my multiply nested loop example. LSR then computes a solution whereas earlier it did not. The test case passes, so the generated code appears to be correct.
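
The filtering step described above can be sketched on a toy model. This is not the actual change to LoopStrengthReduce.cpp: `ToyUse` and `dropUsesWithoutFormulae` are hypothetical names, and each formula is reduced to a single integer cost. The sketch also says nothing about whether the real rewriter copes with uses that were skipped.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for an LSR use (hypothetical, not LLVM's LSRUse class).
struct ToyUse {
  std::string name;
  std::vector<int> formulaCosts; // one entry per formula that survived pruning
};

// Removes every use that has no formulae left and returns how many were
// removed. The relative order of the remaining uses is preserved.
std::size_t dropUsesWithoutFormulae(std::vector<ToyUse> &uses) {
  auto firstDropped =
      std::remove_if(uses.begin(), uses.end(),
                     [](const ToyUse &u) { return u.formulaCosts.empty(); });
  std::size_t dropped = static_cast<std::size_t>(uses.end() - firstDropped);
  uses.erase(firstDropped, uses.end());
  return dropped;
}
```

With three toy uses of which the middle one has no formulae, the call removes exactly that one and leaves the other two for the solver to work on.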

However, the code seems to be inefficient since my example runs slower now. I am not sure what exactly the problems are but I have “good” code for this example which was manually created with LSR incorporated in it and the LLVM generated code with my fix above seems not as good as this. I need to study the differences between these two versions further.

Regards,

Venu.

Hi,

1) It appears that LSR works only for the inner-most loop. Is this correct?

Yes, that is correct.

Can you tell me why this is so?

The rationale was that supporting outer loops complicates the whole implementation, whereas most of the performance gains are within the inner-most loop. Therefore we decided it was not worth the complexity.
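
As a rough illustration of what inner-most-loop strength reduction buys, here is a hand-written sketch (not actual LSR output; the function names and the row-major layout are assumptions made for the example). The first version recomputes `i * cols + j` on every inner iteration. The second keeps a running pointer in the inner loop and leaves one multiply per outer iteration, which is the kind of leftover work an outer-loop-aware LSR could also target.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Baseline: the inner loop recomputes the index i * cols + j every iteration.
// Both functions assume m.size() == rows * cols.
long sumWithMultiply(const std::vector<long> &m, std::size_t rows,
                     std::size_t cols) {
  long sum = 0;
  for (std::size_t i = 0; i < rows; ++i)
    for (std::size_t j = 0; j < cols; ++j)
      sum += m[i * cols + j];
  return sum;
}

// Inner loop strength-reduced by hand: the per-element multiply is replaced
// by a pointer that advances by one element per inner iteration.
long sumWithRunningPointer(const std::vector<long> &m, std::size_t rows,
                           std::size_t cols) {
  long sum = 0;
  for (std::size_t i = 0; i < rows; ++i) {
    const long *p = m.data() + i * cols; // still one multiply per outer iteration
    for (std::size_t j = 0; j < cols; ++j)
      sum += *p++; // the inner loop now only loads and increments
  }
  return sum;
}
```

Both functions compute the same sum; only the address arithmetic in the inner loop differs.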

I believe SCEV works for nested loops, right?

I believe that is correct.
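
For context, SCEV prints affine induction expressions as add recurrences of the form {start,+,step}<loop>, and the start of an inner-loop recurrence can itself be a recurrence over an outer loop. The following is a toy model of that idea only, not LLVM's SCEV classes; `AddRec`, `valueAt` and `nestedValueAt` are hypothetical names.

```cpp
#include <cassert>

// Toy affine add recurrence {start,+,step}: its value at iteration k is
// start + step * k. (Hypothetical type, not LLVM's SCEVAddRecExpr.)
struct AddRec {
  long start;
  long step;
};

long valueAt(const AddRec &rec, long iteration) {
  return rec.start + rec.step * iteration;
}

// Models the nested form {{outer.start,+,outer.step}<outer>,+,innerStep}<inner>:
// the inner recurrence starts at the outer recurrence's current value.
long nestedValueAt(const AddRec &outer, long innerStep, long outerIter,
                   long innerIter) {
  AddRec inner{valueAt(outer, outerIter), innerStep};
  return valueAt(inner, innerIter);
}
```

For a row-major array with 5 columns, the element index at outer iteration 2 and inner iteration 3 is `nestedValueAt(AddRec{0, 5}, 1, 2, 3)`, which evaluates to 2 * 5 + 3 = 13.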

2) Can LSR work for those LSR uses in a nested loop whose associated formulae themselves do not span multiple loops, i.e. the formulae do not have references to loops other than the one that is being processed currently?

In theory, yes.

Of course, we first need to get rid of the checks which currently restrict LSR only to the innermost loop.

3) Why do we compute “chains” upfront in the CollectChains() function and then generate all the LSR uses with the associated formulae, prune the formulae, compute the solution and then finally refer to the chains computed in the first step to implement the solution?

That’s a very good question!
I believe this is a historical accident from before we started to prune the search space. We could probably be smarter here.

Can the chains somehow drive the LSR use and formulae generation process to restrict the latter to only those that are “interesting” for the chains computed?

4) I may have misunderstood the code, but it seems that the function SolveRecurse() succeeds in computing a solution only if every LSR use has at least one formula associated with it. For example, if all the formulae associated with an LSR use get filtered away, it appears that the SolveRecurse() function would not compute a solution and instead say “No Satisfactory Solution”. In my example, an LSR use has all the formulae associated with it filtered away as they are all “loser” formulae and I do not get a solution from the SolveRecurse() function. Is it correct for the SolveRecurse() function to ignore LSR uses without any associated formulae and come up with a solution that involves the other LSR uses?

That is correct.

Maybe, this will create an inefficient solution, but would it create an incorrect solution?

I believe the rewriter logic is not ready for that, so the result would be incorrect. If we fixed the rewriter, the result would just be inefficient.

Right now, SolveRecurse() seems to take an all-or-nothing approach. My apologies if I have completely misunderstood the concepts of LSR use and formulae, but I am just trying to understand all this from the code and that is proving very difficult to do.
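
The all-or-nothing behaviour discussed in this question can be modelled with a small recursive search. This is a deliberately simplified sketch, not the real SolveRecurse(): `ToyUses` and `solveRecurseToy` are hypothetical names, each formula is just an integer cost, and costs are assumed to add up independently, which ignores the register sharing the real cost model accounts for.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Each inner vector holds the costs of the formulae still available for one
// use. (Toy model; the real pass works on LSRUse and Formula objects.)
using ToyUses = std::vector<std::vector<int>>;

// Picks one formula per use, starting at `index`. Returns true and writes the
// cheapest total into bestCost if every remaining use can be given a formula;
// returns false, leaving bestCost untouched, as soon as one use has none.
bool solveRecurseToy(const ToyUses &uses, std::size_t index, int &bestCost) {
  if (index == uses.size()) {
    bestCost = 0; // every use has been assigned a formula
    return true;
  }
  int restCost = 0;
  if (!solveRecurseToy(uses, index + 1, restCost))
    return false; // some later use has no formulae, so no choice here helps
  bool found = false;
  for (int cost : uses[index]) {
    if (!found || cost + restCost < bestCost) {
      bestCost = cost + restCost;
      found = true;
    }
  }
  return found; // false when uses[index] has no formulae at all
}
```

In this model a single use with an empty formula list makes the whole search fail, even when every other use has perfectly good candidates.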

5) Given the current implementation of the SolveRecurse() function, why not check whether there are LSR uses with no associated formulae and, if so, not even execute SolveRecurse()? This would avoid some compile time overhead, wouldn’t it? SolveRecurse() does not need to be called if an LSR use has an empty formulae set associated with it because even if you called it you would not get a solution from it, right?

That’s correct too.

Though I would have expected we reject those cases earlier.
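
The early rejection suggested in this question amounts to a linear scan before the search starts. A minimal sketch on the same kind of toy model (hypothetical names `ToyUses` and `everyUseHasFormula`; the real pass may already reject such cases elsewhere):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Each inner vector holds the formula costs remaining for one use (toy model).
using ToyUses = std::vector<std::vector<int>>;

// Returns true only if no use has an empty formula list. A caller could skip
// the recursive search entirely when this returns false.
bool everyUseHasFormula(const ToyUses &uses) {
  return std::none_of(uses.begin(), uses.end(),
                      [](const std::vector<int> &formulae) {
                        return formulae.empty();
                      });
}
```

The check is linear in the number of uses, whereas the search it guards can be far more expensive.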

Cheers,
-Quentin

Hi,

Thanks a lot for the replies.

As I said in my previous email, I got LSR to work on my multiply nested loop example (which I think has optimization opportunities even in outer loops) by ignoring LSRUses with no associated formulae. The code generated is functionally correct, but runs slower.

I think there are rewriter issues that need to be fixed to get rid of the performance problems. We have manually created code with LSR working for outer loops which gives good performance. I guess I can compare the two pieces of code to identify the performance issues in LLVM generated code.

Regards,

Venu.