[loops in LLVM][CFG] Various methods to know weather 2 loops are identical in a given CFG.

I am trying to identify from a given CFG for a function, if 2 loops represented in LLVM’s loop representation form are identical.

The simplest definition of identical loops could be loops with

  1. same initialization
  2. same trip count[1]
  3. same increment value

Comparing 2 loops based on the instructions inside the loop seems problematic for cases when some loop optimizations would try to remove/move instructions out of the loop.

I was interested to know if there are other methods or already available analysis to find if the 2 loops are identical apart from the above mentioned methods?

[1] https://github.com/llvm/llvm-project/blob/e42636d3c1a41a9b7c5d8095ae5ef6682e26d4a2/llvm/lib/Transforms/Scalar/LoopFuse.cpp#L704

Unsure what "identical" means if you are not looking at the instructions.

Let me ask like this, what are equivalent loops to this one:

for (int i = 0; i < 10; ++i) { A[i] = i; }

A) for (int i = 0; i < 10; ++i) { A[i] = i + 1; }

B) for (int i = 0; i < 10; ++i) { A[i+1] = i; }

C) for (int i = 1; i <= 10; ++i) { A[i-1] = i-1; }

D) for (int i = 1; i < 11; ++i) { A[i-1] = i-1; }

E) for (int i = 9; i >= 0; --i) { A[9 - i] = 9 - i; }

While the answer does impact what I would recommend to do, I think
loop fusion code is a good start.

~ Johannes

Thank you for the clarification with an example.

Here’s what I meant by identical, there are 2 forms I could think of
Strongly identical -

  1. Exact match of loop headers and loop bodys between 2 loops.
    e.g Loop mentioned below
    for (int i = 0; i < 10; ++i) { A[i] = i; }
    is not identical to any options mentioned, Whereas a matching candidate which is strongly identical could be an exact loop with flexibility to change the iteration variable name.
    for (int j = 0; j < 10; ++j) { A[j] = j; }
    for (int i = 0; i < 10; ++i) { A[i] = i; }

Weakly identical -
Loops are matched based on only header and the body is skipped for check.
e.g A,B are identical from the above mentioned question.

Given that it becomes difficult to have strongly identical loops, reverting back to weakly identical loops might help to identify if the compiler does make changes to the other loops whereas it misses doing so in the first loop.
Sample loop
for (int i = 0; i < 10; ++i) {
X = 100;
A[i] = i;
for (int i = 0; i < 10; ++i) {
A[i] = i;
X = 100;
From the definition of strongly identical they are not equal (as their headers match, whereas bodies don’t).
But having a strongly identical definition makes it harder to identify if in some cases the instructions in the loop’s body are moved and optimized.
e.g. When X is pulled out, it becomes weakly identical to sample loop.
X = 100;
for (int i = 0; i < 10; ++i) {
A[i] = i;

This loop though not identical as per strongly identical definition but it is identical when compared using weakly identical definition( as the header are same), this usage of weakly identical helps to figure out if some loops are getting optimized whereas others aren’t.
On top of that, if 2 loops seem weakly identical, we can do more metric checks like count the number of instructions in them to know if there is some missing/different approach followed for the other loop.