Question to Chris

(Please forgive my late response. I've tried some more cases to make this issue)

As Prof.Adve mentioned, I need to explain exactly what my problem is, but I have no good ability that I can explain it in this plain text space.

For this reason, I made a .pdf file and linked it as follows:

https://netfiles.uiuc.edu/lee225/www/MakingLoops.pdf

Would you please explain to me how I can access to this problem in a better way if you can figure out?

Once again, I really appreciate your time and favor, Bill and Prof.Adve.

Truly yours,
Seung J. Lee

Ok, here are a few suggestions and comments:

1) LLVM has the capabilities to do everything that you are trying to re-implement.
2) Have you looked at the C backend? It recreates loops. It may not create "for" loops but you can hack on it to do that.
3) The way you are converting out of SSA is wrong. You will suffer from lost copies. You should look at using demotePHI(). see include/llvm/Transforms/Utils/Local.h
4) LLVM will compute your trip counts for you, use LoopInfo. You need to be in SSA form to do this.

-Tanya

Let me rephrase your question for you. You can then see the right question, and the answer for it.

The question is, why is, why can't I change:

bb8:
sum_1 = i_0_pn + sum_0_pn;
br bool %exitcond, label %bb10, label %bb3

into

bb8:
br bool %exitcond, label %bb10, label %bb3
sum_1 = i_0_pn + sum_0_pn;

?

The answer is, because that is an invalid transform.

A:
B;
C;

has to be transformed into:

B;
A:
C;

(where the motion of B is copy it to the other side of all control flows into A)

When you do this, you'll notice you get:

\$ cat t.c
int foo() {
unsigned indvar_next27, indvar_next;
int sum_1, sum_0_pn, i_0_pn;
//entry:
unsigned indvar26 = 0;
int sum_0_pn_ph = 0;
//brlabel %bb8_outer
//bb8_outer:
for (indvar_next27=0; indvar_next27 != 10; ){
int i_0_0_ph = (int) indvar26;
unsigned indvar= 0;
int sum_0_pn = sum_0_pn_ph;
int i_0_pn = i_0_0_ph;
//brlabel %bb8
//bb8:
sum_1 = i_0_pn + sum_0_pn; /* LOOK HERE */
for (;indvar!=3;){
//%exitcond= setequint%indvar, 3
//brbool %exitcond, label %bb10, label %bb3
//bb3:
indvar_next= indvar+ 1;
indvar= indvar_next;
sum_0_pn = sum_1;
i_0_pn = 2;
sum_1 = i_0_pn + sum_0_pn; /* LOOK HERE */
//brlabel %bb8
}
//bb10:
indvar_next27 = indvar26 + 1;
//%exitcond28 = setequint%indvar_next27, 10
indvar26 = indvar_next27;
sum_0_pn_ph = sum_1;
//brbool %exitcond28, label %bb16, label %bb8_outer
}
//bb16:
return sum_1;
}

int main() {
printf("%d\n", foo());
}
\$ gcc t.c
t.c: In function ‘main’:
t.c:40: warning: incompatible implicit declaration of built-in function ‘printf’
\$ ./a.out
105

The cost of the .pdf file outweighed the benefit for me. I can see the pretty graphs in the straight text code. If you want to help me see them further, just indent.

The short answer is you can't guess at valid transforms. You have to know they are valid, or prove the are valid.

The only way to get:

for (;C;) {
S
}

is to transform:

top:
if (!C) goto end;
S
goto top;
end:

into it. If you have any other form, it is wrong to transform into it. You _must_transform into the form just above first, and then that form goes into the for statement.

At the end of the day, you have a set of valid transforms, each one of which you can talk about and audit separately. If you have any bugs, they are never in the valid transforms, and the invalid ones stand out to people skilled in the art.

So, if you had written down the transform you were using, we could have identified it. You didn't, so, we could not.

I didn't check your other loop for correctness, or any of the other implied transforms that you used for correctness. If you want to ensure it is correct, you can run large amount of code through it and check the results (I'd call that the industrial approach) and hope it is approximately right, or, you have to reason about each and every transform you're using and hopefully even prove each one. The latter approach I'd call the academic approach.

You're splitting and moving phi nodes seems reasonable.

The transform of:

top:
S;
goto top;

into

for (; {
S;
}

is valid. What isn't valid is putting the induction variable into the condition slot and calling it done. You must first transform it into the form given above. To do that, you have to use other transforms, each one of which you should write down and list and prove is valid.

If you want us to audit each transform for you, write them all down, one by one.

Seung,

3) The way you are converting out of SSA is wrong. You will suffer
from lost copies. You should look at using demotePHI(). see include/
llvm/Transforms/Utils/Local.h

Using demotePHI() to remove PHI nodes should be the easiest way.
However, if you want to keep values in registers, you must take into
account the problem mentioned above, and also the "swap problem". You
http://www.cs.ucsb.edu/~ckrintz/papers/spe98-ssaconstruct-deconstruct.pdf.gz
to see when these problems appear and how to deal with them. If you
address these issues, I think, your algorithm to remove PHI nodes should
be okay.

-Wojtek

Seung,

Would you please explain to me how I can access to this problem in a better way if you can figure out?

Here are my thoughts on your first question: "how to make it count
correctly for the reconstruction of loops?". I believe it can't be done
for an arbitrary loop. However, IIRC from previous posts your loops are
simple - for example: they don't contain any breaks, gotos. I think,
your reconstruction could be done for two forms of loops that are very
common:

1) rotated canonical loop (bb8.outer in your example):
The loop has only one backedge, and an induction variable counting from
0 with step 1. The only basic block that branches out of the loop (so
called, exiting block) is the backedge.

%i = phi [ 0, %preheader ], [ %i.next, %body ]
...
br label %body ;%header and %body may be the same block

body:
...
%c = seteq %i.next, %n
br %c, label %exit, label %header

exit:
...

The loop iteration count is %n. In this case, it means that _every_
basic block is executed %n times. Your current version of reconstruction
seems to be okay for loops in this form.

2) unrotated canonical loop (bb8 in your example):
The loop has only one backedge and an induction variable counting from 0
with step 1. The only basic block that branches out of the loop is the

%i = phi [ 0, %preheader ], [ %i.next, %body ]
...
%c = seteq %i, %n
br %c, label %exit, label %body

body:
...

exit:
...

The loop iteration count is also %n. However, this time it means that
the loop header is executed %n times, while the rest of blocks - %n-1
times. The reconstruction should take it into account. For example, it
could make a "for" loop from all loop basic blocks setting the iteration
count to %n-1, and then insert a copy of the loop header with %i==%n
after the "for".

I also suggest taking look at the LoopInfo pass. It has many useful
methods to analyze loops. To determine the loop iteration count you may
use the ScalarEvolution pass as well.

-Wojtek