Eliminating gotos

We would like to develop a code generator using LLVM for a target language that does not support conditional branches and in fact only supports structured control flow, e.g. if and while. As far as I can tell, the problem with doing this in LLVM today is that it does not support these high-level constructs and instead implements all control flow as branches.

It is “fairly” straightforward to restructure a program written with conditional/unconditional branches into one that uses only high-level control-flow structures; the algorithm I have in mind is described in [1]. The problem is how best to represent the resulting IL within the LLVM framework:

  1. Extend LLVM with new ops to support if/loop.
  2. Implement this with the insertion of intrinsics to represent high-level control-flow, introducing “false” dependencies if necessary to allow optimizations to be applied without changing the semantics.
  3. Implement some structure off to the side that represents this high-level flow.

Thoughts?

Ben

[1] “Taming Control Flow: A structured approach to eliminating goto
statements”, A.M. Erosa and L.J. Hendren, ICCL 1994

We would like to develop a code generator using LLVM for a target language that does not support conditional branches and in fact only supports structured control flow, e.g. if and while.

What’s the difference between an “if” and a conditional branch?

As far as I can tell, the problem with doing this in LLVM today is that it does not support these high-level constructs and instead implements all control flow as branches.

It is “fairly” straightforward to restructure a program written with conditional/unconditional branches into one that uses only high-level control-flow structures; the algorithm I have in mind is described in [1]. The problem is how best to represent the resulting IL within the LLVM framework:

  1. Extend LLVM with new ops to support if/loop.
  2. Implement this with the insertion of intrinsics to represent high-level control-flow, introducing “false” dependencies if necessary to allow optimizations to be applied without changing the semantics.
  3. Implement some structure off to the side that represents this high-level flow.

Thoughts?

One downside of this is that it means eliminating all instances of irreducible control flow, which can lead to an exponential increase in code size.

–Owen

The way I'd do this is to establish a restricted subset of the LLVM IL
with control flow that the converter can easily deal with, then write
a transformation pass to ensure that the necessary invariants hold.
This approach allows separating the transformation step and the actual
conversion code, which should make developing and debugging them
easier.

Introducing new control-flow constructs into the LLVM IL would require
such massive changes that I don't think it's worth considering.

For the transformation step, requiring reg2mem will probably make
writing the pass a bit easier; if you do that, the pass doesn't have
to deal with PHI nodes. Also, the functions in
llvm/Transforms/Utils/Cloning.h are likely to be useful.
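
For what it's worth, here is a rough sketch, written against a current LLVM tree rather than anything Eli posted, of the kind of helper Cloning.h provides: duplicate a basic block and remap the cloned instructions so they refer to their own copies. Something like this would be the building block for any node splitting the structuring pass needs (the caller still has to rewire successors and PHI nodes). The reg2mem transformation Eli mentions is also available as the "reg2mem" pass in opt.

#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/Instruction.h"
#include "llvm/Transforms/Utils/Cloning.h"
#include "llvm/Transforms/Utils/ValueMapper.h"
using namespace llvm;

// Clone BB into its parent function and fix up references inside the copy
// so they point at the cloned values instead of the originals.
static BasicBlock *duplicateBlock(BasicBlock *BB) {
  ValueToValueMapTy VMap;
  BasicBlock *Copy = CloneBasicBlock(BB, VMap, ".dup", BB->getParent());
  for (Instruction &I : *Copy)
    RemapInstruction(&I, VMap, RF_IgnoreMissingLocals);
  return Copy;
}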

Being able to run general CFG-modifying LLVM optimization passes on
the transformed code without breaking the invariants would be a pain,
so I don't think it's worth bothering with it. If it turns out that
you really need some CFG-modifying pass, you can revisit it later.
Non-CFG-modifying passes, like instcombine and GVN, are safe because
they can't break CFG-based invariants.

The alternative would be to delay eliminating gotos until after the
conversion... this might be easier to implement if you're basing your
implementation on the paper you cited.

And another alternative, depending on your language, would be to split
up irreducible chunks into separate functions, essentially turning
gotos into tail calls.
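
To make that last suggestion concrete, here is a hand-written sketch (mine, not from the thread) of how an irreducible region with two entry points can be pulled apart into mutually calling functions; x, y, stmt_1 and stmt_2 are placeholders for whatever the original code does. For this to be safe in loop-heavy code the calls would have to be genuine tail calls (or the target would need guaranteed sibling-call optimization), otherwise the stack grows on every iteration.

#include <cstdio>

static bool x() { return false; }             // placeholder conditions
static bool y() { return false; }
static void stmt_1() { std::puts("stmt_1"); } // placeholder statements
static void stmt_2() { std::puts("stmt_2"); }

static void from_L2();                        // forward declaration

// Each entry point of the irreducible region becomes its own function,
// and every goto between the entry points becomes a (tail) call.
static void from_L1() {
  stmt_1();
  from_L2();                                  // "goto L2"
}

static void from_L2() {
  stmt_2();
  if (y())
    from_L1();                                // "if (y) goto L1"
}

int main() {
  if (x())
    from_L2();                                // "if (x) goto L2"
  else
    from_L1();
  return 0;
}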

-Eli

Hi,

Comments inline.

Ben

We would like to develop a code generator using LLVM for a target language that does not support conditional branches and in fact only supports structured control flow, e.g. if and while.

What’s the difference between an “if” and a conditional branch?

[bg] Consider the LLVM code:

define i32 @foo(i32 %x, i32 %y) {
entry:
%tobool = icmp eq i32 %x, 0 ; [#uses=1]
br i1 %tobool, label %ifelse, label %ifthen

ifthen: ; preds = %entry
%add = add i32 %x, 10 ; [#uses=1]
ret i32 %add

ifelse: ; preds = %entry
%call = tail call i32 (...)* @bar( ) ; [#uses=0]
ret i32 %y
}

We cannot express this as it uses the conditional branch br, but applying goto elimination it is possible to rewrite it as:

define i32 @foo(i32 %x, i32 %y) {
entry:
%tobool = icmp eq i32 %x, 0

if i1 %tobool
%nottobool = icmp eq i1 %tobool, false
if i1 %nottobool
endif
ifthen:
%add = add i32 %x, 10
ret i32 %add
endif

ifelse:
%call = tail call i32 (...)* @bar( )
ret i32 %y
}

Of course, this can be optimized to be:

define i32 @foo(i32 %x, i32 %y) {
entry:
%tobool = icmp eq i32 %x, 0

if i1 %tobool
ifthen:
%add = add i32 %x, 10
ret i32 %add
endif

ifelse:
%call = tail call i32 (...)* @bar( )
ret i32 %y
}

I agree that in some sense this is an argument over syntax but the issue for us is that our ISA represents control flow using structured ops and so we have no option but to re-write the code into this form or change the hardware :-)!

As far as I can tell, the problem with doing this in LLVM today is that it does not support these high-level constructs and instead implements all control flow as branches.

It is “fairly” straightforward to restructure a program written with conditional/unconditional branches into one that uses only high-level control-flow structures; the algorithm I have in mind is described in [1]. The problem is how best to represent the resulting IL within the LLVM framework:

  1. Extend LLVM with new ops to support if/loop.
  2. Implement this with the insertion of intrinsics to represent high-level control-flow, introducing “false” dependencies if necessary to allow optimizations to be applied without changing the semantics.
  3. Implement some structure off to the side that represents this high-level flow.

Thoughts?

One downside of this is that it means eliminating all instances of irreducible control flow, which can lead to an exponential increase in code size.

[bg] Actually this does not need to be the case. The paper that I cited does not use code replication to resolve irreducible control flow but instead introduces a loop construct. For example, consider the following irreducible loop (taken directly from the paper):

if(x) goto L2;
L1:
stmt_1;
L2:
stmt_2;
if(y) goto L1;

Using the algorithm described for goto elimination this can be re-written as:

goto_L2 = x;
do {
  if (!goto_L2) {
    goto_L1 = 0;
    stmt_1;
  }
  goto_L2 = 0;
  stmt_2;
  goto_L1 = y;
} while (goto_L1);

In this case there is additional code for condition variables and the loop itself but there is no duplication of stmt_1 or stmt_2.

[bg] Actually this does not need to be the case. The paper that I cited
does not use code replication to resolve irreducible control flow but
instead introduces a loop construct.

We implemented this in GCC back when we first started GIMPLE (since
GIMPLE is based on the IL the authors of that paper used in their
compiler), and the code size increases on a bunch of testcases were
massive due to extra loop constructs.

Hi,

That is interesting.

Do you have any pointers to the test cases in question?

The problem is that we don't have conditional branches, so we are going
to have to do some form of goto elimination; given that, do you have any
alternatives in mind?

Thanks,

Ben

[bg] Consider the LLVM code:

define i32 @foo(i32 %x, i32 %y) {
entry:
%tobool = icmp eq i32 %x, 0 ; [#uses=1]
br i1 %tobool, label %ifelse, label %ifthen

ifthen: ; preds = %entry
%add = add i32 %x, 10 ; [#uses=1]
ret i32 %add

ifelse: ; preds = %entry
%call = tail call i32 (...)* @bar( ) ; [#uses=0]
ret i32 %y
}

SNIP

define i32 @foo(i32 %x, i32 %y) {
entry:
%tobool = icmp eq i32 %x, 0

if i1 %tobool
ifthen:
%add = add i32 %x, 10
ret i32 %add
endif

ifelse:
%call = tail call i32 (...)* @bar( )
ret i32 %y
}

I’m still not seeing how these two are any different. You just replace the text of “if” with “br”, and add the explicit target labels. I should also point out that, in LLVM IR, the order the blocks are laid out in is not meaningful and could change, so representing them explicitly in the branch or “if” is a requirement. Also, this ordering (and, indeed, the number and structure of basic blocks) is not guaranteed to be preserved into the Machine-level phase of code generation.

What I’m guessing you’re getting at is that you need to insert an end-if instruction at some point. If this is the case, I don’t think radically changing the LLVM IR is the path of least resistance. What about having a Machine-level pass that enforces the ordering of blocks that you require and inserts the special instructions based on a high-level control-flow reconstruction? At the Machine level, blocks are allowed to have multiple exits, so you could even implement the non-optimized case you gave first. Also, loop-structure analysis is available at the Machine level as well, which might help.

Seems a lot simpler than massively altering the IR.
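
Just to give a feel for the shape of such a Machine-level pass, here is a very rough sketch written against a recent LLVM tree; it is my own illustration, not anything Owen posted. MyTarget::ENDIF is a made-up pseudo-opcode, and treating every two-predecessor block as an if/else join is only a stand-in for a real control-flow reconstruction.

#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/TargetInstrInfo.h"
#include "llvm/CodeGen/TargetSubtargetInfo.h"
#include "llvm/IR/DebugLoc.h"
using namespace llvm;

namespace {
struct InsertEndIf : MachineFunctionPass {
  static char ID;
  InsertEndIf() : MachineFunctionPass(ID) {}

  bool runOnMachineFunction(MachineFunction &MF) override {
    const TargetInstrInfo *TII = MF.getSubtarget().getInstrInfo();
    bool Changed = false;
    for (MachineBasicBlock &MBB : MF) {
      // Placeholder heuristic: a block with exactly two predecessors is
      // assumed to be the join point of an if/else region and gets tagged
      // with the (hypothetical) ENDIF pseudo-instruction.
      if (MBB.pred_size() == 2) {
        BuildMI(MBB, MBB.getFirstNonPHI(), DebugLoc(),
                TII->get(MyTarget::ENDIF));
        Changed = true;
      }
    }
    return Changed;
  }
};
} // end anonymous namespace

char InsertEndIf::ID = 0;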

I agree that in some sense this is an argument over syntax but the issue for us is that our ISA represents control flow using structured ops and so we have no option but to re-write the code into this form or change the hardware :-)!

I’m still not understanding your point here. Even if LLVM spells it as “br”, your target isel can match it to something spelled “if” with no problem. See my suggestions above about inserting other special instructions and enforcing block ordering.

  1. Extend LLVM with new ops to support if/loop.
  2. Implement this with the insertion of intrinsics to represent high-level control-flow, introducing “false” dependencies if necessary to allow optimizations to be applied without changing the semantics.
  3. Implement some structure off to the side that represents this high-level flow.

Thoughts?

I missed pointing this one out earlier, but you won’t be able to do this with intrinsics, as block labels cannot be used as first-class values, and so cannot be passed to a function call. You’d actually have to add instructions to the IR, or come up with a string-based tagging scheme or something.

–Owen

Hi Owen,

SNIP

I’m still not seeing how these two are any different. You just replace the text of “if” with “br”, and add the explicit target labels. I should also point out that, in LLVM IR, the order the blocks are laid out in is not meaningful and could change, so representing them explicitly in the branch or “if” is a requirement. Also, this ordering (and, indeed, the number and structure of basic blocks) is not guaranteed to be preserved into the Machine-level phase of code generation.

What I’m guessing you’re getting at is that you need to insert an end-if instruction at some point. If this is the case, I don’t think radically changing the LLVM IR is the path of least resistance. What about having a Machine-level pass that enforces the ordering of blocks that you require and inserts the special instructions based on a high-level control-flow reconstruction? At the Machine level, blocks are allowed to have multiple exits, so you could even implement the non-optimized case you gave first. Also, loop-structure analysis is available at the Machine level as well, which might help.

[bg] Ok so I think I’m starting to get it. You are correct in your assertion that we need to insert the end-if instruction at some point, and of course an else in the case of if-then-else constructs. But we also need to reconstruct while loops, and it is unclear to me whether your approach works for all cases of gotos. The other concern is that we are targeting an instruction set with virtual registers; register allocation and scheduling will be performed by our assembler, not within LLVM, so we are planning on implementing a language-style backend, similar in style to the MSIL backend. Given that, is it possible to use a machine-level pass?

Seems a lot simpler than massively altering the IR.

I agree that in some sense this is an argument over syntax but the issue for us is that our ISA represents control flow using structured ops and so we have no option but to re-write the code into this form or change the hardware :-)!

I’m still not understanding your point here. Even if LLVM spells it as “br”, your target isel can match it to something spelled “if” with no problem. See my suggestions above about inserting other special instructions and enforcing block ordering.

  1. Extend LLVM with new ops to support if/loop.
  2. Implement this with the insertion of intrinsics to represent high-level control-flow, introducing “false” dependencies if necessary to allow optimizations to be applied without changing the semantics.
  3. Implement some structure off to the side that represents this high-level flow.

Thoughts?

I missed pointing this one out earlier, but you won’t be able to do this with intrinsics, as block labels cannot be used as first-class values, and so cannot be passed to a function call. You’d actually have to add instructions to the IR, or come up with a string-based tagging scheme or something.

Actually I was thinking something along the following lines:

OpenCL → LLVM IR
to SSA (mem2reg)
LLVM optimizations
to !SSA (phi2copy)
Goto elimination with intrinsics

Code gen (without register allocation and so on)

Hi Ben,

[bg] Actually this does not need to be the case. The paper that I cited
does not use code replication to resolve irreducible control flow but
instead introduces a loop construct.

Right, and that technique introduces scalar control variables that replace
control flow with data flow.

We implemented this in GCC back when we first started GIMPLE (since
GIMPLE is based on the IL the authors of that paper used in their
compiler), and the code size increases on a bunch of testcases were
massive due to extra loop constructs.

Do you have any pointers to the test cases in question?

See the thread starting at:
http://gcc.gnu.org/ml/gcc-patches/2002-05/msg00109.html

You can get a copy of the ast-optimizer branch as of May 2002 from the
GCC svn and apply the patches from that email. That prototype can still
work for you ;-)

Sebastian Pop

Hi Ben,

Hi Mon Ping,

[bg] Ok so I think I’m starting to get it. You are correct in your assertion that we need to insert the end-if instruction at some point, and of course an else in the case of if-then-else constructs. But we also need to reconstruct while loops, and it is unclear to me whether your approach works for all cases of gotos. The other concern is that we are targeting an instruction set with virtual registers; register allocation and scheduling will be performed by our assembler, not within LLVM, so we are planning on implementing a language-style backend, similar in style to the MSIL backend. Given that, is it possible to use a machine-level pass?

[ Deleted Text]

I don’t see why not, as you just have a different target. Assuming the incoming graph doesn’t have improper intervals, I would think that Owen’s approach of having a structural fixup machine-level pass run over the CFG is the right way to go. I assume that the requirement is to end up with structured control flow and it’s not required (though it might be desirable) that the incoming source graph is preserved. If the incoming code has improper intervals, I think we could reconstruct it, but as other people indicated, the CFG could be quite a bit larger (see [1]).

[bg] I’ve been thinking about this some more and was thinking of implementing something along the following lines:

  • translate into machine-level SSA representation;
  • do phi removal (as we have a virtual register set, conventional register allocation is not important for us; it happens much later down the tool flow); I was initially planning to do phi removal as described in [1], but it has been pointed out to me that this is not correct in all cases, so I now plan to implement the algorithm given in [2]; a simpler demotion-style alternative is sketched after this list.
  • do high-level control flow reconstruction
  • insert if-then-else-endif/while-endwhile blocks
  • finish code generation
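
For the phi-removal step, the simplest approach that could work is plain memory demotion rather than the full Briggs et al. machinery: give each PHI a stack slot, store the incoming value at the end of each predecessor, and reload where the PHI was (essentially what LLVM's reg2mem does). The sketch below is my own illustration against a current LLVM tree, not code from the thread, and it ignores the awkward cases (invoke results, critical edges) that a production pass has to handle.

#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

static void demotePhis(Function &F) {
  BasicBlock &Entry = F.getEntryBlock();
  IRBuilder<> EntryB(&Entry, Entry.begin());

  // Collect the PHIs first so the IR can be mutated while walking the list.
  SmallVector<PHINode *, 16> Phis;
  for (BasicBlock &BB : F)
    for (PHINode &P : BB.phis())
      Phis.push_back(&P);

  for (PHINode *P : Phis) {
    // One stack slot per PHI, allocated in the entry block.
    AllocaInst *Slot =
        EntryB.CreateAlloca(P->getType(), nullptr, P->getName() + ".slot");

    // Store each incoming value just before its predecessor's terminator.
    for (unsigned i = 0, e = P->getNumIncomingValues(); i != e; ++i) {
      IRBuilder<> B(P->getIncomingBlock(i)->getTerminator());
      B.CreateStore(P->getIncomingValue(i), Slot);
    }

    // Reload where the PHI used to be and replace all of its uses.
    IRBuilder<> B(P->getParent(), P->getParent()->getFirstInsertionPt());
    Value *Reload = B.CreateLoad(P->getType(), Slot, P->getName());
    P->replaceAllUsesWith(Reload);
    P->eraseFromParent();
  }
}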

Does this make sense?

I agree that in general conversion of irreducible graphs could be expensive, but in our case gotos are not allowed in the original source and break and continue are supported in the target language, so in theory we should be reconstructing the high-level source that was originally compiled.

[1] “Efficiently Computing Static Single Assignment Form and the Control Dependence Graph”, R. Cytron et al., 1991.
[2] “Practical Improvements to the Construction and Destruction of Static Single Assignment Form”, P. Briggs et al., 1997.

Regards,

Ben

Hi Mon Ping,

Discussing this with others at AMD, the question came up whether it is possible for LLVM to take a program that has a reducible graph (any C code without goto/setjmp) and generate one that is irreducible. If the code coming in is actually structured, a simple pattern matcher could turn everything into if/endif and so on.

Ben

Oh, absolutely. Simply implementing short-circuited evaluation efficiently
can do that. Not to mention optimization passes massively restructuring
the code in unique and unintelligible ways.

My favorite optimization moment came when someone here was searching
for a user message to output after he implemented some fancy tricks. His
first suggestion was, "The loop at line X was vectorized beyond your
comprehension." :slight_smile:

-Dave

Yes, at least some passes can; the most obvious example is jump
threading, but there are quite a few other passes that could blast
apart the structure in non-obvious ways. You do have control over
which passes you run, so it's possible to avoid passes that modify the
CFG; however, that excludes a lot of useful passes, like SimplifyCFG
and a lot of the loop passes.

I would suggest an approach that uses something like the pattern
matcher you're suggesting plus a transformation that enforces the
necessary structure restrictions; that should allow you to test
conversion both ways immediately and give you the most flexibility in
how to use the various LLVM transformation passes. The only downside
is the difficulty of writing the structuring pass; I have a feeling
it'll be tricky to do effectively.
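
To give a feel for the pattern-matching side, here is a tiny sketch (my own illustration against a current LLVM tree, not code from the thread) that recognises the simplest structured shape, a plain if/then/else diamond. A real matcher would also have to handle if-without-else, loops, break/continue and early returns, which is where the difficulty mentioned above comes in.

#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Instructions.h"
#include "llvm/Support/Casting.h"
using namespace llvm;

// Returns true if BB ends in a conditional branch whose two successors are
// single-entry blocks that both branch unconditionally to the same join
// block: the classic if/then/else diamond.
static bool isIfThenElseDiamond(BasicBlock *BB) {
  auto *BI = dyn_cast<BranchInst>(BB->getTerminator());
  if (!BI || !BI->isConditional())
    return false;
  BasicBlock *Then = BI->getSuccessor(0);
  BasicBlock *Else = BI->getSuccessor(1);
  auto *ThenBr = dyn_cast<BranchInst>(Then->getTerminator());
  auto *ElseBr = dyn_cast<BranchInst>(Else->getTerminator());
  if (!ThenBr || !ElseBr || ThenBr->isConditional() || ElseBr->isConditional())
    return false;
  return Then->getSinglePredecessor() == BB &&
         Else->getSinglePredecessor() == BB &&
         ThenBr->getSuccessor(0) == ElseBr->getSuccessor(0);
}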

Oh, and depending on how much you trust the compiler you're outputting
to, you can use the Reg2Mem pass for PHI elimination; it's really
simplistic relative to a real PHI elimination pass, but it might not
matter since you're not actually outputting machine code.

-Eli

Hi,

I like Eli's approach here. Passes like SimplifyCFG and the various loop transformations are just too useful for cleaning up code and generating much higher quality output. If we look at the passes, I hope we might be able to quantify what changes they make. My hope is that, since the incoming graph is reducible, it doesn't cost that much after running these phases to make the graph reducible again. Eli's approach reminds me of some loop transformation techniques where one computes a matrix of legal transformations, tries them, and then potentially undoes them if they are not profitable. In this case, I don't think we want to go that far, but it is a nice model to use as it allows us to leverage as much as possible of the LLVM phases to clean up the code.

   -- Mon Ping

Hi Mon Ping,

Yes I like Eli's approach also.

I was thinking that maybe the approach is to implement the pattern matching
algorithm and in the case when not all gotos have been eliminated, in
particular when a sub-component of the graph is irreducible, apply the more
general transformation to the output. This has the benefit of avoiding
applications of Erosa's (more costly) algorithm to the whole graph but
allows its use when absolutely necessary. Of course, it does mean having to
implement both algorithms, but in practice neither is that complicated and
so (hopefully) should not take too much effort.

Ben