A small pass to constant fold branch conditions in destination blocks

Hi all, I wrote a little pass (attached) which does the following: if it sees a
conditional branch instruction then it replaces all occurrences of the condition
in the true block with "true" and in the false block with "false". Well, OK, it
is a bit more sophisticated (and a bit more careful!) than that but you get the
idea. It will turn this
   define i1 @t1(i1 %c) {
     br i1 %c, label %t, label %f
   t:
     ret i1 %c
   f:
     ret i1 %c
   }
into this
   define i1 @t1(i1 %c) {
     br i1 %c, label %t, label %f
   t:
     ret i1 true
   f:
     ret i1 false
   }
for example. Curiously enough LLVM doesn't seem to have a pass that does this.
I took a look at the effect on the testsuite by scheduling a run of this pass
just after each run of -correlated-propagation. In spite of being so simple
(not to say simplistic) it has an enormous positive impact on Ada code and a
substantial positive impact throughout the LLVM test-suite (I didn't check that
programs still work after running the pass, so it could be that it has such a
big effect because it is wrong!).

So... should this kind of logic be incorporated into LLVM? Perhaps as part of
an existing pass like -correlated-propagation?

It would be easy to make the pass a bit more powerful. For example if the
condition was "X == 0" then it could also replace X with 0 everywhere in the
true block.
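To make the idea concrete, the core transform can be modelled in a few lines of Python. This is a toy sketch over an invented tuple representation of instructions, not the attached pass: the real pass works on LLVM IR and is more careful, e.g. it must make sure a destination block is only reached through that branch edge before substituting.

```python
def fold_branch_condition(blocks, cond, true_blk, false_blk):
    """Toy sketch: given a conditional branch on `cond`, rewrite uses of
    `cond` to True in the true destination and to False in the false one.
    Only safe here because each destination is assumed to have this branch
    as its sole predecessor (the real pass checks this on the CFG)."""
    for name, known in ((true_blk, True), (false_blk, False)):
        blocks[name] = [
            tuple(known if op == cond else op for op in instr)
            for instr in blocks[name]
        ]
    return blocks
```

Run on a mock-up of the @t1 example, the returns in %t and %f become constant true and false respectively, just as in the transformed IR above.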

Ciao, Duncan.

PS: This was inspired by PR9004.

Prop.cpp (3.56 KB)

Are you sure this is really advantageous? '%c' is only one variable, but when you add the constant propagation, '%c' and false/true are two different variables. Thus

define i1 @t1(i1 %c) {
br i1 %c, label %t, label %f
t:
ret i1 %c
f:
ret i1 %c
}
should be
br i1 R0, label %t, label %f
t:
ret R0
f:
ret R0

However, with your pass
define i1 @t1(i1 %c) {
br i1 %c, label %t, label %f
t:
ret i1 true
f:
ret i1 false
}
will be
define i1 @t1(i1 %c) {
br i1 R0, label %t, label %f
t:
R1 = true
ret i1 R1
f:
R1 = false
ret i1 R1
}

I am thinking of x86, where '%c' would be allocated one register and the false/true values would be allocated a different register, which would be EAX/AX on the x86 machine.

Honestly, I believe this pattern could be conditional constant propagation / conditional re-materialization in the spiller. LLVM uses the spiller to propagate constants. This pass would be useful to identify some conditional re-materializations. You should look into hacking the spiller and see if this can be added to it.

My 2 cents,
Jeff Kunkel

Hi Jeff,

Are you sure this is really advantageous? '%c' is only one variable, but when
you add the constant propagation, '%c' and false/true are two different
variables. Thus

the example was explanatory, not typical. In fact I never saw returns
being split like this in practice. What I do see typically is branches
being eliminated. For example, consider the effect on bzip2: 36 branches are
completely removed, 1 is changed from conditional to unconditional, various
bits of dead code are eliminated (not a lot, 4 stores and a few computations).
I chose this example randomly, but it's typical of what I see elsewhere.

Ciao, Duncan.

Then I misunderstood its purpose. I see now that constant propagation could remove branches because you know a value is true. I was looking at the problem through my 'register allocator' lens. Here is a more expressive example of what you are doing.

define i1 @t1(i1 %c) {
br i1 %c, label %t, label %f
t:
br i1 %c, label %t2, label %f2
t2:
code…
ret something
f2:
code…
ret something

f:
br i1 %c, label %t3, label %f3
t3:
code…
ret something
f3:
code…
ret something
}

Would be changed into:

define i1 @t1(i1 %c) {
br i1 %c, label %t2, label %f3
t2:
code…
ret something

f3:
code…
ret something
}

Jeff Kunkel

Hi Jeff, sorry, my example was misleading.

Then I misunderstood its purpose. I see now that constant propagation could
remove branches because you know a value is true. I was looking at the problem
through my 'register allocator' lens. Here is a more expressive example of what
you are doing.

define i1 @t1(i1 %c) {
    br i1 %c, label %t, label %f
t:
    br i1 %c, label %t2, label %f2
t2:
   code...
   ret something
f2:
   code...
   ret something

f:
    br i1 %c, label %t3, label %f3
t3:
   code...
   ret something
f3:
   code...
   ret something
}

Would be changed into:
define i1 @t1(i1 %c) {
    br i1 %c, label %t2, label %f3
t2:
   code...
   ret something
f3:
   code...
   ret something
}

Yes that's exactly what it would do, and this happens a lot in practice. The
reason that it fires a lot in Ada code for example is that the code is full of
compiler generated checks (eg: every array access is checked) and the checks
often end up being repeated (eg: because you access the same array element
twice). Now all the later checks are eliminated if they are implied by the
earlier checks.
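The follow-on cleanup that makes examples like this collapse can be sketched in Python too. This uses an invented tuple encoding of instructions purely for illustration; it models what a CFG-simplification step (like -simplifycfg) does once the propagation has turned an inner branch condition into a constant:

```python
def fold_constant_branches(block):
    """Toy sketch: rewrite any conditional branch whose condition is
    already a known boolean constant into an unconditional jump, so that
    a later CFG-simplification step can delete the now-dead successor."""
    out = []
    for instr in block:
        if instr[0] == 'br' and isinstance(instr[1], bool):
            taken = instr[2] if instr[1] else instr[3]
            out.append(('jmp', taken))
        else:
            out.append(instr)
    return out
```

In the nested example above, the inner branch in %t becomes an unconditional jump to %t2 and the one in %f becomes a jump to %f3, which is exactly how %f2 and %t3 end up unreachable.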

Ciao, Duncan.

Yes that's exactly what it would do, and this happens a lot in practice. The
reason that it fires a lot in Ada code for example is that the code is full of
compiler generated checks (eg: every array access is checked) and the checks
often end up being repeated (eg: because you access the same array element
twice). Now all the later checks are eliminated if they are implied by the
earlier checks.

Of course, that only happens if the compares have already been CSE'd.
And it wouldn't handle e.g. 'arr.length > 10' followed by 'arr.length > 5'...

This really needs to be implemented in a more generic pass to handle
cases like that (PR 9004 mentions LVI).

Maybe correlated value propagation should also look at all
non-constant operands of all instructions to see if LVI knows they can
be replaced by constants?
In fact it looks like this + SimplifyInstruction() afterwards could
obsolete the special handling for select, load and store while making
the pass more powerful.

Here is a new and improved version that also does the following: if the
condition for a conditional branch has the form "A && B" then A, B and the
condition are all replaced with "true" in the true destination (similarly
for || conditions in the false destination). Also, if the condition has
the form "X == Y" then X is replaced by Y everywhere in the true destination
(likewise if it is "X != Y" then X is replaced by Y everywhere in the false
destination). Completely untested for correctness!
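The facts learned on each edge can be sketched like this in Python; the tuple encoding of conditions is made up for illustration and conditions are assumed to be tuples, so this is a model of the rules just described, not the attached pass's actual code:

```python
def equalities_from_condition(cond, taken):
    """Toy sketch: return the substitutions implied by taking (taken=True)
    or not taking (taken=False) a branch on `cond`, where `cond` is a
    tuple like ('and', a, b), ('or', a, b), ('eq', x, y) or ('ne', x, y)."""
    subs = {cond: taken}
    if taken:
        if cond[0] == 'and':      # A && B true  =>  A and B are both true
            subs[cond[1]] = True
            subs[cond[2]] = True
        elif cond[0] == 'eq':     # X == Y true  =>  replace X with Y
            subs[cond[1]] = cond[2]
    else:
        if cond[0] == 'or':       # A || B false =>  A and B are both false
            subs[cond[1]] = False
            subs[cond[2]] = False
        elif cond[0] == 'ne':     # X != Y false =>  replace X with Y
            subs[cond[1]] = cond[2]
    return subs
```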

Scheduling this pass after -correlated-propagation results for example in
5759 lines of bitcode being removed from 403.gcc.

Ciao, Duncan.

Prop.cpp (5.82 KB)

Here is a new and improved version that also does the following: if the
condition for a conditional branch has the form "A && B" then A, B and the
condition are all replaced with "true" in the true destination (similarly
for || conditions in the false destination). Also, if the condition has
the form "X == Y" then X is replaced by Y everywhere in the true destination
(likewise if it is "X != Y" then X is replaced by Y everywhere in the false
destination). Completely untested for correctness!

I also had a try at this, integrating it with the existing correlated
value propagation pass. See
<http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20110207/116167.html>.

I haven't checked whether it provides all of the extra things this latest
version does, though. (It probably doesn't handle the 'X == Y' case
where neither side is a constant.)

My version has some tests and passes 'make check-all'.

Scheduling this pass after -correlated-propagation results for example in
5759 lines of bitcode being removed from 403.gcc.

I didn't gather any statistics on it though.

Hi Frits,

Here is a new and improved version that also does the following: if the
condition for a conditional branch has the form "A && B" then A, B and the
condition are all replaced with "true" in the true destination (similarly
for || conditions in the false destination). Also, if the condition has
the form "X == Y" then X is replaced by Y everywhere in the true destination
(likewise if it is "X != Y" then X is replaced by Y everywhere in the false
destination). Completely untested for correctness!

I also had a try at this, integrating it with the existing correlated
value propagation pass. See
<http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20110207/116167.html>.

I will reply properly tomorrow, but for now let me just remark that querying LVI
for everything is probably very slow. In fact I'm pretty sure that the reason
it only calculates things lazily is that calculating for everything would take
too much compile time. And doubtless this is the reason why the correlated
value propagation pass only analyses a very limited set of instructions.

The big advantage of my technique is that it is fast. The disadvantage is that
it is rather limited in what it can do. Maybe some kind of combination would be
good: add the logic from my pass to correlated value propagation to quickly
forward propagate equality information from branch conditions, and reserve LVI
for analysing a small amount of important instructions.

Ciao, Duncan.

Duncan,

GVN already does this. See lines 1669-1689.

--Owen

Duncan Sands wrote:

Here is a new and improved version that also does the following: if the
condition for a conditional branch has the form "A && B" then A, B and the
condition are all replaced with "true" in the true destination (similarly
for || conditions in the false destination). Also, if the condition has
the form "X == Y" then X is replaced by Y everywhere in the true
destination
(likewise if it is "X != Y" then X is replaced by Y everywhere in the false
destination). Completely untested for correctness!

Not to discourage you, but you're reinventing predsimplify.

What PS did was find branches (or switches) where the target block was uniquely dominated by a single case, then assign %cond = true/false (or %switchval = const) and then walk up propagating that condition upwards (i.e., if %cond = and i1 %a, %b then %a and %b are true, and if %cond = icmp eq i32 %a, 0 then PS would immediately find all uses of %a dominated by that branch and replace them with 0 on the spot). After walking up, you would walk down again with what you learned; for example you may have done %A = add %X, %Y way early on and know in this lower branch that %Y is 0, so now %A = %X. PS had a full comparison lattice (not-equals, signed-lessthan-unsigned-greaterthan) and also grew constant range analysis (x < y implies x != -1 and y != 0).
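As an aside, that closing range fact (unsigned x < y implies x is not the all-ones "-1" value and y is not 0) is easy to sanity-check exhaustively at a small bit width. A quick illustrative script, not part of PS:

```python
# Exhaustively verify the implication for 4-bit unsigned values:
# whenever x < y, x cannot be the maximum value and y cannot be zero.
WIDTH = 4
UMAX = (1 << WIDTH) - 1  # the all-ones pattern, i.e. "-1" reinterpreted
for x in range(UMAX + 1):
    for y in range(UMAX + 1):
        if x < y:
            assert x != UMAX and y != 0
```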

While I know PS removed tons of code from the .ll I was never really able to detect a runtime performance improvement; my guess is that while I was killing off lots of dead code it didn't impact the performance of the live code much or at all.

Nick

Nick Lewycky wrote:

While I know PS removed tons of code from the .ll I was never really able to detect a runtime performance improvement; my guess is that while I was killing off lots of dead code it didn't impact the performance of the live code much or at all.

Killing off lots of dead code is going to improve compile time,
since code-gen time tends to dominate optimise time.

Improving compile time is important for those of us using the JIT compiler.

Mark.

Hi Owen,

GVN already does this. See lines 1669-1689.

you are mistaken: if you try running GVN on the example you will see that it
doesn't do anything to it. The problem is that GVN only applies its logic when
it simplifies an expression due to value numbering. If value numbering gives
nothing (as for the returns in the example) then it does nothing. I first
noticed this because instsimplify improvements mean that GVN's value numbering
catches less (because it was caught earlier), so its correlated expression
logic doesn't kick in, resulting in fewer simplifications; i.e. improving
instsimplify can cause regressions due to this GVN flaw. I opened PR9004 for
this.

Ciao, Duncan.

Except PredSimplify was also a big compile-time sink, and the benefits it gave didn't make up for that.

--Owen

While I know PS removed tons of code from the .ll I was never really
able to detect a runtime performance improvement; my guess is that while
I was killing off lots of dead code it didn't impact the performance of
the live code much or at all.

Killing off lots of dead code is going to improve compile time,
since code-gen time tends to dominate optimise time.

Improving compile time is important for those of us using the JIT compiler.

Except PredSimplify was also a big compile-time sink, and the benefits it gave didn't make up for that.

A big advantage of my pass is that it seems to be pretty quick. That's because
it doesn't try to go beyond simple propagation of equalities.

Ciao, Duncan.