Hi,

Is there a pass in LLVM that can optimize:

if (x)

a[i] = y0;

else

a[i] = y1;

to

a[i] = x ? y0 : y1?

I tried opt (3.9) with -O3 but looks like such an optimization do not happened.

Thanks

Hongbin

Hi,

Is there a pass in LLVM that can optimize:

if (x)

a[i] = y0;

else

a[i] = y1;

to

a[i] = x ? y0 : y1?

I tried opt (3.9) with -O3 but looks like such an optimization do not happened.

Thanks

Hongbin

The same IR at -O3 for both cases on this example.

Thanks,

Looks like inst combine do the job

Looks like this do not work:

// Type your code here, or load an example.

int a[10];

void unswitch2(int i, int x, int y0, int y1) {

if (x) {

a[i] = y0;

a[i + 1] = y1;

} else {

a[i + 1] = y0;

a[i] = y1;

}

}

Hi,

Yes, I can see why that would not work.

The sinking algorithm in SimplifyCFG isn’t particularly clever. In particular it can’t reason about memory ordering and aliasing. In unswitch1(), it can identify that the stores correlate because the correlating stores appear in the same relative source order. In unswitch2() they have been permuted, and the algorithm cannot deal with this. This requires some kind of value numbering to do efficiently.

The GVNSink pass that I really should get around to updating should solve this, eventually!

James

Interestingly, only `icc` produces the same code for both forms in this case.

I'm not entirely convinced this matters, but if you care, consider

opening a bug on llvm.org/bugs and/or submitting a patch.

Hi James,

I have an ad-hoc solution in mind to solve this problem.

But if it can be solved under the framework of GVN"Sink", it is even better.

any plan on your https://reviews.llvm.org/D24805?

Thanks

Hongbin

It’s basically ready to commit; the reviewers were fairly happy with it. It needs rebasing on top of NewGVN and any bugs that shakes out fixed, but that’s about it.

I want to get around to it soon-ish, but I’ve wanted that for a while!

I have a patch i’m working on to split newgvn into a proper analysis.

It works (and happy to share it), it’s just the interface i’m using is a little ugly (haven’t decided what the right API is), but let me know what GVNSink needs from life and i’ll make sure it’s there

Hi Danny,

Thanks for that However I’ve just updated the prototype patch to NewGVN and it didn’t need any API changes - all I rely on is GVNExpression.

Hongbin,

I wanted to explain a little about what GVNSink can currently do, what it was designed for and hopefully how to make it handle your testcase.

**Background**

Common code sinking is more difficult to efficently do than one might expect. Prior to October 2016, LLVM was able to handle simple cases of this within SimplifyCFG, for example:

if (a) {

b += 42;

} else {

b += 42;

}

// → b += 42

It could even sink operations where a phi was required:

if (a) {

b += 42;

} else {

b += 64;

}

// → b += phi(42, 64)

However it had a simplistic execution model and so:

- Restricted itself to blocks with only TWO incoming arcs
- Would only ever produce one PHI and then bail out

Generally, the factors that make the search space for a good generic sinking optimization large are:

- The number of incoming edges to a block.
- Non-monotonicity of heuristic - for example it might not be worthwhile sinking 2 instructions but may be worthwhile sinking 3 instructions.
- Permutations of otherwise similar instructions in incoming blocks.

I’ll describe them all in a bit more detail:

**1) The number of incoming edges to a block**

There are many constructs that can create a block with more than two input edges. An obvious example is a switch:

switch (a) {

case 1: b += 32; break;

case 2: b += 43; break;

default: break;

}

Another is a chain of if-elses:

if (a) {

b += 32;

} else if (c) {

b += 43;

} else {

b += 65;

}

Note that those two examples are subtly different. In the latter example, every branch performs an addition to ‘b’. We can therefore sink all the adds to the end and perform only one add:

b += phi(32, 43, 65);

However in the former case things aren’t so simple. The default case never does any add, so we have to speculate the add instruction in that case. For adds, this is simple - adds don’t have sideeffects and we can speculate by adding with an identity (0):

b += phi(32, 43, 0);

However what if ‘b’ was a global variable? In this case, each case of the switch will have a load/add/store sequence:

switch(a) ---------------

/ \ \

[t1 = load @b] [t2 = load @b] |

[t1 = add t1, 32] [t2 = add t2, 43] |

[store @b, t1] [store @b, t2] |

\ / /

v v v-------------

In most cases, we cannot be sure that in the default case @b is allowed to be written to. Speculating a store in this case could be invalid. Therefore, if we want to sink the store (and thus the add and load) we need to invent a new block to sink them to.

switch(a) ---------------

/ \ \

[t1 = load @b] [t2 = load @b] |

[t1 = add t1, 32] [t2 = add t2, 43] |

\ / |

[ p = phi(t1, t2) ] /

[ store @b, p ] /

±-----------

v v

Creating this new block allows the switch to be turned into a trivial one that can be lowered to a bit test or jump table.

There is an even more complex case that appears quite a bit:

if (a) {

switch (b) {

case 1: c += 42; break;

case 2: c += 2; break;

default: c += 3; break;

}

} else {

switch (d) {

case 1: e -= 2; break;

case 2: e -= 6; break;

default: e -= 32; break;

}

}

We canonicalize this CFG to one cleanup block with six incoming arcs. In this case, it takes some nontrivial analysis to determine what to do, as it requires partitioning the incoming arcs into sets, performing different sinking operations on each (in this case two disjoint sets, but there are cases where the sets are not disjoint and some cost heuristic needs to be used).

**2) Non-monotonicity of heuristic**

What I’m trying to say here is the greedy algorithms work best when the heuristic they are computing is monotonic. Consider this:

if (a) {

tmp1 = *b;

tmp2 = *c;

tmp3 = tmp1 + tmp2;

*d = tmp3;

} else {

tmp4 = *b;

tmp5 = *c;

tmp6 = tmp4 + tmp5;

*e = tmp6;

}

If we proceed sequentially from the bottom of these blocks, trying to determine if sinking them is worthwhile:

Step 1: [*e = tmp6], [*d = tmp3]

- This is the same operation, a store, but it requires two PHIs - one for (e,d) and one for (tmp6, tmp3).
- This is NOT worthwhile. We are constraining register allocation for only one instruction commonned.

Step 2: [tmp6 = tmp4 + tmp5], [tmp3 = tmp1 + tmp2]

- Again, the operation is the same, but now we need another two PHIs, plus one PHI from Step 1 (the phi required for (tmp6, tmp3) is now no longer required).
- Three PHIs required for two instructions sunk. NOT worth it.

Step 3: [tmp5 = *c], [tmp2 = *c]

- These can be sunk, and have the same arguments so no PHIs required. We still need two of the PHIs from step 2 (the (tmp5, tmp2) phi now is no longer required)
- At this point we’ve sunk three instructions for two PHIs. Maybe it’s worth it?

Step 4: [tmp4 = *b], [tmp1 = *b]

- Same as step 3, but now the (tmp4, tmp1) PHI from step 2 goes away too
- So now we’re sinking 4 instructions for one PHI.
- Now it’s definitely worth it!

Whatever heuristic one chooses, in a case like this it will get worse, then maybe get better again. So we can’t simply bail out when the heuristic reaches some threshold, because maybe it will improve as we analyze more instructions!

**3) Permutations of otherwise similar instructions**

This is the case you describe. Stores (for example) that look similar but have been permuted:

if (a) {

*b = 52;

*c = 43;

} else {

*c = 87;

*b = 54;

}

If we approach this bottom-up, we will compare [*b = 54], [*c = 43], then [*c = 87], [*b = 52] and conclude that two instructions can be sunk, but 4 PHIs are required (because the target of the store and value are different in each case). If we looked at [*b = 54], [*b = 52], perhaps we would have seen that we can sink this with only *two* PHIs required and that might change our decision.

This is generally quite difficult and especially shows up with memory instructions like this (if c and b alias, then permuting them when sinking would be illegal. So we need alias analysis here).

**What GVNSink attempts to solve**

The current implementation of GVNSink was meant to solve 1) and 2), but NOT 3). 3) wasn’t within scope when I was writing GVNSink, but I think 3) is doable for some cases within the framework of GVNSink.

**How GVNSink works**

GVNSink works by computing a value number for each IR instruction, similar to GVN. However, GVNSink’s value numbering works slightly from GVN’s.

To recap, the value number (VN) of an instruction is a function of the instruction. If VN[I] == VN[J], I and J can be considered equivalent, for some definition of equivalence.

In GVN, VN[I] is based upon I and all of I’s operands, transitively. Therefore if VN[I] == VN[J], I and J were computed in the same way and J can be used instead of I.

In GVNSink, we flip this on its head. VN[I] is based upon I’s *uses*, transitively. Therefore if VN[I] == VN[J], all of the uses of I are the same as all of the uses of J. Therefore I and J can be merged. So this:

[ %a = add %I, 1 ] [ %c = add %J, 1 ]

[ %b = sub %a, 5 ] [ %d = sub %c, 5 ]

[ ret %b ] [ ret %d ]

Can be transformed into this:

\ /

[ %IJ = phi %I, %J ]

[ %a = add %IJ, 1 ]

[ %b = sub %a, 5 ]

[ ret %b ]

In GVNSink, we step backwards across all instructions in all incoming basic blocks (see LockstepReverseIterator). This produces, conceptually, a sequence of sets of instructions.

Each of these sets is analyzed in turn (see analyzeInstructionForSinking) - the value number of each contained instruction is calculated and used to determine if the entire set is mergeable or not (all values have the same VN).

If not all instructions in the set have the same VN, the set is pruned to just those with the same VN (or we stop if all instructions had different VNs). If this set passes extra legality checks, a SinkingInstructionCandidate is created. This contains pertinent data for the cost heuristic:

- Number of instructions from each incoming block being sunk
- Number of blocks to pick instructions from (this can be a subset of all the incoming blocks!)
- How many PHIs would be generated in total

After all candidates have been examined, they are sorted based on a heuristic and the best candidate picked and applied.

The use of value numbering in this algorithm is what allows us to do this in somewhat-linear time with respect to the number of instructions inspected.

Now, to solve 3), I think the component that needs swapping out is LockstepReverseIterator. This currently iterates backwards in lockstep across a set of blocks. To handle permutations, I think we could use a priority queue that feeds the main algorithm instructions from the input blocks in a slightly sorted order.

For example, perhaps it might order sequences of stores by their target operand (assuming alias analysis says it can), ensuring sequences of stores in different blocks appear in the same order?

Anyway, I hope this has helped understand a bit more about what GVNSink does, can do and how to modify it if you want to solve your problem.

Cheers,

James

Hi Danny,

Thanks for that However I've just updated the prototype patch to NewGVN

and it didn't need any API changes - all I rely on is GVNExpression.

Oh, nice.

Note, Bryant has been working on PDSE, which, combined with your code here,

would probably give us value based partial dead store elimination

(since the value numbering issues are related)

Hongbin,

Hi James,

Thanks a lot for the explanations, they make sense.

This is normally known as global code motion, and there are numerous algorithms that do it

This is normally known as global code motion, and there are numerous

algorithms that do it

Yes.

Thanks

Hongbin