SelectionDAGBuilder doing bad things on certain architectures

The SelectionDAGBuilder has an ‘optimization’ in its visitBr function that makes assumptions which are not valid on all architectures. The problem is as follows.

The following function

    kernel void cf_test(global int* a, int b, int c, int e)
    {
        int d = 0;
        if (!b && c < e) {
            d = *a + b;
        }
        *a = d;
    }

is transformed into something equivalent to this:

    kernel void cf_test(global int* a, int b, int c, int e)
    {
        int d;
        if (b) {
            d = 0;
        } else {
            if (c < e) {
                d = *a + b;
            } else {
                d = 0;
            }
        }
        *a = d;
    }

by the code in SelectionDAGBuilder::visitBr(), line 1188.

However, this is extremely inefficient if jumps are expensive, or if jumps are not supported and high-level flow control has to be reconstructed. For example, on AMD GPUs a single flow control instruction can take 40 cycles to execute, whereas a bit instruction can be executed every cycle. The assumptions made by this block of code are therefore a poor fit for AMD hardware. Increasing control flow has a direct impact on performance, and removing the extra ‘and’ or ‘or’ in order to short-circuit the conditional evaluation does not work for our target.
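To make the cost argument concrete, here is a rough C++ sketch of the two shapes of the conditional and a toy cycle count for each. It is illustrative only: the function names, the plain int a_val used in place of the kernel's pointer argument, and the cost model (only branches and the extra ‘and’ are counted, using the 40-cycle and 1-cycle figures above) are all assumptions, not measurements.

```cpp
// Hypothetical cycle costs, taken from the figures quoted above.
constexpr int kBranchCycles = 40;  // one flow control instruction
constexpr int kBitOpCycles  = 1;   // one bit instruction ('and' / 'or')

// Shape produced by the short-circuit split: two conditional branches.
int cf_split(int a_val, int b, int c, int e) {
    int d;
    if (b) {
        d = 0;
    } else {
        if (c < e) {
            d = a_val + b;
        } else {
            d = 0;
        }
    }
    return d;
}

// Shape preferred when jumps are expensive: both comparisons are
// evaluated, combined with a single bitwise 'and', then one branch.
int cf_merged(int a_val, int b, int c, int e) {
    int take = static_cast<int>(b == 0) & static_cast<int>(c < e);
    return take ? a_val + b : 0;
}

// Toy worst-case cost: the comparisons are ignored since both shapes
// need them; only branches and the extra 'and' are counted.
constexpr int split_cost()  { return 2 * kBranchCycles; }
constexpr int merged_cost() { return kBitOpCycles + kBranchCycles; }
```

Both functions return the same value for every input; under this toy model the split form costs 80 cycles in the worst case against 41 for the merged form.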

To make this type of transformation rely more on target-specific information, I’ve added a new Boolean called JumpIsExpensive to the TargetLoweringInfo class, along with accessor functions.
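For readers without the patch open, the idea can be sketched roughly as below. This is not the patch itself: the class name TargetLoweringSketch, the accessor names, the default value, and the shouldSplitCondition helper are hypothetical stand-ins chosen for illustration.

```cpp
// Hypothetical stand-in for the target lowering class; not LLVM's real API.
class TargetLoweringSketch {
public:
    // Queried by the DAG builder before it splits a combined condition.
    bool isJumpExpensive() const { return JumpIsExpensive; }

    // Called by a target whose branches are costly or unsupported.
    void setJumpIsExpensive(bool isExpensive = true) {
        JumpIsExpensive = isExpensive;
    }

private:
    // Assumed default: false, so existing targets keep today's behavior.
    bool JumpIsExpensive = false;
};

// Hypothetical helper showing how visitBr might consult the flag:
// split an 'and'/'or' of conditions into separate branches only when
// the target has not declared jumps expensive.
bool shouldSplitCondition(const TargetLoweringSketch &TLI,
                          bool isAndOrOfConditions) {
    return isAndOrOfConditions && !TLI.isJumpExpensive();
}
```

A target like ours would call setJumpIsExpensive() once during setup, after which the builder would leave the ‘and’ or ‘or’ in place and emit a single branch.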

Please review the patch and apply if acceptable.

Thanks,

Micah

jump_boolean.patch (3.25 KB)