Loop branching inefficiencies in Backend output

Hi,

I am working on a custom backend, and I am trying to figure out how to deal with some branching inefficiencies in my output code, and the best way to fix it.

So, let’s say I am compiling a small function that takes the sum of an array.
int loop(int* array, int n) {
int ret = 0;
for (int i = 0; i < n; i++) {
ret += array[i];
}
return ret;
}

The problem I am having is that in the generated code, we got something along these lines.

loop:

branch cond exit-block
branch loop
exit-block:

Now, due to the basic block placement, that could be simplified to the following, removing an extra branch instruction.

loop:

branch !cond loop
exit-block:

So, I decided to investigate to see if this problem occurs in other backends (basically to see if I am missing some implementation in my backend). I found that you can see this issue with the same code for the NVPTX backend, but not for the x86 or ARM backends.

When I looked at the debug output for the x86 backend though, I can’t figure out how they realize to get rid of that branch. Up to the last point that MachineInstrs are printed (the rewrite virtual registers pass), the MachineInstrs still show the two jump paradigm. But in the final output, it somehow becomes one branch case.

I know one solution is for me to add a pass after basic block placement to change these instructions, but I am wondering if there is a recommended way to do this. Especially as this seems like a common problem, where an existing generic pass would have some interface a backend should support to allow branch re-writes to create the single branch case.

Thanks,
-Dilan

When I looked at the debug output for the x86 backend though, I can't figure out how they realize to get rid of that branch. Up to the last point that MachineInstrs are printed (the rewrite virtual registers pass), the MachineInstrs still show the two jump paradigm. But in the final output, it somehow becomes one branch case.

The "-print-after-all" flag is often useful to figure out what various passes do.

I know one solution is for me to add a pass after basic block placement to change these instructions, but I am wondering if there is a recommended way to do this. Especially as this seems like a common problem, where an existing generic pass would have some interface a backend should support to allow branch re-writes to create the single branch case.

This transform already exists as target-independent code in BranchFolding.cpp. Is TargetInstrInfo::reverseBranchCondition implemented for your target?

-Eli

Thanks! That is exactly what I was looking for. I was basing most of my stuff on NVPTX and apparently they don’t bother implementing the reverseBranchCondition.

Also, thanks for print-after-all flag. Should be useful for me.

Thanks again for all the help.
-Dilan