{ARM} IfConversion does not detect BX instruction as a branch

Hi all,

I got a silly bug when compiling our project with the latest Clang. Here’s the outputted assembly:

tst r3, #255
strbeq r6, [r7]
ldreq r6, [r4, r6, lsl #2]
strne r6, [r7, #4]
ldr r6, [r4, r6, lsl #2]
bx r6

For the code to execute correctly, either the ldr should be a ldrne instruction or the ldreq instruction should be removed. The error seems to come from the IfConvertion MachinePass. Here’s is what it looks like before and after.

#BEFORE IfConversion MachinePass

BB#7:
Live Ins: %LR %R0 %R1 %R2 %R4 %R5 %R6 %R7 %R8 %R9 %R10 %R12
Predecessors according to CFG: BB#5 BB#6
STRBi12 %R5, %R6, 0, pred:14, pred:%noreg; mem:ST1[%cond.i23.i.i.i]
%R6 = LDRBi12 %R7, 0, pred:14, pred:%noreg; mem:LD1%15
%R3 = EORri %R6, 254, pred:14, pred:%noreg, opt:%noreg
%R3 = ANDrr %R3, %R6, pred:14, pred:%noreg, opt:%noreg
%R6 = MOVi 0, pred:14, pred:%noreg, opt:%noreg
TSTri %R3, 255, pred:14, pred:%noreg, %CPSR;
Bcc <BB#9>, pred:0, pred:%CPSR;

BB#8:
Live Ins: %LR %R0 %R1 %R2 %R4 %R5 %R6 %R7 %R8 %R9 %R10 %R12
Predecessors according to CFG: BB#7
STRi12 %R6, %R7, 4, pred:14, pred:%noreg; mem:ST4[%_size.i3.i.i.i.i]
%R6 = LDRrs %R4, %R6, 16386, pred:14, pred:%noreg; mem:LD4[%0]
BX %R6

BB#9:
Live Ins: %LR %R0 %R1 %R2 %R4 %R5 %R6 %R7 %R8 %R9 %R10 %R12
Predecessors according to CFG: BB#7
STRBi12 %R6, %R7, 0, pred:14, pred:%noreg; mem:ST1%21
%R6 = LDRrs %R4, %R6, 16386, pred:14, pred:%noreg; mem:LD4[%0]
BX %R6

#AFTER IfConversion MachinePass

BB#7:

TSTri %R3, 255, pred:14, pred:%noreg, %CPSR;
STRBi12 %R6, %R7, 0, pred:0, pred:%CPSR; mem:ST1%21
%R6 = LDRrs %R4, %R6, 16386, pred:0, pred:%CPSR, %R6<imp-use,kill>; mem:LD4[%0]
STRi12 %R6, %R7, 4, pred:1, pred:%CPSR; mem:ST4[%_size.i3.i.i.i.i]
%R6 = LDRrs %R4, %R6, 16386, pred:14, pred:%noreg; mem:LD4[%0]
BX %R6

Inside the IfConvertDiamondCommon(…) function of IfConversion.cpp, the function is called with NumDups2=1, which makes sense because BB#8 and BB#9 share the same LDRrs instruction with the same operands. The problem is the call to TTI->removeBranch(…) function that does not remove the BX instruction. Thus, when removing the common instructions, the BX is removed instead of the LDRs instruction.

Before removeBranch call on MBB1

BB#9: derived from LLVM BB %if.else.i.i.i.i
Live Ins: %LR %R0 %R1 %R2 %R4 %R5 %R6 %R7 %R8 %R9 %R10 %R12
Predecessors according to CFG: BB#7
STRBi12 %R6, %R7, 0, pred:14, pred:%noreg; mem:ST1%21
%R6 = LDRrs %R4, %R6, 16386, pred:14, pred:%noreg; mem:LD4[%0]
BX %R6

After removeBranch call on MBB1, returned value:0

BB#9: derived from LLVM BB %if.else.i.i.i.i
Live Ins: %LR %R0 %R1 %R2 %R4 %R5 %R6 %R7 %R8 %R9 %R10 %R12
Predecessors according to CFG: BB#7
STRBi12 %R6, %R7, 0, pred:14, pred:%noreg; mem:ST1%21
%R6 = LDRrs %R4, %R6, 16386, pred:14, pred:%noreg; mem:LD4[%0]
BX %R6

#After removing common instructions (NumDups2=1) on MBB1
BB#9: derived from LLVM BB %if.else.i.i.i.i
Live Ins: %LR %R0 %R1 %R2 %R4 %R5 %R6 %R7 %R8 %R9 %R10 %R12
Predecessors according to CFG: BB#7
STRBi12 %R6, %R7, 0, pred:14, pred:%noreg; mem:ST1%21
%R6 = LDRrs %R4, %R6, 16386, pred:14, pred:%noreg; mem:LD4[%0]

#After predicated on MBB1
BB#9: derived from LLVM BB %if.else.i.i.i.i
Live Ins: %LR %R0 %R1 %R2 %R4 %R5 %R6 %R7 %R8 %R9 %R10 %R12
Predecessors according to CFG: BB#7
STRBi12 %R6, %R7, 0, pred:0, pred:%CPSR; mem:ST1%21
%R6 = LDRrs %R4, %R6, 16386, pred:0, pred:%CPSR; mem:LD4[%0]

We can clearly see that the LDRrs is still there. As a result, after BB#9 and BB#8 have been merged into BB#7, the LDRs instruction is done twice when the “positive” path is taken.

My current fix is the following:

@@ -408,7 +408,8 @@ unsigned ARMBaseInstrInfo::removeBranch(MachineBasicBlock &MBB,
return 0;

if (!isUncondBranchOpcode(I->getOpcode()) &&

  • !isCondBranchOpcode(I->getOpcode()))
  • !isCondBranchOpcode(I->getOpcode()) &&
  • !isIndirectBranchOpcode(I->getOpcode()))
    return 0;

Does that makes sense?

Regards,

Gael

in ARMBaseInstrInfo::removeBranch

It probably has escaped unnoticed because there aren't many code paths that
will try to remove an indirect branch.
I'll let ARM folks chime in, but it looks correct to me.

Kyle.

Thank you for your point of view. Anyway, after many tests, it appears that the changes fixed my issue. I tried to reproduce a minimal piece of C to trigger the bug, without success. In fact, I am still wondering if removeBranch(…) should support removing “jump-table” branches as well (i**sJumpTableBranchOpcode()). I didn’t hit that specific case, but I will keep those emails preciously to remember the issue if something similar appears again.

Regards;

Gael

Target-independent code is only supposed to call removeBranch in cases where analyzeBranch returns false; as far as I can tell, ARMBaseInstrInfo::analyzeBranch will never return false for an indirect branch. So I would guess there’s a bug in IfConversion, rather than the ARM backend. -Eli

Hi all,

I got a silly bug when compiling our project with the latest Clang. Here's
the outputted assembly:

tst r3, #255
strbeq r6, [r7]
ldreq r6, [r4, r6, lsl #2]
strne r6, [r7, #4]
ldr r6, [r4, r6, lsl #2]
bx r6

For the code to execute correctly, either the *ldr* should be a *ldrne*
instruction or the *ldreq* instruction should be removed. The error seems
to come from the IfConvertion MachinePass. Here's is what it looks like
before and after.

#BEFORE IfConversion MachinePass

BB#7:
   Live Ins: %LR %R0 %R1 %R2 %R4 %R5 %R6 %R7 %R8 %R9 %R10 %R12
   Predecessors according to CFG: BB#5 BB#6
   STRBi12 %R5, %R6<kill>, 0, pred:14, pred:%noreg;
mem:ST1[%cond.i23.i.i.i]
   %R6<def> = LDRBi12 %R7, 0, pred:14, pred:%noreg; mem:LD1[%15](align=4)
   %R3<def> = EORri %R6, 254, pred:14, pred:%noreg, opt:%noreg
   %R3<def> = ANDrr %R3<kill>, %R6<kill>, pred:14, pred:%noreg, opt:%noreg
   %R6<def> = MOVi 0, pred:14, pred:%noreg, opt:%noreg
   TSTri %R3<kill>, 255, pred:14, pred:%noreg, %CPSR<imp-def>;
   Bcc <BB#9>, pred:0, pred:%CPSR<kill>;

BB#8:
   Live Ins: %LR %R0 %R1 %R2 %R4 %R5 %R6 %R7 %R8 %R9 %R10 %R12
   Predecessors according to CFG: BB#7
   STRi12 %R6, %R7<kill>, 4, pred:14, pred:%noreg;
mem:ST4[%__size_.i3.i.i.i.i]
   %R6<def> = LDRrs %R4, %R6<kill>, 16386, pred:14, pred:%noreg;
mem:LD4[%0]
   BX %R6<kill>

BB#9:
   Live Ins: %LR %R0 %R1 %R2 %R4 %R5 %R6 %R7 %R8 %R9 %R10 %R12
   Predecessors according to CFG: BB#7
   STRBi12 %R6, %R7<kill>, 0, pred:14, pred:%noreg; mem:ST1[%21](align=4)
   %R6<def> = LDRrs %R4, %R6<kill>, 16386, pred:14, pred:%noreg;
mem:LD4[%0]
   BX %R6<kill>

#AFTER IfConversion MachinePass

BB#7:
   ...
   TSTri %R3<kill>, 255, pred:14, pred:%noreg, %CPSR<imp-def>;
   STRBi12 %R6, %R7, 0, pred:0, pred:%CPSR; mem:ST1[%21](align=4)
   %R6<def> = LDRrs %R4, %R6<kill>, 16386, pred:0, pred:%CPSR,
%R6<imp-use,kill>; mem:LD4[%0]
   STRi12 %R6, %R7<kill>, 4, pred:1, pred:%CPSR<kill>;
mem:ST4[%__size_.i3.i.i.i.i]
   %R6<def> = LDRrs %R4, %R6<kill>, 16386, pred:14, pred:%noreg;
mem:LD4[%0]
   BX %R6<kill>

Inside the *IfConvertDiamondCommon(...)* function of IfConversion.cpp,
the function is called with *NumDups2=1*, which makes sense because BB#8
and BB#9 share the same *LDRrs* instruction with the same operands. The
problem is the call to *TTI->removeBranch(...)* function that does not
remove the *BX* instruction. Thus, when removing the common instructions,
the *BX* is removed instead of the *LDRs* instruction.

# Before removeBranch call on MBB1
BB#9: derived from LLVM BB %if.else.i.i.i.i
   Live Ins: %LR %R0 %R1 %R2 %R4 %R5 %R6 %R7 %R8 %R9 %R10 %R12
   Predecessors according to CFG: BB#7
   STRBi12 %R6, %R7<kill>, 0, pred:14, pred:%noreg; mem:ST1[%21](align=4)
   %R6<def> = LDRrs %R4, %R6<kill>, 16386, pred:14, pred:%noreg;
mem:LD4[%0]
   BX %R6<kill>

# After removeBranch call on MBB1, returned value:0
BB#9: derived from LLVM BB %if.else.i.i.i.i
   Live Ins: %LR %R0 %R1 %R2 %R4 %R5 %R6 %R7 %R8 %R9 %R10 %R12
   Predecessors according to CFG: BB#7
   STRBi12 %R6, %R7<kill>, 0, pred:14, pred:%noreg; mem:ST1[%21](align=4)
   %R6<def> = LDRrs %R4, %R6<kill>, 16386, pred:14, pred:%noreg;
mem:LD4[%0]
   BX %R6<kill>

#After removing common instructions (NumDups2=1) on MBB1
BB#9: derived from LLVM BB %if.else.i.i.i.i
   Live Ins: %LR %R0 %R1 %R2 %R4 %R5 %R6 %R7 %R8 %R9 %R10 %R12
   Predecessors according to CFG: BB#7
   STRBi12 %R6, %R7<kill>, 0, pred:14, pred:%noreg; mem:ST1[%21](align=4)
  %R6<def> = LDRrs %R4, %R6<kill>, 16386, pred:14, pred:%noreg; mem:LD4[%0]

#After predicated on MBB1
BB#9: derived from LLVM BB %if.else.i.i.i.i
   Live Ins: %LR %R0 %R1 %R2 %R4 %R5 %R6 %R7 %R8 %R9 %R10 %R12
   Predecessors according to CFG: BB#7
   STRBi12 %R6, %R7<kill>, 0, pred:0, pred:%CPSR; mem:ST1[%21](align=4)
  %R6<def> = LDRrs %R4, %R6<kill>, 16386, pred:0, pred:%CPSR; mem:LD4[%0]

We can clearly see that the *LDRrs* is still there. As a result, after
BB#9 and BB#8 have been merged into BB#7, the *LDRs* instruction is done
twice when the "positive" path is taken.

My current fix is the following:

@@ -408,7 +408,8 @@ unsigned ARMBaseInstrInfo::removeBranch(MachineBasicBlock
&MBB,
   return 0;

   if (!isUncondBranchOpcode(I->getOpcode()) &&
- !isCondBranchOpcode(I->getOpcode()))
+ !isCondBranchOpcode(I->getOpcode()) &&
+ !isIndirectBranchOpcode(I->getOpcode()))
      return 0;

Does that makes sense?

Target-independent code is only supposed to call removeBranch in cases
where analyzeBranch returns false; as far as I can tell, ARMBaseInstrInfo::analyzeBranch
will never return false for an indirect branch. So I would guess there's a
bug in IfConversion, rather than the ARM backend.

-Eli

Eli's right.

Gaël, do you think you could patch ifconversion to remove the instructions
rather than relying on removeBranch?

Eli's right.
Gaël, do you think you could patch ifconversion to remove the instructions rather than relying on removeBranch?

In fact, the BB has a IsBrAnalyzable attribute set to true
(=analyzeBranch() returned false) but the TrueBB and FalseBB of the
diamond have both IsBrAnalyzable set to false (=anayzeBranch() returned
true). But it is still detected as a valid diamond and later during the
processing of the diamond, the branches are removed without looking at
IsBrAnalyzable attribute (which does not work because indirect branches
are not supported by removeBranch())...

The thing is that I don't understand IfConversion MachinePass enough to
know what I should modify. Does indirect branches in TrueBB and FalseBB
should be supported as a diamond (a comment says that TailBB can be
empty, does that means it is a "good" diamond shape with indirect
branches? Does a valid diamond must have analyzeBranch() returning
false for TrueBB and FalseBB as well?

From my point of view, when each TrueBB and FalseBB contain an indirect

branch, it cannot be a diamond since they can jump to anywhere. My
conservative approach would be to add a check inside ValidDiamond to
detect this case and return false to avoid the conversion.

Any idea?

I took a look at 3.9 and 4.0 code. Now, here's what I understood:

   * The IfConversion in 3.9 is interesting as it checks
_IsBrAnalyzable()_ for TrueBB and FalseBB before calling
_removeBranch()_.
   * The comments and code of _analyzeBranch()_ and _removeBranch()_ are
clear to me now. My previous fix inside removeBranch() was definitely
not correct.
   * Based on the comments (for example : "a join block may not be
present") and the code, a diamond does not have to be a complete
diamond. The TailBB is not mandatory.

Conclusion: My case should be supported then.

_IfConvertDiamondCommon() _support different scenarios for a given
EntryBB. A quick summary:

   * FalseBB and TrueBB contains both an analyzable branch.
_SkipUnconditionalBranches=true_ meaning the branches are not checked if
they are identical. _ValidDiamond()_ checks that they end up on the same
TailBB. If TailBB is either nullptr or not the same for both TrueBB and
FalseBB, the diamond is not valid. _IfConvertDiamondCommon()_ is called
with removeBranch=true. Branches are successfully removed in both
blocks, thus common instructions are correctly removed based on Dups2
value. At the end, TailBB is merged or a conditional branch is inserted.
IfConversion successful.
   * FalseBB not analyzable and TrueBB analyzable or inversely (inducing
a different branch instruction). _ValidDiamond()_ is going to fail since
FalseBB/TrueBB.TrueBB will be equal to nullptr for the not analyzable
one hence _TT =! FT _will trigger.

  * FalseBB and TrueBB both not analyzable (indirect branch for example
on ARM). So _SkipUnconditionalBranches=false._

   * If the branches are not identical, _CountDuplicatedInstruction_ will
set Dups2 to 0_. _Then_ IfConvertDiamondCommon()_ is called with
removeBranch=false. The _removeBranch()_ on TrueBB/BBI1 will silently
fail. TrueBB will be predicated completely (branch instruction
included). FalseBB on the other hand will be predicated excluding the
branch instruction (due to the loop on DI2 when removeBranch=false).
IfConversion successful.

  * If the branches are identical. _CountDuplicatedInstruction_ will set
Dups2 accordingly.

   * If there's no common instruction, same behaviour as in 3.1
   * If there's some common instructions, this is my case !

In my case: The EntryBB is analyzable but TrueBB/FalseBB are not
(_AnalyzeBranches()_). TrueBB/FalseBB instructions can be predicated
(_ScanInstruction()_. _feasibleDiamond()_). TrueBB/FalseBB cannot
_SkipUnconditionalBranches_ (_ValidDiamond()_).
_CountDuplicatedInstruction()_ compute Dups1=0 as there's no common
instructions found at the beginning and Dups2=1 as it found one common
instruction at the end. During their computation, neither debug
instructions (_skipDebugInstructionsForward()_) nor branch instructions
(_TIB->isBranch()_) are counted. We note that because TrueBB and FalseBB
are not analyzable, _CountDuplicatedInstruction()_ also ensure that the
branch instruction is exactly the same in both blocks. If not, it would
not fail but simply set Dups2 to 0.

Now when the blocks are prepared to be merged in
_IfConvertDiamondCommon()_, _removeBranch()_ fails silently on TrueBB
(BBI1) as indirect branches are not supported. Hence the erasing of the
common instructions based on Dups2=1 value will produce an invalid block
because it will remove the branch instead of the actual common
instruction.

In the patch below, I manually delete the branch if and only if they are
identical in both block TrueBB/FalseBB. I also removed the call to
_verifySameBranchInstructions()_ as the blocks do not have to have
identical branch as explained in point 3.1 above. Works fine