Handling far branches with fixups or ELF relocs

Hello,

I'm working on an LLVM backend for an experimental microprocessor. Work is going on nicely, and I've until now found the answer to all my questions directly in the LLVM source code, or in the documentation.

However, I'm having problems with the AsmBackend class and the handling of fixups.

The processor I'm working with has a single conditional branch instruction, JCC, that takes an IP-relative 9-bit immediate offset as operand. A second version of the instruction takes a register as operand and can therefore jump to any 32-bit address.

AsmBackend has methods for relaxing instructions, which I wanted to use to replace "JCC imm9" instructions with a sequence of instructions that can jump further. However, I ran into two problems:

- relaxInstruction does not seem to be able to replace one instruction with a sequence of instructions
- I've looked at many other LLVM backends (AVR, ARC, ARM, MIPS and RISC-V), and none of them really seem to do interesting things in relaxInstruction. ARM for instance relaxes Thumb instructions to normal instructions, and RISC-V relaxes compressed instructions to normal ones too. But in both cases, even the normal instructions have limited range, for which there is an assertion, but no "solution" for when a branch exceeds the range of the instruction.

It therefore seems that the problem of "conditional branches that jump too far" is solved elsewhere, but I could not find that location. I looked at LLD, and I've seen that RISC-V has some code there (related to the PLT) that produces sequences of instructions, but not the other targets.

So, what would be the best way to change "JCC imm9" instructions to something else when a branch has to jump further than 256 instructions before or after the current one?
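
To make the range concrete, here is a minimal sketch of the check that decides whether a branch still fits in "JCC imm9". It assumes a signed two's-complement imm9 counted in instructions, fixed 4-byte instructions, and an offset measured from the address of the branch itself; fitsInImm9 and the constants are illustrative names, not LLVM APIs.

```cpp
#include <cstdint>

// Assumption: imm9 is a signed instruction count, so it covers [-256, 255].
constexpr int64_t kImm9Min = -256;
constexpr int64_t kImm9Max = 255;
// Assumption: every instruction is 4 bytes wide.
constexpr int64_t kInstrBytes = 4;

// Returns true if a branch located at byte address `from` can reach the
// byte address `to` with the short "JCC imm9" encoding.
// Assumption: the offset is relative to the branch instruction itself.
constexpr bool fitsInImm9(int64_t from, int64_t to) {
  const int64_t deltaBytes = to - from;
  if (deltaBytes % kInstrBytes != 0)
    return false; // target is not instruction-aligned
  const int64_t deltaInstrs = deltaBytes / kInstrBytes;
  return deltaInstrs >= kImm9Min && deltaInstrs <= kImm9Max;
}

// Compile-time usage examples.
static_assert(fitsInImm9(0, 255 * 4), "255 instructions forward fits");
static_assert(!fitsInImm9(0, 256 * 4), "256 instructions forward does not");
```

Any branch for which this check fails is one that needs a longer form.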

Best regards,
Denis Steckelmacher

Hello Denis,

In Arm and AArch64 the limited range of immediates for branch instructions is addressed in two parts. The first applies when a branch is guaranteed to have its destination within the same section: in that case a 16-bit instruction is relaxed to a 32-bit one with a larger immediate. In Arm the range of the 32-bit instruction is considered large enough within a single section (-ffunction-sections may be needed for giant source files; not many single functions are larger than 16 megabytes). The second applies to branches that cross section boundaries and have relocations that the ABI says a linker must "veneer": these are resolved at link time by redirecting to linker-generated code (variously called a stub, thunk, veneer, or trampoline) that completes the branch. LLD supports this in Relocations.cpp, Thunks.cpp, etc.

Peter

Hello,

Thank you for the information. I will look at the LLD source files you mention. Thank you also for the “veneer” name. I was aware of stubs, thunks and trampolines, but not veneers. I will grep for it in the LLVM code base and see what I find.

In the processor I’m working with, conditional jumps have a range of 1KB forward or backward. The range is problematic, as I’m currently unable to compile CoreMark for that processor (to be more precise, assembly generation works, but ELF object generation fails due to out-of-range conditional jumps). I’m afraid that many conditional jumps will be limited by their range. I therefore have to find a solution that limits the number of additional instructions, or jumps to linker-generated code, as much as possible.

Thank you again. I now have information from which I hope to find a solution to my problem.

Best regards,
Denis

Hi, Denis,

You might also take a look at llvm/lib/Target/PowerPC/PPCBranchSelector.cpp

It does things like change:

// short branch:
// bCC MBB

into:

// long branch:
// b!CC $PC+8
// b MBB
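
A minimal sketch of that rewrite on a toy instruction type might look like the following. The Instr struct and expandFarBranch are hypothetical names for illustration and are not taken from PPCBranchSelector.cpp.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy instruction, not an LLVM type.
struct Instr {
  std::string opcode; // "bcc" (conditional) or "b" (unconditional)
  bool inverted;      // true if the condition is negated
  std::string target; // a label, or "$PC+8" meaning "skip the next instruction"
};

// Expands a short conditional branch into the two-instruction long form:
// an inverted conditional branch over an unconditional branch.
std::vector<Instr> expandFarBranch(const Instr &br) {
  assert(br.opcode == "bcc" && "only conditional branches are expanded");
  return {
      // Taken when the original condition is false: skip the far branch.
      {"bcc", !br.inverted, "$PC+8"},
      // Reached only when the original condition is true.
      {"b", false, br.target},
  };
}
```

The trick only helps when the unconditional branch has a larger range than the conditional one.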

-Hal

Hello,

Thanks, this is very interesting! I’ve also found some comparable code in the ARC backend (ARCBranchFinalize.cpp). To summarize for other people who may face the same problem in the future:

  • On X86, there is an imm32 version of all the jump instructions. It is assumed to be okay, as a function will never be larger than 4 GB.

  • On ARM, it seems that the branches are “large enough”, as pointed out by Peter. Jumps to symbols more than 16 MB away from the current IP are considered rare enough to be okay to be solved with “trampolines” inserted by the linker (I’ve seen code in LLD doing that).

  • ARC and MIPS have much more problematic jump instructions, as the farthest they can jump is only a few kilobytes away at most. Jumps that are too far for these instructions are therefore common, and these backends both implement a MachineFunctionPass that solves the problem using as few instructions as possible.

The solutions used by ARC and MIPS look quite comparable:

  • ARC: Every function is “measured”. If the size of a function is larger than the jump range, then all the jumps in the function that have a limited range are replaced with instructions that can jump farther (in the code, BRcc instructions are replaced with CMP + Bcc)
  • MIPS: A bit smarter, but also much more complex: the distance of each jump is quite precisely measured, taking into account things like basic block alignment. Instructions that are found to jump too far are replaced, on a per-instruction basis, with instructions that jump farther.
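
The per-instruction approach can be sketched as a small fixed-point loop: lay out the code, expand every short branch that no longer reaches its target, and repeat, because each expansion can push other branches out of range. This is a toy model and not the MIPS code. The Instr struct, the 4-byte short form, the 8-byte long form and the ±1 KB range are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy instruction: either a plain instruction or a branch.
struct Instr {
  int target = -1;     // index of the branch target, or -1 if not a branch
  bool isLong = false; // true once the branch uses the long form
};

constexpr int64_t kShortBytes = 4;    // assumed size of a normal instruction
constexpr int64_t kLongBytes = 8;     // assumed size of the long branch form
constexpr int64_t kMaxFwd = 255 * 4;  // assumed forward range, in bytes
constexpr int64_t kMaxBack = -256 * 4; // assumed backward range, in bytes

// Computes the byte address of every instruction (plus the end address).
std::vector<int64_t> layout(const std::vector<Instr> &code) {
  std::vector<int64_t> addr(code.size() + 1, 0);
  for (std::size_t i = 0; i < code.size(); ++i)
    addr[i + 1] = addr[i] + (code[i].isLong ? kLongBytes : kShortBytes);
  return addr;
}

// Expands out-of-range short branches until the layout is stable.
// Returns the number of branches that were expanded.
int relaxBranches(std::vector<Instr> &code) {
  int expanded = 0;
  bool changed = true;
  while (changed) { // terminates: branches only ever grow
    changed = false;
    const std::vector<int64_t> addr = layout(code);
    for (std::size_t i = 0; i < code.size(); ++i) {
      Instr &in = code[i];
      if (in.target < 0 || in.isLong)
        continue;
      assert(static_cast<std::size_t>(in.target) <= code.size());
      const int64_t delta = addr[in.target] - addr[i];
      if (delta < kMaxBack || delta > kMaxFwd) {
        in.isLong = true;
        ++expanded;
        changed = true;
      }
    }
  }
  return expanded;
}
```

The ARC-style alternative corresponds to skipping the loop entirely and expanding every branch whenever the total function size exceeds the range, which is simpler but pessimistic.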

In my case, I’m temporarily solving my problem with a solution that is both very short (about 20 lines of code in total) and extremely inelegant:

  • I kept all my small JCC imm9 instructions until AsmBackend sees them
  • I added a new instruction to TargetInstrInfo.td : IPREL+JCC. This single instruction, producing a single MCInst, is encoded using twice as many bits as a normal instruction. It basically encodes a pair of instructions, one that adds an imm18 constant to IP, and one that jumps to an address put in a register (how I solve register allocation in this case is outside the scope of this email).
  • My AsmBackend simply relaxes JCC instructions that jump out of range to IPREL+JCC.

Looking at ARC, MIPS, and now my backend, I’m starting to wonder whether a configurable “small jumps to larger jumps” MachineFunctionPass should be added to the target-independent part of LLVM. ARC, for instance, would greatly benefit from having a version of the MIPS code implemented for it, as would my backend. I’ll look into writing that pass myself, but it seems quite complicated and I will probably only be able to do it in a few months. This project is therefore still open for someone else to do.

Best regards, and thanks for all the suggestions!
Denis
