Link time optimization of jump to target instruction

Hello, I’m writing a backend for a custom instruction set. I’m trying to optimize a few things.
We have a 16-bit jump instruction (j/jl, jump/jump and link) with an 11-bit immediate. Sometimes 11 bits are not enough, so for function calls we load the immediate into a register and then jump to that register. This is similar to the RISC-V pseudo-instruction call, which gets expanded into auipc + jalr. I know that the linker can relax auipc + jalr into a single jal instruction, but I haven’t found where RISC-V does that, either in the lld codebase or in LLVM. Can somebody point me in the right direction? My goal is to use the 16-bit j/jl instruction when possible and the load immediate + jump register pair only when absolutely necessary. Thanks

Marco

Hi Marco,

What you are looking for is ‘linker relaxation’, and AFAIK this is not implemented upstream for LLD (for RISC-V I believe the GNU linker is more commonly used to do this, at least in the use cases I’ve seen, since there it is implemented).

How this typically works is that the linker identifies a relocation whose destination value meets some condition (in your case, that it fits in the immediate of the 16-bit instruction). It then replaces that relocation with a different one, rewrites the underlying bit pattern, and removes the bytes in the section that are no longer needed, while making sure all other relocations and symbols still refer to the right locations.
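As a rough illustration of the steps above, here is a minimal C++ sketch of the decision part of such a relaxation loop. It is not taken from LLD or the GNU linker. The instruction sizes (6 bytes for the long sequence, 2 for the short jump), the interpretation of the 11-bit immediate as a signed PC-relative byte offset, and all names (`CallSite`, `relax`, and so on) are assumptions made for the example. It ignores alignment constraints and does not rewrite any bytes; it only decides which sites can shrink and where things end up afterwards.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sizes: the long form (load immediate + jump register) is
// assumed to take 6 bytes, the short j/jl form 2 bytes.
constexpr int64_t LongSize = 6;
constexpr int64_t ShortSize = 2;

// Assumption: the 11-bit immediate is a signed, PC-relative byte offset.
bool fitsInImm11(int64_t Delta) { return Delta >= -1024 && Delta <= 1023; }

struct CallSite {
  int64_t Offset; // offset of the long sequence in the original section
  int64_t Target; // offset of the destination in the original section
  bool Relaxed;   // true once the site has been switched to the short form
};

// Bytes deleted in front of original offset Off, given the current decisions.
int64_t bytesRemovedBefore(const std::vector<CallSite> &Sites, int64_t Off) {
  int64_t Removed = 0;
  for (const CallSite &S : Sites)
    if (S.Relaxed && S.Offset < Off)
      Removed += LongSize - ShortSize;
  return Removed;
}

// Where original offset Off lands after the deletions decided so far.
int64_t newOffset(const std::vector<CallSite> &Sites, int64_t Off) {
  return Off - bytesRemovedBefore(Sites, Off);
}

// Shrinking one site can bring another site's target into range, so repeat
// until a full pass makes no further change.
void relax(std::vector<CallSite> &Sites) {
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (CallSite &S : Sites) {
      if (S.Relaxed)
        continue;
      int64_t Delta = newOffset(Sites, S.Target) - newOffset(Sites, S.Offset);
      if (fitsInImm11(Delta)) {
        S.Relaxed = true;
        Changed = true;
      }
    }
  }
}
```

In a real linker, each relaxed site would also get its relocation type swapped for the short jump's relocation, and every symbol and relocation offset in the section would be remapped with something like `newOffset`.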

I know there have been a few attempts by various people to implement various parts of relaxation for RISC-V in LLD, you may find some helpful tips in one of the following

If you just need a solution at the compilation unit level, then you can look at the BranchRelaxation pass or the SystemZLongBranch pass as a blueprint. The SystemZLongBranch pass changes relative branch instructions with a short displacement into other branch instructions if the target of the branch is out of reach. This seems similar to what you want to achieve.
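To make the idea concrete, here is a minimal, self-contained C++ sketch of that style of pass: every branch starts in its short form, and any branch whose target is out of reach is grown to the long form until nothing changes. This is not the code of BranchRelaxation or SystemZLongBranch. The sizes, the signed 11-bit byte-offset range, and the names (`Inst`, `expandOutOfRange`) are assumptions for illustration only.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical encodings: a short branch is 2 bytes, the long replacement
// sequence is 6 bytes.
constexpr int64_t ShortBranchSize = 2;
constexpr int64_t LongBranchSize = 6;

// Assumption: the short branch reaches a signed 11-bit byte offset.
bool inShortRange(int64_t Delta) { return Delta >= -1024 && Delta <= 1023; }

struct Inst {
  int64_t Size; // current encoded size in bytes
  int Target;   // index of the destination instruction, or -1 if not a branch
};

// Address of instruction Index, given the current sizes of all earlier ones.
int64_t addressOf(const std::vector<Inst> &Code, int Index) {
  int64_t Addr = 0;
  for (int I = 0; I < Index; ++I)
    Addr += Code[I].Size;
  return Addr;
}

// Growing one branch moves everything after it, which can push another
// branch out of range, so repeat until a full pass changes nothing.
void expandOutOfRange(std::vector<Inst> &Code) {
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (int I = 0; I < static_cast<int>(Code.size()); ++I) {
      Inst &B = Code[I];
      if (B.Target < 0 || B.Size != ShortBranchSize)
        continue; // not a branch, or already in the long form
      int64_t Delta = addressOf(Code, B.Target) - addressOf(Code, I);
      if (!inShortRange(Delta)) {
        B.Size = LongBranchSize;
        Changed = true;
      }
    }
  }
}
```

Note that this only works when both the branch and its target are in the same compilation unit, since the distances must be known when the pass runs.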

Thanks, I think I was driven off by this comment:

// Expand PseudoCALL(Reg), PseudoTAIL and PseudoJump to AUIPC and JALR with
// relocation types. We expand those pseudo-instructions while encoding them,
// meaning AUIPC and JALR won't go through RISCV MC to MC compressed
// instruction transformation. This is acceptable because AUIPC has no 16-bit
// form and C_JALR has no immediate operand field.  We let linker relaxation
// deal with it. When linker relaxation is enabled, AUIPC and JALR have a
// chance to relax to JAL.
// If the C extension is enabled, JAL has a chance relax to C_JAL.

I’ll take a look at the proposed implementation, thanks!

Unfortunately, I need a link-time solution, since the target addresses are only known at link time, not at compile time. But thanks anyway.
