How to ensure a value is in a specific register at call sites


Is there a canonical way to ensure a value will be in a specific register prior to a call? I’d like the intervening code to be able to reuse the register, so I’d like to avoid reserving it. The value is valid for the lifetime of the frame, but can likely be recomputed efficiently. Since the value isn’t needed except at call sites, I’d like to ensure that the register allocator and any other optimizations can leverage this information.

As an example, assume on X86 that I’d like to use R11 as part of this protocol, but I’d still like to use it as a free register between calls. Which register it is doesn’t matter; I picked R11 rather arbitrarily since it’s a scratch register, but in practice it may end up being a callee-saved register (still TBD). I’d also like to do this for targets other than X86, so I’m after a general method rather than any specific advice regarding X86.

In the snippet below, the prologue executes normally, but we additionally use R11 before computing its new value in the callee. After the callee returns, R11 can be freely used again. Ideally the optimizer would decide if it should be loaded, spilled, or recomputed prior to the call to foo(...) in BB2. The callee foo() may or may not participate in the protocol, but either way, we’d like R11 to hold the correct value at the call site.

  // Handle Callee saved regs, etc.
  // use passed in R11
  // compute new R11 value for current frame
  // ... R11 can be used as a free register until the next CALL instruction

  // Marshal arguments (omitted)
  // Rematerialize R11 for callee (may be recomputed, reloaded, or copied)
  // Call foo(...)
  // R11 is free again ...

I’m pretty sure there are many approaches that would work here, but this seems like a common enough task that I figured it would be best to just ask before running off and experimenting.

I did spend some time digging through CodeGen and Target but I haven’t found anything obvious yet. Most of my experience has been doing analysis and transforms on LLVM IR rather than modifying the backend, so I’m not familiar w/ many of the codegen and target specific transforms.

Any pointers here are very much appreciated.

What you describe really sounds like you’re modifying the ABI, or at least a calling convention, and looking into how those things work is where I would start.

Thanks for the suggestion. I initially checked some of the code around the calling convention and ABI specifics in CodeGen and Target when I first started investigating, but they didn’t seem to be a strong match to my use case. Maybe I just need to revisit those and give them a closer look, though.

This seems like something between a parameter register (which will presumably be caller-saved) and a callee-saved register about which the callee has some expectations. As such, it seems to be somewhat orthogonal to what an ABI’s calling convention requires.
This is somewhat similar to the so-called TOC pointer that we have on PowerPC. Its details are:

  • Functions within the same DSO share the TOC pointer so functions maintain it across local calls
  • Functions in different DSOs can have different TOC pointers, so a non-local call is directed through the “global entry point” of the function, which sets up its TOC
  • Functions make no attempt to save/restore the pointer to their caller’s value, only to the value they need during their own execution (i.e. when they call functions that are not known to reside in the same DSO)

This scheme requires input from the linker (since the boundary is a DSO and not a compilation unit). But perhaps this is similar to what you’re after. I’m not sure.

I’m also reminded of the “static link” needed by languages with nested procedures, such as Pascal; this provides the callee with a pointer to the lexical parent’s stack frame. The languages LLVM was designed for don’t have nested procedures, so LLVM might not support that concept, not sure.

Here is a possible approach. As mentioned above, it means an ABI modification, as you are adding an implicit argument to each function.

In LowerFormalArguments(), you make the register available:

  const TargetRegisterClass *RC = ...;
  Register SpecialReg = MRI.createVirtualRegister(RC);
  MRI.addLiveIn(/* your register here */ X86::R11, SpecialReg);

To refer to this virtual register later, you can save it in your MachineFunctionInfo instance. Use this virtual register whenever you need the value passed to the function.

When a call is made, you need to load the newly computed value. This happens in LowerCall(). All implementations use a SmallVector of (register, value) pairs, named RegsToPass, to hold the registers passed to the called function. Just add your newly computed value to this vector before the chain is constructed:

  RegsToPass.push_back(std::make_pair(/* your register here */ X86::R11, /* new value */ ??));

As a result, the register holds the computed value on function entry. For each call, the newly computed value is loaded into the register. Between calls, if the register is needed for something else, the value is spilled and reloaded as necessary.
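To make the intended instruction stream concrete, here is a toy, standalone C++ model of the scheme described above. It is only a sketch and does not use any LLVM API: ToyFunction, lowerWithProtocol, the COMPUTE placeholder, and the %incoming / %special names are all hypothetical, and the output is just text that resembles machine-level pseudo-instructions. It shows the shape of the result: one copy out of the physical register at entry, then a copy back into it before every call.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only: none of these types are LLVM classes.
struct ToyFunction {
  std::string Name;
  std::vector<std::string> Callees; // calls, in program order
};

// Returns a textual pseudo-instruction stream for F. PhysReg is the protocol
// register (for example "R11"). "%special" stands in for the virtual register
// that a real lowering would remember in its MachineFunctionInfo.
std::vector<std::string> lowerWithProtocol(const ToyFunction &F,
                                           const std::string &PhysReg) {
  assert(!PhysReg.empty() && "protocol register must be named");
  std::vector<std::string> Out;

  // Entry: the physical register is live-in. Copy it into a virtual register
  // so the physical register is free for other uses afterwards.
  Out.push_back("%incoming = COPY $" + PhysReg);

  // Compute this frame's value from the incoming one (placeholder operation).
  Out.push_back("%special = COMPUTE %incoming");

  for (const std::string &Callee : F.Callees) {
    // Before each call: put the frame's value back into the physical register.
    Out.push_back("$" + PhysReg + " = COPY %special");
    // The call lists the register as an implicit use, so the copy is not dead.
    Out.push_back("CALL @" + Callee + ", implicit $" + PhysReg);
  }

  Out.push_back("RET");
  return Out;
}
```

For a function that calls foo and then baz, the model yields seven lines: two at entry, a copy/call pair per call, and the return. In a real backend, whether %special is kept in a register, spilled, or rematerialized between those points would be left to the register allocator.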