Finding caller-saved registers at a function call site

Hi everyone,

I’m looking for a way to get all the caller-saved registers (both the register and the stack slot at which it was saved) for a given function call site in the backend. What’s the best way to grab this information? Is it possible to get this information if I have the MachineInstr of the function call? I’m currently targeting the AArch64 & X86 backends.

Thanks!

Hi Rob,

> I'm looking for a way to get all the caller-saved registers (both the
> register and the stack slot at which it was saved) for a given function
> call site in the backend. What's the best way to grab this
> information? Is it possible to get this information if I have the
> MachineInstr of the function call? I'm currently targeting the AArch64
> & X86 backends.

You should be able to use the RegMask operand to the MachineInstr to
discover the registers that are preserved or clobbered by the call
according to the calling convention. For reference, you might want to
look at `getRegMask` and `gatherMaximalPreservedRegisters` in
http://reviews.llvm.org/D21115.

As far as discovering the slot to which it is spilled, I have no idea.
CC'ing Matthias for this.

-- Sanjoy

Just to be sure: You are not talking about callee saved registers (the ones that are usually saved in the prologue and restored in the epilogue of a function)?

As Sanjoy already mentioned: Registers are marked as clobbered/preserved with a RegMask operand on the call instruction. Often some values that are live across a function call get spilled because we have no callee-saved (preserved) register left for them. The question of what values are in caller-saved registers is therefore an odd one: of course there are no values live in caller-saved registers across the call, because that would be invalid. We have spill slots for certain values (all those that live across a call but didn't make it into a callee-saved register), but there is no notion of a spill slot for a caller-saved register such as %EDX.

- Matthias

Hi Sanjoy,

I’m having trouble finding caller-saved registers using the RegMask operand you’ve mentioned. As an example, I’ve got a C function that looks like this:

double recurse(int depth, double val)
{
  if(depth < max_depth) return recurse(depth + 1, val * 1.2) + val;
  else return outer_func(val);
}

As a quick refresher, all “xmm” registers are considered caller-saved on x86-64 (System V ABI), hence values held in these registers that are live across a call should be spilled to the stack before the call. The generated assembly for the branch containing the call to “recurse” with clang/LLVM 3.8 (-O3) on Ubuntu 14.04 looks like this:


400694: ff c7 inc %edi # Add 1 to depth
400696: f2 0f 10 05 a2 92 05 movsd 0x592a2(%rip),%xmm0 # Move constant 1.2 into xmm0
40069d: 00
40069e: f2 0f 59 c1 mulsd %xmm1,%xmm0 # val * 1.2
4006a2: f2 0f 11 4d f8 movsd %xmm1,-0x8(%rbp) # Spill val to the stack
4006a7: e8 d4 ff ff ff callq 400680
4006ac: f2 0f 58 45 f8 addsd -0x8(%rbp),%xmm0 # recurse’s return value + val
4006b1: 48 83 c4 10 add $0x10,%rsp
4006b5: 5d pop %rbp
4006b6: c3 retq

Notice how xmm1 (the storage location of “val”, which is live across the call to recurse) is saved onto the stack at an offset of -8 from the base pointer. After the call, “val” (i.e., storage location rbp - 0x8) is used in the addition to calculate the returned value. However, when I print the RegMask operand for the call machine instruction, I get the following:

<regmask %BH %BL %BP %BPL %BX %EBP %EBX %RBP %RBX %R12 %R13 %R14 %R15 %R12B %R13B %R14B %R15B %R12D %R13D %R14D %R15D %R12W %R13W %R14W %R15W>

I don’t see xmm1 as being preserved across this call. Am I missing something? Thanks for your help!

Hi Rob,

Rob Lyerly wrote:
[snip]

> following:
>

> <regmask %BH %BL %BP %BPL %BX %EBP %EBX %RBP %RBX %R12 %R13 %R14
> %R15 %R12B %R13B %R14B %R15B %R12D %R13D %R14D %R15D %R12W %R13W %R14W
> %R15W>

IIUC the regmask lists the registers that are preserved *by* the
callee. The registers that would have to be preserved by the caller
are the complement of this set.
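
As a minimal sketch of that complement relationship: the snippet below models a register mask as a packed bit vector in which a set bit means "preserved by the callee". As far as I know this is the same layout that LLVM's `MachineOperand::clobbersPhysReg` tests against, but the code here is a standalone model, not LLVM code. The register numbers, `kNumRegs`, and the helper names `makeMask`, `clobbers` and `clobberedRegs` are made up for illustration; real register numbers come from the target's generated register enum.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical register numbering, for illustration only.
enum Reg : unsigned { RBX = 1, RBP = 2, R12 = 3, XMM0 = 40, XMM1 = 41, kNumRegs = 64 };

// One bit per physical register; a set bit means the callee preserves it.
using RegMask = std::vector<std::uint32_t>;

// Builds a mask from the list of registers the callee preserves.
RegMask makeMask(const std::vector<unsigned> &preserved) {
  RegMask mask((kNumRegs + 31) / 32, 0);
  for (unsigned reg : preserved) {
    assert(reg < kNumRegs && "register number out of range");
    mask[reg / 32] |= 1u << (reg % 32);
  }
  return mask;
}

// A register is clobbered by the call exactly when its bit is clear.
bool clobbers(const RegMask &mask, unsigned reg) {
  assert(reg < kNumRegs && "register number out of range");
  return (mask[reg / 32] & (1u << (reg % 32))) == 0;
}

// The caller-saved set is the complement of the preserved set.
std::vector<unsigned> clobberedRegs(const RegMask &mask) {
  std::vector<unsigned> out;
  for (unsigned reg = 1; reg < kNumRegs; ++reg)  // 0 is reserved for "no register"
    if (clobbers(mask, reg))
      out.push_back(reg);
  return out;
}
```

With a mask built from `{RBX, RBP, R12}`, `clobbers(mask, XMM1)` is true, which matches the observation that xmm1 does not appear in the printed regmask.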

It'll be helpful if you give us a high level picture of the problem
you're trying to solve.

Thanks,
-- Sanjoy

Ah, I see – the registers left out of the mask are considered clobbered. Got it!

At a high level, I’m interested in finding the locations of all values that are live at a given call site. You can think of it like a debugger, e.g. gdb – I’d like to be able to unwind the stack, frame by frame, and locate all the live values for each function invocation (i.e., where they are in a function’s stack frame/register set) at the function call site. I’m currently using DWARF metadata to implement the frame unwinding mechanism, similarly to gdb. I can’t use DWARF metadata for live value location information, however, because it only generates location information for named source code variables. I also need to locate compiler-generated temporaries, hence I’ve been looking at the StackMap intrinsic [1] to provide live value location information. It does most of what I need, but it does not tell me where live values stored in registers are spilled to the stack as part of the function call procedure (whether they be in callee- or caller-saved registers) – it simply tells me which registers they are stored in before/after the function call procedure. That’s the impetus for my question.

This is not a problem for callee-saved registers – these registers are restored from the stack as part of the call frame unwinding procedure detailed in the DWARF standard [2]. However, I’m left trying to find the locations of the live values that were in caller-saved registers and were spilled to the stack as part of the function call procedure (probably during instruction selection/register allocation, I’m not familiar enough with this process). I realize that for a MachineInstr for a given call there are no live values in caller-saved registers (as they would be clobbered and lost), but where on the stack were they saved?

In a nutshell, I’m trying to figure out where values that couldn’t be placed in callee-saved registers (and that were allocated to caller-saved registers) were spilled to the stack as part of the function call procedure. Hopefully this clarifies things – thanks!

[1] http://llvm.org/docs/StackMaps.html
[2] http://dwarfstd.org/doc/DWARF4.pdf, page 140

Hi Rob,

Robert Lyerly wrote:
> At a high level, I'm interested in finding the locations of all values
> that are live at a given call site. You can think of it like a
> debugger, e.g. gdb -- I'd like to be able to unwind the stack, frame by
> frame, and locate all the live values for each function invocation
> (i.e., where they are in a function's stack frame/register set) at the
> function call site. I'm currently using DWARF metadata to implement the
> frame unwinding mechanism, similarly to gdb. I can't use DWARF metadata
> for live value location information, however, because it only generates
> location information for named source code variables. I also need to

Isn't DWARF info best effort (not a rhetorical question -- I don't
actually know)? Or do you not care about being 100% precise?

Given that you're interested in finding all values live at a
call-site, why not do just that -- run a liveness analysis over stack
slots and registers? That should catch compiler temporaries too.
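
A minimal sketch of what such a liveness scan could look like over a single basic block, treating registers and stack slots uniformly as "locations". The `Inst` struct, the `liveAfter` helper, and the string location names are hypothetical stand-ins for MachineInstr operands and frame indices; a real pass would also have to handle control flow by iterating to a fixed point across blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified instruction: the locations (registers or stack
// slots, named by string here) that it writes and reads.
struct Inst {
  std::vector<std::string> defs;
  std::vector<std::string> uses;
};

// Backward scan over one basic block. Returns the locations that are live
// immediately after instruction `index` (live on return, if that instruction
// is a call). `liveOut` is the set that is live at the end of the block.
std::set<std::string> liveAfter(const std::vector<Inst> &block,
                                std::size_t index,
                                std::set<std::string> liveOut) {
  assert(index < block.size() && "instruction index out of range");
  std::set<std::string> live = std::move(liveOut);
  // Visit the instructions after `index`, from last to first.
  for (std::size_t i = block.size(); i-- > index + 1;) {
    for (const std::string &d : block[i].defs)
      live.erase(d);   // a definition kills the previous value
    for (const std::string &u : block[i].uses)
      live.insert(u);  // a use makes the location live above it
  }
  return live;
}
```

Applied to the `recurse` example earlier in the thread, the scan would report the stack slot at rbp-0x8 as live after the call and xmm1 as dead, since the only later read of `val` goes through the slot.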

A related question is: are you interested in the *values* or the
*locations* the values are in? For instance if a specific value (say
the result of a load) is spilled at 0x80(%rsp) and is also present in
%r13 (callee saved register), then do you have to know both the
locations or just one of the two?

> locate compiler-generated temporaries, hence I've been looking at the
> StackMap intrinsic [1] to provide live value location information. It
> does most of what I need, but it does not tell me where live values
> stored in registers are spilled to the stack as part of the function
> call procedure (whether they be in callee- or caller-saved registers) --
> it simply tells me which registers they are stored in before/after the
> function call procedure. That's the impetus for my question.

With stackmaps, is the problem that it tells you e.g. a live value is
present in %r9 (caller saved register), but when unwinding the value
may have been clobbered? This is something other people have run into
as well -- specifically the distinction between "live on call"
(available just before the call) vs. "live on return" (available after
the callee returns). I'm hazy on the details, but IIRC if this is a
problem, then you may have problems bigger than just figuring out the
spill slots, since the caller saved register may not actually have
been spilled anywhere (since it does not need to live across the
call).
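
To make the "live on call" vs. "live on return" distinction concrete, here is a small sketch under stated assumptions: a value is treated as recoverable after the call only if at least one of its recorded locations is a stack slot or a register the callee preserves. The `Location` struct and `recoverableAfterCall` are hypothetical names, not part of the stackmap format, and the sketch assumes the callee never writes into the caller's spill slots.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical description of one place where a live value can be found at
// a call site.
struct Location {
  bool inRegister;   // true: `name` is a register; false: a stack slot
  std::string name;  // e.g. "r9", "r13" or "rsp+0x80"
};

// A value survives the call if at least one of its locations is a stack slot
// or a register that the callee preserves.
bool recoverableAfterCall(const std::vector<Location> &locs,
                          const std::set<std::string> &preservedRegs) {
  for (const Location &loc : locs)
    if (!loc.inRegister || preservedRegs.count(loc.name) != 0)
      return true;
  return false;
}
```

A value recorded only in %r9 would come back as not recoverable here, which is the case described above: it was live on call but never had to be spilled.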

-- Sanjoy

Hi Sanjoy,

I think I understand where the spill code is getting generated. I’ve been digging through the register allocation debug information, and I see that the register allocator itself is generating the spill code around the call site (the greedy allocator is also splitting the virtual register’s live range around the call site). I don’t care about the internals of the register allocator, but I see that it produces a VirtRegMap which contains the mapping of virtual registers to physical registers/spill slots. I have two questions about this:

  1. Is there a way to access the produced VirtRegMap in the architecture-specific AsmPrinter? I tried the normal getAnalysis, but it produces an empty mapping.

  2. If I manage to get access to this mapping, is there a way to correlate an LLVM bitcode Value to a virtual register?

Hi Rob,

Robert Lyerly wrote:
> The reason I can't just run a liveness analysis over stack slots and
> registers in the backend is that I'm trying to map live value locations
> back up into their corresponding values in LLVM bitcode. This is why
> I'm using the stackmap intrinsic, as it does exactly that -- provides a
> mapping between a bitcode value and its storage location for the
> generated assembly. I need this intermediate-level value because I'm
> doing ABI translation. I'm plucking values out of a call frame laid out
> in one ABI and storing them in a destination stack frame that is laid
> out according to another ABI. The IR value is essentially the "key"
> used to match corresponding storage locations across the two ABIs. I'm
> transforming a thread's current stack laid out for one ABI into one laid
> out for another ABI.

This sounds exactly like the deoptimization[1] mechanism we use (and
LLVM has support for), except that when deoptimizing, the code being
returned into (and the associated frame layout) is generally "fixed",
i.e. it is the interpreter or a low-tier JIT.

> A related question is: are you interested in the *values* or the
> *locations* the values are in? For instance if a specific value (say
> the result of a load) is spilled at 0x80(%rsp) and is also present in
> %r13 (callee saved register), then do you have to know both the
> locations or just one of the two?
>
> I'm actually only interested in being able to find values; I don't
> particularly care about where they're stored. In your hypothetical, as
> long as the compiler could tell me that the value was stored in one of
> those locations, that'd be okay.

Again, this makes it very close to deoptimization.

> I'm not concerned about values that are not live across the call ("live
> on call"), only those that are live after returning from the call ("live
> on return"). If the value is not live after the call, there's no need
> for me to be able to recover it. I just need to be able to resume
> execution in that function correctly, so I'm only concerned about values
> in caller-saved registers that are needed after the call completes, and
> therefore have been spilled to the stack as part of the procedure call
> standard.
>
> Because I'm rewriting the stack to change the ABI, I need to be able to
> set up the stack so that execution can correctly unwind back up the call
> chain. This means that I need to be able to populate spill stack slots
> for caller-saved registers, hence this is why I need their locations.

Ah, so the spill slots are not just pertinent from the POV of the
function you're translating out of, but are also pertinent for the
function you're translating *into*?

IOW, you want to translate

void foo_0() {
   spill %rax to offset 0x90
   call bar
   reload %rax from offset 0x90
   return %rax
}

to

void foo_1() {
   spill %rax to offset 0x100
   call bar
   reload %rax from offset 0x100
   return %rax
}

at the call site to bar, and want to know that the contents of 0x90
need to be copied to 0x100 if you rewrite the stack frame?
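
If that reading is right, the rewrite step could be sketched roughly as follows: each compilation yields a table from an IR value key to a frame offset, and the rewriter copies every value present in both tables from the source frame to the destination frame. The `SlotMap` type, the integer value IDs, `copyLiveValues`, and the fixed 8-byte slot size are all assumptions made for illustration; nothing here is an LLVM API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <vector>

// Hypothetical per-call-site table: IR value ID -> byte offset of its spill
// slot within the frame. One table per compilation (e.g. foo_0 and foo_1).
using SlotMap = std::map<int, std::size_t>;

constexpr std::size_t kSlotSize = 8;  // assumption: every slot is 8 bytes

// Copies each value that has a slot in both layouts from `src` to `dst`.
// Returns the number of values copied.
std::size_t copyLiveValues(const SlotMap &srcSlots,
                           const std::vector<std::uint8_t> &src,
                           const SlotMap &dstSlots,
                           std::vector<std::uint8_t> &dst) {
  std::size_t copied = 0;
  for (const auto &entry : srcSlots) {
    auto match = dstSlots.find(entry.first);
    if (match == dstSlots.end())
      continue;  // this value has no slot in the destination layout
    assert(entry.second + kSlotSize <= src.size() && "source slot out of bounds");
    assert(match->second + kSlotSize <= dst.size() && "destination slot out of bounds");
    std::memcpy(dst.data() + match->second, src.data() + entry.second, kSlotSize);
    ++copied;
  }
  return copied;
}
```

For the foo_0/foo_1 example, the source table would map the value to 0x90 and the destination table would map the same key to 0x100. Values that appear in only one table are skipped, which is where the harder cases (a value that is in a register or rematerialized in the other compilation) would need separate handling.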

I'm not sure how much of your project you're okay discussing on a
public mailing list, but I suspect the best strategy here will depend
on how different the two functions are from each other.

If all they differ in is the physical stack slot offsets, then I'd
just look at opaquely rewriting the slot offsets and not specifically
care about live values.

If they differ at a fundamental level, then maybe you need something
like what we do for precise relocating GCs (see documentation on
gc.statepoint); otherwise, for instance, how do you know that a value
that is put in a caller-saved-register in one compilation is also in a
caller-saved-register (and not in a callee-saved-register or constant
folded or rematerialized away) in another compilation?

-- Sanjoy

[1]: Hölzle, Urs, Craig Chambers, and David Ungar. “Debugging
   optimized code with dynamic deoptimization.” ACM Sigplan
   Notices. Vol. 27. No. 7. ACM, 1992.