Interprocedural Register Allocation

Hello everyone,
I have been interested in compilers, especially in the optimization aspects from
quite some time now. I don’t know if I have a decent background to brag about, but
all I can say is that I have tried very small things related to building a compiler [0].
However I admit that I am still new to a lot of things.

Like many others who are interested in compilers, I am interested in the Register
Allocation problem too. I have been thinking about implementing an interprocedural
register allocator from quite a while now. But I thought instead of trying it out on a
toy compiler, why not just get my hands dirty in the real world. I was seeing if I
could implement it on LLVM and incidentally, I noticed that the open projects page
lists interprocedural register allocation as one of the open projects [1].

However, I was reading the DeveloperPolicy page and the policy for making major
changes asks the developers to discuss the work here before proceeding. So, I am
writing this mail to kickoff a discussion. I would really like to contribute to LLVM and
I think this is a good place for me to start. Is there something specific like a paper
that you guys would want me to read before diving in?

I understand that register allocation itself is a tricky problem and doing an interprocedural
allocation is extremely hard. But I would like to try, at least try and fail if worst comes
to worst than not doing anything at all. At least by attempting to implement that, I will
have a better understanding of LLVM code base which may come in handy to contribute
to other parts of LLVM.

[0] https://github.com/madhusudancs/PL241-MCS-Compiler
[1] http://llvm.org/OpenProjects.html#codegen

Interprocedural register allocation means different things to different people. The approach that is described on the open projects page is quite simple; it still runs the register allocator on one function at a time.

I am not aware of any papers describing this simple idea.

Basically, the PrologEpilogInsertion pass will add a bit mask to MachineModuleInfo describing which registers are clobbered by the function being compiled. Later, when compiling the callers, that bit mask is used to initialize the regmask operands on call instructions.

See also TargetRegisterInfo::getCallPreservedMask().

/jakob

Hi Jakob,

However, I was reading the DeveloperPolicy page and the policy for making major
changes asks the developers to discuss the work here before proceeding. So, I am
writing this mail to kickoff a discussion. I would really like to contribute to LLVM and
I think this is a good place for me to start. Is there something specific like a paper
that you guys would want me to read before diving in?

I understand that register allocation itself is a tricky problem and doing an interprocedural
allocation is extremely hard. But I would like to try, at least try and fail if worst comes
to worst than not doing anything at all. At least by attempting to implement that, I will
have a better understanding of LLVM code base which may come in handy to contribute
to other parts of LLVM.

Interprocedural register allocation means different things to different people. The approach that is described on the open projects page is quite simple; it still runs the register allocator on one function at a time.

I am not aware of any papers describing this simple idea.

Basically, the PrologEpilogInsertion pass will add a bit mask to MachineModuleInfo describing which registers are clobbered by the function being compiled. Later, when compiling the callers, that bit mask is used to initialize the regmask operands on call instructions.

So the idea is to sidestep from the calling convention a bit if we
already know that the called function will not be using all the
registers required by the convention and instead use those registers
in the caller?

If I am understanding this correctly, is this something desirable for
LLVM, even if it is not high priority? Would you guys be Ok if I try
to implement this without disturbing the project priorities but with
a little help/guidance? Something that interests me to get started
with LLVM.

Basically, the PrologEpilogInsertion pass will add a bit mask to MachineModuleInfo describing which registers are clobbered by the function being compiled. Later, when compiling the callers, that bit mask is used to initialize the regmask operands on call instructions.

So the idea is to sidestep from the calling convention a bit if we
already know that the called function will not be using all the
registers required by the convention and instead use those registers
in the caller?

That’s right.

If I am understanding this correctly, is this something desirable for
LLVM, even if it is not high priority?

Yes.

Would you guys be Ok if I try
to implement this without disturbing the project priorities but with
a little help/guidance?

Absolutely.

/jakob

Hi Jakob,

Basically, the PrologEpilogInsertion pass will add a bit mask to MachineModuleInfo describing which registers are clobbered by the function being compiled. Later, when compiling the callers, that bit mask is used to initialize the regmask operands on call instructions.

So the idea is to sidestep from the calling convention a bit if we
already know that the called function will not be using all the
registers required by the convention and instead use those registers
in the caller?

That’s right.

If I am understanding this correctly, is this something desirable for
LLVM, even if it is not high priority?

Yes.

Would you guys be Ok if I try
to implement this without disturbing the project priorities but with
a little help/guidance?

Absolutely.

Great! Thank you very much. Then, I will do some homework on how
I plan to implement this, a very rough sketch if not the actual
design, and come back.

Btw, I just found this http://optimisticcompilers.blogspot.com/ So
I will just take a look at what happened to that project.

Hi Jakob,
I have spent last 4 weeks trying to figure out how to implement
Interprocedural Register Allocation. I must admit that I was really
overwhelmed with LLVM’s codebase while trying to figure this out :slight_smile:
There is so much to know! I think I have reached a point where I
have some sort of basic understanding of what needs to be done,
but I need some help from here on. So here is the summary of
what I know and for what I need more help.

I see that lib/CodeGen/Passes.cpp adds PrologEpilogInserter (PEI
henceforth) pass after the RegAlloc pass. Do I understand correctly
that LLVM first performs a given pass on all the functions before
moving to the next pass? In order to proceed further with this analysis
and to reduce the number of email exchanges back and forth, I will
assume that my understanding is correct for now (branch prediction in
real life? :P). Please let me know if I am wrong, I will restart the
analysis from this point in the subsequent emails. Assuming this is the
case, as with many things compiler optimizations, this is somewhat
opposite to what we want :slight_smile: So let me start from the beginning of what
we want.

Let us say we want to allocate registers for node X in the call graph.
We want our interprocedural register allocator to have run on all the
child nodes of X in the call graph before it starts the register
allocation on the given node X itself. We not only need this, but
we also want all Callee Saved Registers (CSR henceforth)
information to be available for all the child nodes of X.

This is because we want to compute the physical registers available
for node X as:
set(all the physical registers available) - Union over all the
children(X) (set(CSR for the child node)) - Union over all the
children(X) (set(used physical registers including calling
convention registers for the child node))

However in the current implementation CSR info is calculated in the PEI
pass which is done after register allocation. I am guessing that this
was because, CSR spill insertion is done in the Prolog and the Epilog
in the case when we do not shrink wrap and there was no need to
calculate CSR elsewhere and this was the reasonable location to place
that calculation. However, we have a need to have the calculation done
in the register allocator now. So IMHO, we should be moving the
calculation of CSR out of PEI pass but should save that info to be made
available to PEI pass for spill code insertion. If we can actually do
this, we should be able to combine this CSR calculation with register
allocation. In fact, unless it is absolutely necessary to keep them
separate for modularity or other such reasons, we can merge the PEI pass
and the RegAlloc pass together. What are the cases where one uses a
RegAlloc pass but not PEI pass?

Another concern is about the re-use of CallGraphSCC pass. I see that in
lib/Analysis/IPA/CallGraphSCCPass.cpp’s CGPassManager::RunPassOnSCC()
method there is this line, Changed |= FPP->runOnFunction(F); and FPP
is cast to (FPPassManager
). However, if we want to re-use
CallGraphSCCPass in CodeGen, shouldn’t we be casting FPP to the
MachineFunction equivalent and be calling the runOnMachineFunction()
method instead? So should we inherit CallGraphSCCPass to create
MachineCallGraphSCCPass like what we do for FunctionPass, and replace
all the casts/references to non-machine passes to equivalent machine
passes in the overridden methods?

So given all these things, here is my plan on how to go about
implementing an Interprocedural Register Allocator (IPRA henceforth).

  1. Decide if we have to create MachineCallGraphSCCPass.

  2. If we have to create MachineCallGraphSCCPass, inherit from
    CallGraphSCCPass and override the methods as required.

  3. Create a new IPRA pass which inherits from MachineCallGraphSCCPass
    or CallGraphPass depending on whether we create
    MachineCallGraphSCCPass.

  4. Implement a loop in runOnMachineSCC that iterates over the CallGraph
    nodes in the bottom-up order:
    A. For each node, run one of the allocators as chosen by the command
    line flag or the optimization level required. This can be
    implemented as an argument to createIPRAPass() method.
    B. While running the register allocator for a node, make use of the
    RegMask from the child nodes to compute the registers available
    for this node.
    C. Immediately after the register allocator is run for the node, run
    calculateCalleeSavedRegisters() on that node and save that info.
    I think the current method already saves this info in MFI, so we
    do not have to do any extra work to save it for later use.
    D. Update the RegMask for the node so that its parents know what
    registers to use.

  5. Since calculateCalleeSavedRegisters() is now part of register
    allocator, remove it from PEI pass but use that info where required.

  6. Add a command line flag to enable IPRA (turned off by default) and
    add the pass to passes.cpp and the non-IPRAs work exactly as they
    are now from the outside, but calculateCalleeSavedRegisters() is
    moved to a different place.

Other things like, not using the RegMask of the child nodes which have
calls to external functions or have function pointers as function
parameters should be taken care of.

Please let me know if this makes any sense at all. Also if you would
like to have a quicker turn around time for a conversation, I can login
to LLVM’s IRC channel when you have some time or please feel free
to ping me on IM (I use this same email ID for GTalk).

No, it's the other way around.

You may want to read up on the pass manager and play with the -debug-pass option.

/jakob

Hi Jakob,

I have been trying to learn how the CodeGen passes work and I am playing around with the -debug-pass option. I tried implementing a bare CallGraphSCCPass based Pass in the CodeGen which basically does nothing for now. I mostly tried to replicate what RegAlloc passes do. I did this instead of modifying the existing RegAlloc passes to use CallGraphSCCPass because that was becoming way too complicated for simple experiments I was trying when I tried to do that. I am attaching the diffs of what I actually tried to do for reference here. It is not really well documented and there is a lot of copy/paste code there, but well, this is not really the final code to be submitted or anything, just to play around, so I have not really put efforts into comments etc. yet.

So when I build LLVM and try to run the pass with the following command:
$ llc --debug -cgregalloc=cg <$HOME/cpptry/manyfuncs.bc

I get the following error.

Args: llc --debug -cgregalloc=cg
Subtarget features: SSELevel 6, 3DNowLevel 0, 64bit 1
Pass ID not registered
UNREACHABLE executed at /media/python/workspace/llvm/
lib/CodeGen/Passes.cpp:324!
0 llc 0x0000000001453dfe
1 llc 0x00000000014542fa
2 libpthread.so.0 0x00007f020b750cb0
3 libc.so.6 0x00007f020a99f425 gsignal + 53
4 libc.so.6 0x00007f020a9a2b8b abort + 379
5 llc 0x000000000143b1a6
6 llc 0x0000000000f3cb02 llvm::TargetPassConfig::addPass(void const*) + 146
7 llc 0x0000000000f3ddef llvm::TargetPassConfig::addCGRegAlloc(llvm::CallGraphSCCPass*) + 47
8 llc 0x0000000000f3d7df llvm::TargetPassConfig::addMachinePasses() + 719
9 llc 0x0000000000e5a3af
10 llc 0x0000000000e5994c llvm::LLVMTargetMachine::addPassesToEmitFile(llvm::PassManagerBase&, llvm::formatted_raw_ostream&, llvm::TargetMachine::CodeGenFileType, bool, void const*, void const*) + 92
11 llc 0x00000000005f6002 main + 4946
12 libc.so.6 0x00007f020a98a76d __libc_start_main + 237
13 llc 0x00000000005f4bd1
Stack dump:
0. Program arguments: llc --debug -cgregalloc=cg
Aborted (core dumped)

I am not really sure, why it says “Pass ID is not registered”. I did quite a lot of internet searching about it and whatever I try to look for I end up on the exact LLVM line of code which prints this error line. So, I am not sure what is going on. Is this because CallGraphSCCPass does not register anything with the MachineFunctionPass-es? Or is this because CallGraphSCCPass does not do some magic which is specific to CodeGen passes? Can you please help me with this a bit? May be some pointers?

0001-CGTry-pass.patch (10.5 KB)

manyfuncs.bc (1.14 KB)

Hi,

Can some one please help me with this problem? From what I understand:

const PassInfo *PassRegistry::getPassInfo(const void *TI) const in lib/VMCore/PassRegistry.cpp

method returns 0 when it is expected to return a pass info. I am not sure what PassInfo is expected here. I tried to put a few DEBUG statements within that method but I could not find the information needed. So some pass is not being registered.

Can some one please help me?