Hi All,

I’ve just committed some changes to the PBQP allocator which are designed to make it easier to implement custom register allocation constraints. This is a quick summary of those changes, and of the status of the PBQP allocator in general.

First a quick bit of background:

The PBQP allocator is based on ideas described in [1]. I implemented this algorithm (with the improved heuristic from [2]) in LLVM, and committed it to the mainline back in October 2008. Since then it has consistently passed correctness tests (compiling all benchmarks for which linear scan also succeeds, including the SPEC benchmarks), and produces similar code quality to linear scan on average (on my AMD64 test box at least). It currently has significant compile-time costs, requiring more memory and roughly triple the total compile time on average (compared to linear scan). Hopefully these will come down somewhat if/when I find more time to devote to optimizing the allocator, however I believe they will remain on the high end. The reason for this (and at the same time the source of PBQP’s strengths) is the generality of the mathematical optimization problem which underlies the allocator. This optimization problem allows a wide variety of register allocation costs and constraints to be expressed in a straightforward manner. If you have a target architecture with complex constraints and you want to get start getting good allocations with minimal engineering work, this may be the allocator for you.

Here’s how the PBQP allocator works:

(Note: This section dives in to a few details of how PBQP works, and how to use it, so you can skip it if you’re not interested)

Starting with some pseudo-C+±code for the allocator, and taking a very high-level view:

void pbqpAllocate() {

do {

problem = builder->buildPBQPProblem(machineFunction);

solution = solvePBQP(problem);

allocation = mapToAllocation(solution);

} while (containsSpills(allocation));

}

Essentially the allocator loops mapping the current machine function to an optimization problem, solving the problem, them mapping the solution back to an allocation. If there are any spills then they’re applied (using LLVM’s spilling infrastructure), and iteration terminates when there are no further spills required.

It’s a very similar process to graph coloring, only instead of building an interference graph and solving a k-colouring problem, you’re building a PBQP instance and minimizing an objective function. The good news is that there’s a PBQP solver implemented in LLVM, and the mapping from solutions to allocations is generic, so the only thing that needs to be done to support a new architecture is to supply a mapping from machine functions to PBQP instances for the new target. If you can do that, the existing infrastructure can handle the rest. (A caveat is appropriate here: PBQP is *only* solving the labeling problem. That’s powerful, but it doesn’t solve every problem in regalloc. If you have issues with the spiller/rewriter, or need high-quality bank-selection, or something like that, I’m afraid this won’t help).

So how do you write mappings?

First you need to know what a PBQP representation of a register allocation problem looks like. They’re represented as graphs, and those who know graph coloring will find the structure familiar. Each virtual register in the machine function is represented by a node in the PBQP graph, and constraints between virtual registers (such as interference) are represented as edges. in the PBQP graph each node is associated with a cost vector, which gives the cost of each allocation option for the virtual register. E.g.

spill 500 // ← Spill cost estimate

ax 0

cx 0

si 0

Typically you will use the spill cost estimate for the spill option, and zero for each register, however you can attach positive or negative costs to the individual registers to preference them as you like (which is handy if you’ve got preferred registers with shorter instruction encodings for instance).

Each edge in the PBQP graph is associated with a cost matrix. This gives the cost for combinations of allocations for the virtual registers it connects. So to represent an interference constraint you would have a matrix with infinite costs for aliasing physical registers, and zero everywhere else. Here a picture helps:

spill al ah cl dl

spill 0 0 0 0 0

ax 0 inf inf 0 0

cx 0 0 0 inf 0

si 0 0 0 0 0

If you want a coalescing edge you’d use a matrix with negative costs for elements that allocate virtual registers to the same physical register. To represent a pairing constraint you’d put an infinite cost on any pair whose allocation doesn’t obey the pairing restriction (See [1] for several example constraints).

The overall cost of a specific solution (i.e. a register assignment) is given by summing the costs over all nodes and edges in the graph. Finding an optimal solution is NP-complete (since you can map k-colouring problems to PBQP), however the solver should find you a decent solution relatively quickly.

There’s a lot more detail on all of this in [1] and [2], so I’ll leave you to read them for more information if you’re interested (or feel free to shoot me questions via email). For now I’ll get on to the the how-to bit for LLVM.

The new header include/llvm/CodeGen/RegAllocPBQP.h includes a class called PBQPBuilder. This class represents the mapping between machine functions and PBQP problems. By extending it and implementing the build method you can add your own register allocation constraints to the PBQP problem. Just call back to the PBQPBuilder base to have the spill costs and interference constraints added for you, then iterate over the instructions in the MachineFunction and add any additional constraints for your architecture. Finally you’ll need to add your own “createRegisterAllocator” function to the passmanager in your Target, and have this function construct a RegAllocPBQP (PBQP allocator) instance with your Builder. That should do it*.

Summing up:

The PBQP allocator remains solid, generates reasonable code, and has just been generalized to allow you to add custom register allocation constraints. People who want a (comparatively) easy way to get an allocator running for architectures with unusual constraints may find it useful.

Hopefully the above can serve as a starting point for anyone who’s interested to play around with the new features, and as always I’m very happy to hear questions and comments.

Regards,

Lang.

- This claim is only partially tested, but I’m running the allocator right now with the base constraints plus an extension which adds coalescing costs and it’s just passed CINT2006 without error. Not bad given that the coalescing extension only took a couple of hours to write.

References:

[1] Scholz, B., Eckstein, E. 2002. Register allocation for irregular architectures. In Proceedings of the Joint Conference on Languages,

Compilers and Tools for Embedded Systems (LCTES’02), ACM Press, New York, NY, USA, 139-148.

[2] Hames, L. and Scholz, B. 2006. Nearly optimal register allocation with PBQP. In Proceedings of the 7th Joint Modular Languages Conference

(JMLC’06). LNCS, vol. 4228. Springer, New York, NY, USA. 346-361.