Avoiding MCRegAliasIterator with register units

LLVM can model some quite complicated register banks now, and we even use registers to model some encoding constraints.

For example, a few ARM instructions like strexd have two register operands that must be an aligned pair of consecutive GPR registers (like r0, r1). This constraint is modeled with the GPRPair register class containing R0_R1, R2_R3, ... pseudo-registers.

Sometimes ISAs also assign assembly names to such pseudo-registers, again from ARM:

SPR: (s0, s1, ...) 32-bit floating point registers.
DPR: (d0, d1, ...) Even-odd pairs of consecutive S-registers.
QPR: (q0, q1, ...) Even-odd pairs of consecutive D-registers.

But not all constraints are given 'register' names by the ISA. One vld1 instruction variant can load two consecutive D-registers, both even-odd and odd-even pairs. An even-odd pair like {d0, d1} is also called q0, but an odd-even pair like {d1, d2} has no other ISA name.

Since it's a bit random what an ISA decides to call a register and what it decides to call an encoding constraint, LLVM's concept of a physical register is usually a bit broader, but more consistent. We will normally define physical registers for all reasonable encoding constraints on register operands. For example, the LLVM ARM target has physical registers like D1_D2 which don't exist in the ISA.

In a target with many encoding constraints like that, some registers can have a high number of super-registers, and even more aliases. On the last count, some of the ARM NEON registers had more than 40 aliasing registers.

The register allocator uses register units to deal with the complexity of these register banks. Register units are more or less the same as leaf registers in the sub-register graph. Register interference is tracked in terms of register units, and that means it is no longer necessary to scan through the long list of aliasing registers.

We still have an MCRegAliasIterator that's used here and there in the code generator, and TableGen is still emitting tables backing this iterator.

I would like to minimize the use of MCRegAliasIterator because some of the alias lists are so long. In most cases, it should be possible to express algorithms in terms of register units instead.

I also want to avoid emitting tables for driving MCRegAliasIterator. If required, the set of aliasing registers can be computed from regunits and super-registers. (See the block comment for MCRegUnitRootIterator or LiveIntervals::computeRegUnitInterval).

/jakob

Jakob,
I've implemented a patch that reworks the MCRegAliasIterator to dynamically compute the register aliases. The size reduction in the RegDiffLists are rather dramatic.

Here are a few size differences for MCTargetDesc.o files (before and after) in bytes:
R600 - 36160B - 11184B - 69% reduction
ARM - 28480B - 8368B - 71% reduction
Mips - 816B - 576B - 29% reduction

One side effect of dynamically computing the aliases is that the iterator does not guarantee that the entries are ordered or that duplicates have been removed.
The documentation seems to imply this is a safe assumption and I haven't found a client that requires these attributes (i.e., strict ordering and uniqueness).

I also setup a LNT tester and found no execution-time failures on x86 for -O0 and -O2 using the llvm test-suite and externals test-suite (i.e., SPEC95/2K/2K6, etc.).
I'm still in the process of verifying there are no compile-time regressions.

Chad

One side effect of dynamically computing the aliases is that the iterator
does not guarantee that the entries are ordered or that duplicates have
been removed.

Hi Chad,

Sounds like you're growing the list (thus the lookup time), rather than
shrinking, as I take it was Jacob's original intention?

The documentation seems to imply this is a safe assumption and I haven't

found a client that requires these attributes (i.e., strict ordering and
uniqueness).

I don't think it should matter for the purposes of aliasing.

I also setup a LNT tester and found no execution-time failures on x86 for

-O0 and -O2 using the llvm test-suite and externals test-suite (i.e.,
SPEC95/2K/2K6, etc.).

I'm wondering why not O3, too? That'll expose vector selection as well, not
just VFP, which is a big source of super-registers, at least on ARM.

Somehow it sounds as though you'll find more performance regressions than
improvements, but that's my poor assumptions based on nothing concrete. :wink:

cheers,
--renato

Jakob suggestion was to remove the static tables, thus shrinking the object file size. Without the static tables, however, we do need to compute the aliases dynamically.
We’re intentionally making a tradeoff between compile-time and object file size with the hope that the overhead of dynamically computing the aliases is minimal. I
should also note that I’m not actually building/growing any list (i.e., no malloc/free), but rather iterating over the other static tables (i.e., MCRegUnits, MCRegUnitRoots,
and MCSuperRegs). I’ll attach the patch for the curious…

One would think, but you never know.

I’ll throw -O3 into the mix… :slight_smile:

Jakob also suggested minimizing the use of the MCRegAliasIterator because most of the clients can be rewritten in terms of other iterators. If necessary, I’ll
leave it to Jakob to explain this point further.

Chad

MCRegAliasIterator.patch (10.3 KB)

My thinking is that we should try to avoid using MCRegAliasIterator, and use MCRegUnitIterator instead.

That way, it is less of a problem to make MCRegAliasIterator a bit slower in order to save table space.

Thanks,
/jakob