I'm working on a backend for a 64-bit RISC-like architecture; all registers are 64 bits wide. I have load/store granularity of 8/16/32/64 bits (pretty normal stuff), but also 128-bit loads/stores (involving a pair of registers). Because all registers can be used in integer and float instructions, I only have a single register class (so far).
I'm trying to figure out a good way to utilize our 128-bit loads/stores. The data can be in any two 64-bit registers, but the memory locations must be strongly aligned and adjacent (i.e., the first register goes to a 0 mod 16 address and the second register to the following 8 mod 16 address). The stack is guaranteed to be 0 mod 16 aligned.
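To make the constraint concrete, here is a minimal sketch of the pairing test in C++. The helper names (canFormPair, canFormPairAnyOrder, pairBaseOffset) are made up for illustration and are not LLVM APIs; offsets are assumed to be byte offsets from a base that is already 0 mod 16, such as the stack pointer described above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of LLVM: can two 64-bit slots be served by
// one 128-bit access? Offsets are bytes from a base assumed to be 0 mod 16.
constexpr bool canFormPair(int64_t OffsetLo, int64_t OffsetHi) {
  // The first register must map to a 0 mod 16 address ...
  const bool LoAligned = (OffsetLo % 16) == 0;
  // ... and the second register to the adjacent 8 mod 16 address.
  const bool Adjacent = OffsetHi == OffsetLo + 8;
  return LoAligned && Adjacent;
}

// Order-insensitive variant: the data can live in any two registers, so
// either slot may turn out to be the low half of the pair.
constexpr bool canFormPairAnyOrder(int64_t A, int64_t B) {
  return canFormPair(A, B) || canFormPair(B, A);
}

// Byte offset the 128-bit access would use. Callers are expected to have
// checked canFormPairAnyOrder first.
inline int64_t pairBaseOffset(int64_t A, int64_t B) {
  assert(canFormPairAnyOrder(A, B) && "slots are not pairable");
  return A < B ? A : B;
}
```

For example, slots at offsets 0 and 8 qualify, while slots at 8 and 16 are adjacent but start on an 8 mod 16 address and so do not.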
I was thinking of doing this coalescing in CogERegisterInfo::eliminateFrameIndex (our target == 'CogE'), but then spotted some code related to register scavenging which could make this problematic.
I also see that there is ARM/ARMLoadStoreOptimizer.cpp, which performs a similar coalescence.
The two questions then:
1) For those of you who are working on retargeting the backend, what approach would you recommend that I use for this kind of coalescence?
2) What are the pitfalls I am likely to encounter?
Thanks in advance,
AArch64 has almost this exact situation. The options are generally to form the load/store pair instructions early, during initial instruction selection, or to form them late, by coalescing two loads (or two stores) of adjacent memory together, alignment permitting. AArch64 takes the latter approach, as this allows coalescing of operations which are otherwise functionally unrelated (register spills and restores, for example).
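As a rough illustration of the "late" approach, the sketch below runs over a toy instruction list and merges neighbouring 64-bit loads into one 128-bit load. Everything here (the Inst struct, the Op enum, coalesceAdjacentLoads) is invented for the example and is not how the AArch64 pass is written. It only looks at directly adjacent instructions and assumes every offset is relative to the same 16-byte-aligned base, such as spill slots on the stack.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy instruction model (hypothetical, not LLVM's MachineInstr).
enum class Op { Load64, Load128, Other };

struct Inst {
  Op Kind;
  int Reg0;    // destination register (low half for Load128)
  int Reg1;    // second destination, only meaningful for Load128
  int64_t Off; // byte offset from a base assumed to be 0 mod 16
};

// Build the paired load; the caller must have verified the layout.
inline Inst makeLoad128(const Inst &Lo, const Inst &Hi) {
  assert(Lo.Off % 16 == 0 && Hi.Off == Lo.Off + 8 && "bad pair layout");
  return Inst{Op::Load128, Lo.Reg0, Hi.Reg0, Lo.Off};
}

// Merge two neighbouring 64-bit loads when they hit the two halves of one
// 16-byte-aligned slot. Real passes also search across unrelated
// instructions; this sketch only considers direct neighbours.
std::vector<Inst> coalesceAdjacentLoads(const std::vector<Inst> &In) {
  std::vector<Inst> Out;
  for (std::size_t I = 0; I < In.size(); ++I) {
    if (I + 1 < In.size() && In[I].Kind == Op::Load64 &&
        In[I + 1].Kind == Op::Load64) {
      const Inst &A = In[I];
      const Inst &B = In[I + 1];
      const Inst &Lo = A.Off <= B.Off ? A : B;
      const Inst &Hi = A.Off <= B.Off ? B : A;
      // Distinct destinations are required: two loads into the same
      // register cannot be expressed as one paired load.
      if (Lo.Off % 16 == 0 && Hi.Off == Lo.Off + 8 && A.Reg0 != B.Reg0) {
        Out.push_back(makeLoad128(Lo, Hi));
        ++I; // skip the second load, it has been folded into the pair
        continue;
      }
    }
    Out.push_back(In[I]);
  }
  return Out;
}
```

Because the merge happens after the individual loads exist, it does not matter whether they came from user code or from spill/restore code, which is the advantage of forming pairs late.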
A few catches are:
1) Volatile pointers. In many cases, volatile implies memory-mapped I/O or something along those lines, which often plays badly with the wider load/store instructions. This is entirely microarchitecture-dependent, of course, but it is something to watch out for. It’s best to be conservative and not combine loads or stores which involve volatile pointers.
2) Alignment problems. It’s not uncommon for these instructions to have stricter alignment requirements than normal loads and stores (e.g., often on ARM, an unaligned load/store can be fine for some instructions but not others). Misaligned accesses are typically bugs in the user’s source code, of course, but they’re tricky to track down. It’s worth giving thought to a) when forming the pair is really worth it and b) how to ease debugging of the problem. AArch64 has developer-only command line options and statistics gathering to help sort out what’s going on, for example.
3) Memory operation ordering. It’s important to be careful about reordering loads and stores, both with respect to each other and with respect to any other instructions that act as memory barriers.
4) I’m sure there are other subtleties I can’t remember. Comments in the ARM and AArch64 passes, and likely the commit history for each, should be helpful.
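The first three catches can be folded into a single conservative legality check. The following is only a sketch under simplifying assumptions: MemOp and safeToPairLoads are hypothetical names, every store between the two loads is treated as potentially aliasing, and KnownAlign stands for whatever alignment the compiler can actually prove for the access address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy memory-operation model (hypothetical, not an LLVM type).
struct MemOp {
  bool IsLoad;         // false means store (or barrier, see below)
  bool IsVolatile;     // access through a volatile pointer
  bool IsBarrier;      // fence-like instruction; other fields are ignored
  int64_t Off;         // byte offset of the access
  unsigned KnownAlign; // alignment in bytes proven for the access address
};

// Can the loads at positions First and Second be merged into one 128-bit
// load? Returns false whenever any of the catches above might apply.
bool safeToPairLoads(const std::vector<MemOp> &Ops, std::size_t First,
                     std::size_t Second) {
  assert(First < Second && Second < Ops.size() && "bad indices");
  const MemOp &A = Ops[First];
  const MemOp &B = Ops[Second];
  if (A.IsBarrier || B.IsBarrier || !A.IsLoad || !B.IsLoad)
    return false;

  // Catch 1: never combine volatile accesses.
  if (A.IsVolatile || B.IsVolatile)
    return false;

  // Catch 2: the low half must be provably 16-byte aligned and the high
  // half must sit directly after it.
  const MemOp &Lo = A.Off <= B.Off ? A : B;
  const MemOp &Hi = A.Off <= B.Off ? B : A;
  if (Hi.Off != Lo.Off + 8 || Lo.KnownAlign < 16)
    return false;

  // Catch 3: do not move a load across a barrier, a volatile access, or a
  // store (conservatively assumed to alias, since this sketch has no
  // alias analysis).
  for (std::size_t I = First + 1; I < Second; ++I) {
    const MemOp &M = Ops[I];
    if (M.IsBarrier || M.IsVolatile || !M.IsLoad)
      return false;
  }
  return true;
}
```

A real pass would replace the blanket "any store blocks the merge" rule with proper alias queries, but erring on the side of rejecting pairs is the safe default while bringing up a new target.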