I’m writing a back-end for an architecture that supports multi-word loads. As a concrete example, “ldqw r0, [addr]” would load a quadword (4 words) into 4 registers starting with r0 (implicit writes to r1, r2, and r3).
First, does any currently supported architecture have anything like this? I suspect not. If not, I hope someone can help me figure out how to make this work, particularly with the cooperation of the register allocator. Specifically, I need the register allocator to understand that such a load writes multiple contiguous registers, and that their locations are determined by the first register specified in the instruction.
I thought about defining a set of special register classes that group contiguous registers for each load size (2, 4, and 8), but this doesn’t feel very satisfying. Is this the right approach? Would it work, and if so, would it still be efficient?
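
To make the register-class idea concrete, here is a rough Python sketch that enumerates the contiguous groups such classes would contain. The 16-register file, the `contiguous_groups` helper, and the `aligned` option are all assumptions for illustration, not properties of the actual target or of any compiler API.

```python
def contiguous_groups(num_regs, size, aligned=False):
    """Return every run of `size` consecutive register names.

    If `aligned` is True, only runs whose first register index is a
    multiple of `size` are kept (a hypothetical encoding restriction).
    """
    step = size if aligned else 1
    return [
        tuple(f"r{base + i}" for i in range(size))
        for base in range(0, num_regs - size + 1, step)
    ]


NUM_REGS = 16  # assumption: registers r0..r15

# One "register class" of tuples per load size.
groups = {size: contiguous_groups(NUM_REGS, size) for size in (2, 4, 8)}
aligned_groups = {
    size: contiguous_groups(NUM_REGS, size, aligned=True) for size in (2, 4, 8)
}

for size in (2, 4, 8):
    print(size, len(groups[size]), len(aligned_groups[size]))
```

Under these assumptions there are 15, 13, and 9 possible groups for sizes 2, 4, and 8. If the first register had to be a multiple of the load size, that would drop to 8, 4, and 2, which is part of why I wonder how the allocator would cope with so many overlapping candidates.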