Register coalescer and reg_sequence (virtual super-regs)

The register coalescer treats a virtual super-register class (a sequential register range composed of multiple hardware registers) as a single register with sub-registers. When making coalescing decisions, it assumes that the virtual super reg interferes with its sub-reg instances, even though in reality they shouldn’t conflict. That is, they are individual registers and would be better compared as such for register coalescing decisions (CoalescerPair::Partial = 0).

For example, I have a super reg made up of the physical registers r20, r21, r22, and r23. This super reg is the dest of a reg_sequence, which generates 4 COPY MIs. The first COPY coalesces (merging into r20), but the vregs for r21-r23 (SUPER_RC:%vreg50:subreg1…subreg3) are never coalesced after that, because doing so generates interference on %vreg50, the “parent” super reg.

Is there a way to work around this? It causes unnecessary copies.

Thanks,
Joe

Is this happening on trunk, or are you using an old version of LLVM?

/jakob

I think the last time I pulled from trunk was probably end of last year. Some time ago. Does your reply intimate it’s fixed on trunk? That would be great. (I don’t sync too often to avoid churn with my TD.)

Joe

Yes, it’s been fixed recently.

/jakob

Was it the subreg lane masks / mapping that was added to address the missed coalescing? This solution is nice, but I don’t think it’ll work for me. I have 8-element vector registers that can be grouped into virtual super regs for bulk save/restore, and as soon as I have more than 4 in a tuple, the unsigned int used to hold the lane masks overflows and switches over to the “bit 31 set == lanes unresolvable” mode, and coalescing fails.
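To make the overflow concrete, here is a toy model of a 32-bit lane mask with bit 31 reserved as the "lanes unresolvable" flag. It is only a sketch: the names `LaneMask32`, `laneBit`, `assignLaneMasks`, and `provablyDisjoint` are hypothetical and are not LLVM's API, and the one-bit-per-lane assignment is an assumption. How many bits a real tuple consumes depends on how the target defines its sub-register indices.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model only; these names are hypothetical and are not LLVM's API.
using LaneMask32 = uint32_t;

// From the thread: bit 31 set means "lanes unresolvable".
constexpr LaneMask32 UnresolvableBit = 1u << 31;
constexpr unsigned UsableBits = 31;

// Mask for a lane that still fits in the usable bits.
LaneMask32 laneBit(unsigned Lane) {
  assert(Lane < UsableBits && "lane does not fit in the mask");
  return 1u << Lane;
}

// Assumption: each leaf lane gets its own bit. Lanes that run past the
// usable bits all collapse onto the catch-all unresolvable bit.
std::vector<LaneMask32> assignLaneMasks(unsigned NumLanes) {
  std::vector<LaneMask32> Masks;
  Masks.reserve(NumLanes);
  for (unsigned I = 0; I < NumLanes; ++I)
    Masks.push_back(I < UsableBits ? laneBit(I) : UnresolvableBit);
  return Masks;
}

// Two sub-registers are provably disjoint only if their masks share no bit
// and neither one is flagged unresolvable.
bool provablyDisjoint(LaneMask32 A, LaneMask32 B) {
  if ((A | B) & UnresolvableBit)
    return false; // must conservatively assume the lanes overlap
  return (A & B) == 0;
}
```

In this model, lanes 0 and 1 are provably disjoint, but any lane that landed on the unresolvable bit has to be treated as overlapping everything else, which is the situation where coalescing gives up.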

What about moving the lane masks to a BitVector, that wouldn’t need to be constrained artificially? Too much of a performance impact going that way?

I’d be open to any thoughts/suggestions. I studied the ARM s_sub/d_sub/q_sub structure but that fits within the 32 bit lane mask. I also thought that LDM/STM would be similar, but the registers are physically enumerated, which is different from these virtual super reg frames I’m trying to construct.

Thanks,
Joe

> Was it the subreg lane masks / mapping that was added to address the missed coalescing?

Yes, and the TRI::getCommonSuperRegClass() function.

> This solution is nice, but I don't think it'll work for me. I have 8-element vector registers that can be grouped into virtual super regs for bulk save/restore, and as soon as I have more than 4 in a tuple, the unsigned int used to hold the lane masks overflows and switches over to the "bit 31 set == lanes unresolvable" mode, and coalescing fails.

> What about moving the lane masks to a BitVector, that wouldn't need to be constrained artificially? Too much of a performance impact going that way?

Yes, in particular that would impose a cost on all the targets that don’t need this feature.

I didn’t expect any targets to need more than 31 bits for lane masks. Usually, ARM and x86 together span the envelope of insanity.

We can bump it to 64 bits if you like. It should be done with an MCLaneMask typedef à la MCPhysReg.
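Roughly what such a typedef could look like, as a sketch rather than a patch. Only the name `MCLaneMask` comes from the suggestion above. `LaneUnresolvable`, `NumResolvableLanes`, `laneBit`, and `lanesOverlap` are hypothetical helpers, and reserving the top bit as the "unresolvable" flag is an assumption carried over from the 32-bit scheme.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical typedef in the spirit of MCPhysReg: one place to change the
// width of a lane mask.
typedef uint64_t MCLaneMask;

// Assumption: the top bit keeps the "lanes unresolvable" meaning that
// bit 31 has in the 32-bit scheme, leaving 63 bits for real lanes.
const MCLaneMask LaneUnresolvable = MCLaneMask(1) << 63;
const unsigned NumResolvableLanes = 63;

// Mask with a single lane bit set.
inline MCLaneMask laneBit(unsigned Lane) {
  assert(Lane < NumResolvableLanes && "lane does not fit in the mask");
  return MCLaneMask(1) << Lane;
}

// Conservative overlap test: anything flagged unresolvable is treated as
// overlapping everything.
inline bool lanesOverlap(MCLaneMask A, MCLaneMask B) {
  if ((A | B) & LaneUnresolvable)
    return true;
  return (A & B) != 0;
}
```

Code that only ever spells the type as `MCLaneMask` would not need to change if the width were bumped again later.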

> I'd be open to any thoughts/suggestions. I studied the ARM s_sub/d_sub/q_sub structure but that fits within the 32 bit lane mask. I also thought that LDM/STM would be similar, but the registers are physically enumerated, which is different from these virtual super reg frames I'm trying to construct.

Yes, ldm/stm are too complex to model in the register allocator, so they are handled by a post pass. You may need to do something similar.

It’s also worth noting that RAGreedy is not a full-blown 2D register allocator. It doesn’t track liveness per lane, so if the individual lanes in a virtual register have significantly different liveness, registers can go to waste.

I am thinking about fixing this by having a bimodal liveness representation. Most virtual registers are represented by a single LiveInterval, but when needed a vector could be switched to a representation where each lane has its own LiveInterval.
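A rough sketch of what a bimodal representation might look like, purely as an illustration of the idea. `Segment`, `Interval`, `VRegLiveness`, and every member below are hypothetical and have nothing to do with the real LiveInterval class. The sketch assumes liveness is a list of half-open ranges and that switching modes starts every lane off with the conservative whole-register ranges.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical types for illustration; not LLVM's LiveInterval.
struct Segment { unsigned Start, End; }; // half-open range [Start, End)
using Interval = std::vector<Segment>;

// True if any segment of A intersects any segment of B.
bool overlaps(const Interval &A, const Interval &B) {
  for (const Segment &X : A)
    for (const Segment &Y : B)
      if (X.Start < Y.End && Y.Start < X.End)
        return true;
  return false;
}

class VRegLiveness {
  std::vector<Interval> Parts; // exactly one entry in whole-register mode
  bool PerLane = false;

public:
  explicit VRegLiveness(Interval Whole) : Parts(1, std::move(Whole)) {}

  bool isPerLane() const { return PerLane; }

  // Switch to per-lane mode. Every lane starts with a copy of the
  // whole-register interval, which is conservative but correct.
  void splitIntoLanes(unsigned NumLanes) {
    assert(!PerLane && NumLanes > 0);
    Interval Whole = Parts[0];
    Parts.assign(NumLanes, Whole);
    PerLane = true;
  }

  // Refine the liveness of a single lane (per-lane mode only).
  void setLane(unsigned Lane, Interval Live) {
    assert(PerLane);
    Parts.at(Lane) = std::move(Live);
  }

  // Liveness that matters for a given lane: its own interval in per-lane
  // mode, otherwise the single whole-register interval.
  const Interval &liveFor(unsigned Lane) const {
    return PerLane ? Parts.at(Lane) : Parts[0];
  }
};
```

In whole-register mode every lane appears live wherever the register is live. After `splitIntoLanes`, a lane whose interval has been narrowed no longer conflicts with ranges it does not actually cover, so the register it would occupy there is free for something else.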

It’s a nontrivial project, but it would make the lane masks unnecessary, and it would take care of the wasted registers.

/jakob