I’m debugging a regression on x86, and my current belief is that it’s caused by undef handling in combination with SUBREG_TO_REG. I do not know how to reconcile two points in its current definition:
/// SUBREG_TO_REG - Assert the value of bits in a super register.
/// The result of this instruction is the value of the second operand inserted
/// into the subregister specified by the third operand. All other bits are
/// assumed to be equal to the bits in the immediate integer constant in the
/// first operand. This instruction just communicates information; No code
/// should be generated.
/// This is typically used after an instruction where the write to a subregister
/// implicitly cleared the bits in the super registers.
HANDLE_TARGET_OPCODE(SUBREG_TO_REG)
All other bits are assumed to be equal to the bits in the immediate integer constant in the first operand.
This instruction just communicates information; No code should be generated.
In this failure, the original IR has an undef phi input. Prior to the coalescer, there is MIR that looks like:
The coalescer decides that SUBREG_TO_REG is merely a copy, and it’s deleted as these registers are coalesced with another. Later, during allocation, the resulting coalesced copy inside this block is deleted, so the use blocks see different undefs (and the entry IMPLICIT_DEF is disconnected and immediately killed).
If this is merely an assert that the high bits are 0, this is just broken. Either SUBREG_TO_REG requires preserving the high bits and cannot be coalesced with copies, or this pattern is broken and requires explicit zeroing.
MOV32rr is an explicit instruction, which zeros the high bits. On x86, every instruction that produces a 32-bit result zeros the high 32 bits of the corresponding 64-bit register. That’s why it’s explicitly lowering to MOV32rr, and not a COPY. (There are other equivalent instructions, but “mov” is the shortest/fastest way to write this.)
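To make the zeroing rule concrete, here is a minimal sketch in C++ that models a single 64-bit register. The names `Reg64`, `write32`, and `write16` are made up for illustration and are not LLVM or x86 APIs. The 16-bit case is included only for contrast, since narrower writes on x86-64 leave the upper bits alone.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of one 64-bit general-purpose register (hypothetical type).
struct Reg64 {
  std::uint64_t bits = 0;
};

// Writing the 32-bit subregister zero-extends: the upper 32 bits of the
// full register become 0, which is the behaviour attributed to MOV32rr.
inline void write32(Reg64 &r, std::uint32_t v) { r.bits = v; }

// Writing the 16-bit subregister preserves the upper 48 bits.
inline void write16(Reg64 &r, std::uint16_t v) {
  r.bits = (r.bits & ~std::uint64_t{0xFFFF}) | v;
}
```

With this model, a `write32` leaves a register whose high half is known to be zero, which is exactly the fact that SUBREG_TO_REG is supposed to communicate.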
If some optimization is deciding that the MOV32rr can be removed, that optimization is wrong (or the x86 backend is giving it wrong information about the semantics of MOV32rr).
The “before coalescing” IR looks fine; the MOV32rr zeros the high bits, and the SUBREG_TO_REG indicates that we’re actually using the whole 64-bit register even though MOV32rr doesn’t explicitly mention the 64-bit register. This is how SUBREG_TO_REG is meant to be used; it indicates the operand register was produced by an instruction that actually defines more bits than the definition indicates.
(I won’t argue this is the best design if we were designing it from scratch, but it’s what we’ve done for a long time.)
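As a rough illustration of why removing the move matters, the sketch below models the claim SUBREG_TO_REG makes (upper bits equal to the immediate, here 0) and checks it against two producers. All function names are hypothetical. "Move removed" is modeled simply as the consumer seeing the source register unchanged, which is an assumption about the failure, not a description of what the coalescer literally does.

```cpp
#include <cassert>
#include <cstdint>

// What SUBREG_TO_REG with immediate 0 asserts about the full 64-bit value.
inline bool upperBitsAreZero(std::uint64_t full) { return (full >> 32) == 0; }

// Producer modeled on a 32-bit move: low half of the source, zero-extended.
inline std::uint64_t zeroExtendingMove(std::uint64_t srcFull) {
  return static_cast<std::uint32_t>(srcFull);
}

// Same value flow if the move is treated as a plain copy and removed:
// the consumer sees the source register as-is, upper bits included.
inline std::uint64_t moveRemoved(std::uint64_t srcFull) { return srcFull; }
```

If the source register's upper half holds anything nonzero (or is undef), only the first producer satisfies the assertion. Since SUBREG_TO_REG generates no code of its own, nothing else will make it true.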
I think the real fix is to stop special-casing this and use a tied operand with a subregister index, instead of this separate subregister index operand. As-is, I think you need to special-case it and teach ~everything that there’s a read of this unrelated register.
I’m approaching a solution, which is to have SUBREG_TO_REG add an implicit-def of the full virtual register whenever it coalesces with something else. The later simple is-copy recognition then needs to stop treating a copy/move with an implicit def as a simple copy.
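The is-copy check described here could look roughly like the following. This is a sketch on toy data structures: `ToyOperand`, `ToyInstr`, and `isSimpleCopy` are hypothetical stand-ins, not LLVM's `MachineInstr` API, and the real check would live in the target's copy-recognition hooks.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-ins for MIR concepts.
struct ToyOperand {
  unsigned reg;
  bool isDef;
  bool isImplicit;
};

struct ToyInstr {
  bool isCopyLike;  // COPY or a target register-to-register move
  std::vector<ToyOperand> ops;
};

// A "simple copy" has exactly one explicit def, one explicit use, and no
// implicit defs. An implicit def means the instruction defines more than
// the plain copy does, so it must not be folded away as one.
inline bool isSimpleCopy(const ToyInstr &mi) {
  if (!mi.isCopyLike)
    return false;
  unsigned defs = 0, uses = 0;
  for (const ToyOperand &op : mi.ops) {
    if (op.isImplicit) {
      if (op.isDef)
        return false;  // extra implicit def: not a simple copy
      continue;        // implicit uses are ignored in this toy model
    }
    if (op.isDef)
      ++defs;
    else
      ++uses;
  }
  return defs == 1 && uses == 1;
}
```

Under this rule, a move that picked up an implicit-def of the full register during coalescing would survive later copy elimination.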