On some targets, a COPY is more than a move between two registers: copies of certain register operands carry a predicated dependency.
Such dependencies are currently modeled with an implicit operand, which sometimes limits what a target can do when lowering these copies into target instructions.
In particular, when copies are inserted by generic transformations (such as the regalloc pipeline), adding these target constraints is difficult unless the dependency is modeled correctly.
I recently ran into such a situation while implementing a feature for AMDGPU, and it made us consider the need for a predicated COPY opcode.
On AMDGPU, every vector register copy has a predicated dependency on a special hardware register called the execution mask (exec).
The vector registers have multiple lanes (64, for example), and the execution mask represents the lanes that are active at a given point.
With divergent control flow, GPUs dynamically turn lanes on and off at various points, and for certain operands we need to force-enable all lanes to perform whole-wave vector operations, including copies.
As a result, a vector copy becomes either a wave-copy (all lanes enabled) or a lane-copy (only a subset of lanes enabled).
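To make the wave-copy vs. lane-copy distinction concrete, here is a minimal C++ sketch that simulates the semantics on the host. It is not AMDGPU backend code: the names `VectorReg`, `laneCopy`, and `waveCopy`, and the choice of 64 lanes of 32 bits each, are assumptions made purely for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model: one vector register = 64 lanes of 32 bits each.
constexpr std::size_t kLanes = 64;
using VectorReg = std::array<std::uint32_t, kLanes>;

// Lane-copy: only lanes whose bit is set in the exec mask are written.
// Inactive lanes of dst keep their previous contents.
inline void laneCopy(VectorReg &dst, const VectorReg &src, std::uint64_t exec) {
  for (std::size_t lane = 0; lane < kLanes; ++lane)
    if ((exec >> lane) & 1u)
      dst[lane] = src[lane];
}

// Wave-copy: the copy is performed as if every lane were enabled,
// independent of the exec mask the surrounding code is running under.
inline void waveCopy(VectorReg &dst, const VectorReg &src) {
  laneCopy(dst, src, ~std::uint64_t{0});
}
```

With an exec mask of `0x5`, `laneCopy` updates only lanes 0 and 2 and leaves every other lane of the destination untouched, while `waveCopy` overwrites all 64 lanes. The sketch shows why the two cannot be treated as the same plain COPY: the result depends on exec.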
Converting certain copies inserted during live-range splitting in the regalloc pipeline into wave-copies turned out to be difficult.
Introducing the predicated copy helped us model them better and later lower them correctly into target instructions.
I posted a basic implementation of the generic predicated copy: D143754 [MachineInstr] Introduce generic predicated copy opcode
To enable predicated copy for AMDGPU: D143757 [AMDGPU] Enable predicated copy right from instruction selection
The use case of predicated copy: D143762 [AMDGPU] Enable whole wave register copy