[TableGen/RegAlloc] How to use a fixed register in an instruction?

Hello!

I’m trying to make the v_div_scale instruction of the AMDGPU backend always use the VCC register for its scalar output ($sdst) instead of an arbitrary scalar register. The goal is to:

  • Refuse to assemble the instruction if its destination register is anything other than VCC.
  • Always use VCC for that instruction’s destination register during codegen.

This is my first time tweaking TableGen and register allocation like this, so I’m struggling a bit. My latest attempt makes the Greedy Register Allocator fail in various ways.

Since no instruction in the backend seems to force VCC like that yet, I had to define the following records in TableGen to get started:

```
// Note: I'm very much unsure about the types and whether VCC_LO is needed, considering it's a sub-register of VCC?
let GeneratePressureSet = 0 in
def VCCR : VRegClassBase<1, [i32, i64, f32, f64], (add VCC_LO, VCC)> {
  let isAllocatable = 0;
}

def VOPDstVCC : RegisterOperand<VCCR>;
```

I then used VOPDstVCC for the instruction’s scalar output. Those are the only changes I have made so far:

```
let Outs64 = (outs DstRC:$vdst, VOPDstVCC:$sdst);
```

TableGen accepts this, but llc fails at runtime. I see the following behaviour in one of the tests:

  • Setting isAllocatable = 1 triggers an assertion failure: “register pressure underflow”.
  • Setting isAllocatable = 0 causes a variety of machine verifier failures, seemingly because the register allocator uses the wrong instructions to spill VCC (is that even supposed to happen?):
```
*** Bad machine code: Operand has incorrect register class. ***
- function:    v_fdiv_v2f64
- basic block: %bb.0  (0x564b641b46c0) [0B;800B)
- instruction: 628B	$vcc_lo = SI_SPILL_V32_RESTORE %stack.0, $sgpr32, 0, implicit $exec :: (load (s32) from %stack.0, addrspace 5)

*** Bad machine code: Illegal physical register for instruction ***
- function:    v_fdiv_v2f64
- basic block: %bb.0  (0x564b641b46c0) [0B;800B)
- instruction: 628B	$vcc_lo = SI_SPILL_V32_RESTORE %stack.0, $sgpr32, 0, implicit $exec :: (load (s32) from %stack.0, addrspace 5)
- operand 0:   $vcc_lo
$vcc_lo is not a VGPR_32 register.
```

I’m currently stuck and not sure what to try next. I don’t know whether all the work can be done in TableGen or whether I also need to change code in a RegisterInfo/InstrInfo class. I also don’t know whether I’m on the right track and just hitting some bugs, or doing something fundamentally wrong.

For instance, if isAllocatable = 0 is indeed the right answer, maybe I just need to tweak something in the RegisterInfo class, or add new instructions to spill and reload VCC properly? I’m not sure that should be necessary, however.

Could someone please explain to me what I’m doing wrong and point me in the right direction?

Thank you very much.

Trying to use singleton allocatable register classes doesn’t really work. You want to stop using an allocatable virtual register operand and switch to an implicit physical register def instead.
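A minimal sketch of what “an implicit physical register” means in TableGen, assuming a toy target: the TOY_* registers, register class and instructions are hypothetical stand-ins, not the real AMDGPU definitions (which go through VOPProfile and the VOP3 pseudo/real multiclasses). Only the target-independent Register, RegisterClass and Instruction classes from llvm/Target/Target.td are used, so the file should parse with something like llvm-tblgen -print-records -I llvm/include.

```tablegen
// Hypothetical, self-contained illustration: every TOY_* name is invented
// for this sketch and is not part of the AMDGPU backend.
include "llvm/Target/Target.td"

// Toy physical registers standing in for VCC and two VGPRs.
def TOY_VCC : Register<"vcc">;
def TOY_V0  : Register<"v0">;
def TOY_V1  : Register<"v1">;

// Ordinary allocatable class used only for the explicit vector operands.
def TOY_VGPR : RegisterClass<"Toy", [i32], 32, (add TOY_V0, TOY_V1)>;

// The scalar result is no longer an explicit (outs ...) operand. It is an
// implicit def of the fixed physical register, so the register allocator
// never has to assign (or spill) a class that contains only VCC.
let Defs = [TOY_VCC] in
def TOY_DIV_SCALE : Instruction {
  let OutOperandList = (outs TOY_VGPR:$vdst);
  let InOperandList  = (ins TOY_VGPR:$src0, TOY_VGPR:$src1);
  // The fixed register is spelled literally in the assembly string.
  let AsmString = "toy_div_scale $vdst, vcc, $src0, $src1";
}

// A consumer reads the fixed register through an implicit use.
let Uses = [TOY_VCC] in
def TOY_DIV_FMAS : Instruction {
  let OutOperandList = (outs TOY_VGPR:$vdst);
  let InOperandList  = (ins TOY_VGPR:$src0, TOY_VGPR:$src1, TOY_VGPR:$src2);
  let AsmString = "toy_div_fmas $vdst, $src0, $src1, $src2";
}
```

The design point is that the fixed register appears only in Defs/Uses and literally in AsmString. Nothing in the operand lists names it, so any encoding or selection code that still refers to an $sdst operand has to be adjusted separately.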

Ah, that makes more sense indeed. It really felt like I was telling LLVM to “use this register class that happens to contain the register I want” instead of properly telling it to def a physical register from the start.

I tried to:

  • Remove sdst from the Outs64
  • Use %vcc instead of $sdst in the Asm64
  • Add Defs = [VCC] around the VOP3Inst_Pseudo_Wrapper

It now fails because sdst can no longer be found (“Too few operands in record V_DIV_SCALE_F32_e64_gfx11 (no match for variable sdst)”). That makes sense, because the VOP3be encoding still expects it in let Inst{14-8} = sdst;

What makes this difficult, I think, is that the encoding this instruction uses has a field for SDST, but we’re trying to constrain it to always be VCC, no?
Maybe I need to add an !if there so it encodes either sdst or a fixed VCC depending on the situation (a bit in the VOPProfile?)

I got past the “no match for variable sdst” error by tweaking the encoding:

```
let Inst{14-8}  = !if(P.SDstIsAlwaysVCC, 0, sdst);
```

Now I’m working my way through some ISel failures. I think that, since I removed an output operand, every operand after the first one has shifted by one position.

GlobalISel also seems to need some changes (setting register banks and such), but I think this is on the right track.

My current problem is that DIV_FMAS gets selected into:

```
$vcc_lo = COPY %18:vcc(s1)
%22:vreg_64(s64) = nofpexcept V_DIV_FMAS_F64_e64 0, %21:vreg_64(s64), 0, %19:vreg_64(s64), 0, %20:vreg_64(s64), 0, 0, implicit $mode, implicit $vcc, implicit $exec
```

The COPY (which I think is useless) causes a GlobalISel failure:

```
LLVM ERROR: VReg has no regclass after selection: $vcc = COPY %18:vcc(s1) (in function: v_fdiv_f64)
```

I don’t understand why the COPY is still there. I’d expect it to be removed, since it’s a copy from a super-register to a sub-register? But %18 is a virtual register, not a physical one, so LLVM doesn’t treat the copy as redundant. It originates from these instructions:

```
%17:vgpr(s64), %18:vcc(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.scale), %0:vgpr(s64), %1:vgpr(s64), 1
%22:_(s64) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.fmas), %21:vgpr(s64), %19:vgpr(s64), %20:vgpr(s64), %18:vcc(s1)
```

So at the IR/gMIR level the intrinsics still have explicit s1 input/output operands, even though the value always lives in VCC. After selection those operands are dropped and become an implicit def/use of VCC, yet LLVM still generates a COPY.

It just needed some SelectionDAG lowering and GlobalISel selection fixes. It works now, but there are other issues that can be addressed during review:
https://reviews.llvm.org/D131959