Pass to tie an output operand to a subregister of an input operand

TLDR: I created a pass that ties a register to its sub-register and runs before twoaddressinstruction. What could go wrong and untie the registers?

I have seen some discussion already on how to tie a register to its sub-register, e.g. here and here. As things stand, it seems to me there is no clear solution yet, and one has to be conservative.

I am working with an instruction which takes a composite register as input, and overwrites one of its sub-registers. I summarized the problem in the snippet below:

foreach Index = 0...255 in {
  def R#Index : Register <"r"#Index>;
}
def GPR32 : RegisterClass<"TestTarget", [i32], 32,
                          (add (sequence "R%u", 0, 255))>;

def sub_128_0 : SubRegIndex<32, 0>;
def sub_128_1 : SubRegIndex<32, 32>;
def sub_128_2 : SubRegIndex<32, 64>;
def sub_128_3 : SubRegIndex<32, 96>;
def GPR128 : RegisterTuples<[sub_128_0, sub_128_1, sub_128_2, sub_128_3],
                            [
                             (decimate (shl GPR32, 0), 1),
                             (decimate (shl GPR32, 1), 1),
                             (decimate (shl GPR32, 2), 1),
                             (decimate (shl GPR32, 3), 1)
                            ]>;
  
// TODO: Constraints = "$sub0_out = $reg.sub_128_0"
def FOO : Instruction<
    (outs GPR32:$sub0_out), (ins GPR128:$reg),
    [], "foo ", "$reg">;

I want to encode the fact that my output $sub0_out register actually corresponds to the sub_128_0 subregister of the input $reg. For practical and performance reasons, I cannot just say that the whole register is overwritten.

What I’ve been doing so far is mimicking what is done for “standard” tied registers. As I understand it, the main handling of such tied registers is done in the twoaddressinstruction pass. For each tied register pair, that pass inserts a copy of the source register into the destination register before the instruction, and then replaces the source operand with the destination register in the instruction.

  %0 = TIED_SRC_DST %1
  ---> after twoaddressinstruction:
  %0 = COPY %1
  %0 = TIED_SRC_DST %0
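
To make the rewrite above concrete, here is a minimal C++ sketch on a toy instruction model. It is not LLVM code: `TiedInstr` and `lowerTiedOperands` are hypothetical names invented for illustration, and the real twoaddressinstruction pass does considerably more (commuting operands, converting to three-address form, and so on).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a machine instruction with one def and one use.
// Hypothetical type for illustration only, not LLVM's MachineInstr.
struct TiedInstr {
  std::string Opcode;
  unsigned Def; // virtual register written by the instruction
  unsigned Use; // virtual register read by the instruction
  bool Tied;    // true if Def and Use must end up in the same register
};

// For every tied instruction whose operands differ, emit "Def = COPY Use"
// in front of it and rewrite the use operand to Def.
std::vector<TiedInstr> lowerTiedOperands(const std::vector<TiedInstr> &In) {
  std::vector<TiedInstr> Out;
  for (const TiedInstr &I : In) {
    if (I.Tied && I.Def != I.Use) {
      Out.push_back(TiedInstr{"COPY", I.Def, I.Use, false});
      Out.push_back(TiedInstr{I.Opcode, I.Def, I.Def, true});
      // Post-condition: the tied instruction reads and writes one vreg.
      assert(Out.back().Def == Out.back().Use);
    } else {
      Out.push_back(I);
    }
  }
  return Out;
}
```

Feeding it the single instruction `%0 = TIED_SRC_DST %1` yields the two-instruction sequence shown above.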

I created a pass which does a similar thing for my tied sub-reg problem; let’s call it subregconstrainer. It currently runs after PHI node elimination and before twoaddressinstruction. It inserts a COPY of the source register (%1) into a new scratch register (%10). Then it replaces all uses of the destination register with %10.sub_128_0. The source operand (%1) of the instruction is also replaced with %10. Essentially, the following happens:

  %0 = FOO %1
  SOME_INSTR %0
  ---> after subregconstrainer:
  %10 = COPY %1
  %10.sub_128_0 = FOO %10
  SOME_INSTR %10.sub_128_0
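
The same transformation can be sketched in C++ on a toy model. This is only an illustration under stated assumptions: `ToyOperand`, `ToyInstr` and `constrainSubRegs` are hypothetical names, the input is a single straight-line block in which each FOO result is defined once and used only after its definition, and a real pass would walk the use lists kept by MachineRegisterInfo instead of scanning forward.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy operand: a virtual register plus an optional sub-register index.
// Hypothetical types for illustration only, not LLVM's MachineOperand.
struct ToyOperand {
  unsigned Reg;
  std::string SubIdx; // empty means "the whole register"
};

struct ToyInstr {
  std::string Opcode;
  std::vector<ToyOperand> Defs;
  std::vector<ToyOperand> Uses;
};

// Rewrites "%d = FOO %s" into "%n = COPY %s; %n.sub_128_0 = FOO %n" and
// redirects later uses of %d to %n.sub_128_0. NextVReg is assumed to be
// the first virtual register number not used in the input.
std::vector<ToyInstr> constrainSubRegs(const std::vector<ToyInstr> &In,
                                       unsigned NextVReg) {
  std::vector<ToyInstr> Out;
  std::map<unsigned, unsigned> Replaced; // old FOO result -> scratch register
  for (ToyInstr I : In) {
    // Redirect uses of an already rewritten FOO result.
    for (ToyOperand &U : I.Uses) {
      auto It = Replaced.find(U.Reg);
      if (It != Replaced.end())
        U = ToyOperand{It->second, "sub_128_0"};
    }
    if (I.Opcode == "FOO") {
      assert(I.Defs.size() == 1 && I.Uses.size() == 1);
      unsigned Scratch = NextVReg++;
      ToyInstr Copy;
      Copy.Opcode = "COPY";
      Copy.Defs.push_back(ToyOperand{Scratch, ""});
      Copy.Uses.push_back(I.Uses[0]);
      Out.push_back(Copy);
      Replaced[I.Defs[0].Reg] = Scratch;
      I.Defs[0] = ToyOperand{Scratch, "sub_128_0"}; // output is a sub-register
      I.Uses[0] = ToyOperand{Scratch, ""};          // input is the same vreg
    }
    Out.push_back(I);
  }
  return Out;
}
```

The point of the sketch is the invariant it establishes: after the rewrite, the FOO def and use name the same virtual register, and only the def carries a sub-register index.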

My question is: what could go wrong? I tested multiple scenarios, and my registers always stayed tied. I had a look at the passes involved in register allocation (mainly twoaddressinstruction, regcoalescer, greedyregalloc, fastregalloc), and couldn’t spot anything which would immediately go wrong and untie my registers. But I don’t know a lot about the register allocators in LLVM, and even less about their hidden assumptions. I’m afraid they could insert copies (e.g. to split live ranges?) and replace my operands. The only thing that reassures me is that hasTiedOperand() is currently not queried that much during register allocation.

My current belief is that, as long as I use the same virtual reg as in/out (although, the out operand will have a subreg index), I’m somewhat safe. I also used hasExtraSrcRegAllocReq = true, hasExtraDefRegAllocReq = true to prevent the registers from being renamed after register allocation.

Wrapping it up:

  • Could someone confirm that an instruction like %10.sub_128_0 = FOO %10 will not be rewritten, and I will get something like $R0 = FOO $R0_R1_R2_R3 after regalloc?
  • What are the particular contracts/assumptions that the different register allocation passes have between themselves, in particular regarding tied operands?

FYI @qcolombet @MatzeB

Hi, Gaëtan

Thank you for the detailed problem description, the proposed solution and the positive results, and for linking my question :)

My case is similar, although performance is not a big problem. I didn’t try to solve the issue, but the solution I was thinking of is exactly the same as you proposed.

I’m not an expert in register allocation, but I think you’re safe with this approach. AFAIK the register allocator maps a virtual register to exactly one physical register (see VirtRegMap). It can create new virtual registers, but it will not replace an existing virtual register with a new one.

To make sure that the register allocator does not break constraints, you could implement TargetInstrInfo::verifyInstruction.
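
As a rough idea of what such a check could look like, here is a self-contained C++ sketch on a toy model. It is not written against the LLVM API: `AllocatedFoo` and `verifyFoo` are hypothetical names, and the sketch assumes the tuple layout from the first post, where the tuple starting at Rb is Rb_Rb+1_Rb+2_Rb+3 and its sub_128_0 is Rb. The return convention (true when the instruction is valid, an error string otherwise) mirrors that of TargetInstrInfo::verifyInstruction.

```cpp
#include <cassert>
#include <string>

// Toy model of a FOO instruction after register allocation.
// Hypothetical type for illustration only, not LLVM's MachineInstr.
struct AllocatedFoo {
  unsigned DefGPR32;     // n, where Rn is the register written by FOO
  unsigned UseTupleBase; // b, where the input tuple is Rb_Rb+1_Rb+2_Rb+3
};

// Returns true if the output register is sub_128_0 of the input tuple;
// otherwise returns false and describes the problem in ErrInfo.
bool verifyFoo(const AllocatedFoo &MI, std::string &ErrInfo) {
  assert(MI.UseTupleBase + 3 < 256 && "tuple must fit in R0..R255");
  const unsigned SubRegOffset = 0; // position of sub_128_0 within the tuple
  if (MI.DefGPR32 != MI.UseTupleBase + SubRegOffset) {
    ErrInfo = "FOO output is not sub_128_0 of its input";
    return false;
  }
  return true;
}
```

In a real target the equivalent check would live in the target's verifyInstruction override and be exercised by the machine verifier (for example with llc -verify-machineinstrs), so a broken tie would be reported right after the pass that introduced it.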

If you’re planning to upstream your solution, here are a couple of thoughts you might want to consider.

  • It would be great if the solution were generalized to include the case where the destination register is a superregister of a source register. It would be awesome if the solution also worked for the case where the source and destination registers have a common superregister, but are otherwise unrelated.
  • The TwoAddressInstruction pass is not the only consumer of the constraints. AsmWriterEmitter uses them to generate printAliasInstr, AsmMatcherEmitter uses them to generate instruction verification code, and DecoderEmitter uses them somehow in populateInstruction.

Cheers,
Sergei

Hi Sergei, thanks for the answer, it already gives me some more confidence.

Assuming this solution is safe, I definitely do plan to put in more work and generalize support to TableGen. Currently, I have to define my instruction like this:

let DisableEncoding = "$sub0_out" in
let DecoderMethod = "DecodeFOOInstruction" in
let hasExtraSrcRegAllocReq = true, hasExtraDefRegAllocReq = true in
  def FOO : Instruction<
      (outs GPR32:$sub0_out), (ins GPR128:$reg),
      [], "foo ", "$reg">;

Meaning, I have to help the encoder and provide a custom decoder. And that’s not even touching the AsmMatcher. My ultimate goal would be to extend the TableGen generators and the different consumers of HasTiedOperands to get a clean solution like this:

let Constraints = "$sub0_out = $reg.sub_128_0" in
  def FOO : Instruction<
      (outs GPR32:$sub0_out), (ins GPR128:$reg),
      [], "foo ", "$reg">;

Best,
Gaëtan