Multi-return and writing to specific registers in LLVM TableGen patterns

Hi,

I am trying to add an LLVM intrinsic for a peculiar op in my ISA, which reads from r(X) but writes to r(X+1) and r(0). It seems like RegisterTuples could be used for this, but I am running into a few issues lowering the intrinsic, which returns two values, to MC.

Since my target does not support spilling to the stack, I need the register allocator to allocate a register pair (say r8, r9) plus a single register (say r20), so that the intrinsic translates to this in MC:

my_mv r8, {src}
my_mv r20, r0   # save r0
my_op r9, r0, r8
my_mv r8, r0
my_mv r0, r20   # restore r0

Here r8 and r9 hold the two return values of the intrinsic. As the title says, I have a few questions:

  1. I assumed that the result of the last DAG in a Pattern's result (destination) list becomes the result of the whole pattern (i.e. it is mapped to the intrinsic's return value). Is that correct?
  2. How do I force a write into r0?
  3. How do I turn a single RegisterTuples value (a pair of registers) into a multi-value return?
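To make the intended semantics concrete, here is a toy C++ model of the five-instruction MC sequence above. It is only a sketch under stated assumptions: 32 64-bit registers, `{src}` living in r5, and placeholder formulas for my_op's two results (the post does not say what my_op computes). `run_intrinsic`, `my_mv`, `my_op` and the result helpers are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy register file: 32 x 64-bit GPRs (assumed; the post gives no count).
using RegFile = std::array<std::uint64_t, 32>;

// Placeholder results for my_op. The post only says which registers the op
// reads and writes, so these two formulas are purely hypothetical.
std::uint64_t my_op_result_hi(std::uint64_t v) { return v + 1; }
std::uint64_t my_op_result_r0(std::uint64_t v) { return v * 2; }

// my_mv rD, rS
void my_mv(RegFile &r, int dst, int src) { r[dst] = r[src]; }

// my_op rD, r0, rS: reads r(X) = rS, writes r(X+1) = rD and r0.
void my_op(RegFile &r, int dst, int src) {
  assert(dst == src + 1 && "my_op always writes r(X+1)");
  const std::uint64_t v = r[src];  // read the source before r0 is overwritten
  r[dst] = my_op_result_hi(v);
  r[0] = my_op_result_r0(v);
}

// The MC sequence from the post: pair r8/r9, scratch r20, srcReg = {src}.
RegFile run_intrinsic(RegFile r, int srcReg) {
  my_mv(r, 8, srcReg);  // my_mv r8, {src}
  my_mv(r, 20, 0);      // my_mv r20, r0   # save r0
  my_op(r, 9, 8);       // my_op r9, r0, r8
  my_mv(r, 8, 0);       // my_mv r8, r0
  my_mv(r, 0, 20);      // my_mv r0, r20   # restore r0
  return r;
}
```

Starting from r0 = 7 and r5 = 10, `run_intrinsic(r, 5)` leaves r8 = 20, r9 = 11 and r0 = 7: both results end up in the r8/r9 pair, while the old r0 value comes back from r20.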

For 1 and 2, my understanding is that I will have to define my own pseudo-ops, roughly like this:

def : Pattern<
    (int_my_op i64:$x),
    [(i64x2 (IMPLICIT_DEF:$p)),
     (INSERT_SUBREG $p, $x, i64x2_odd),
     (MY_MV:$tmp r0),
     (INSERT_SUBREG
         $p,
         (MY_OP (EXTRACT_SUBREG $p, i64x2_odd)),
         i64x2_even),
     (INSERT_SUBREG
         $p,
         r0,
         i64x2_odd),
     (PSEUDO_WRITE_R0 $tmp),
     (PSEUDO_RET_2
         (EXTRACT_SUBREG $p, i64x2_even),
         (EXTRACT_SUBREG $p, i64x2_odd))]>;

Here i64x2 is my RegisterTuples type, PSEUDO_WRITE_R0 writes its input value to r0, and PSEUDO_RET_2 simply takes two values and returns them as two results (I would probably need getRegAllocationHints() to make it efficient).

Is there a better way to achieve this?

It seems like we will end up using let AltOrders = [(rotl MYREGCLASS, 2)]; to push r0 and r1 to the end of the allocation order (keeping them free as long as possible), plus getRegAllocationHints() to force my_op to read r0 and write r0 and r1. We will add an extra my_mv on the input of my_op during LLVM->MC lowering, so that my_op modifies a copy of the source register rather than the register itself.
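For reference, a `(rotl Class, N)` set operation rotates the class's member list left by N. The C++ sketch below mimics that on a hypothetical MYREGCLASS = [r0, ..., r7] (the real class contents are an assumption), showing that r0 and r1 end up last in the order but remain members, so the allocator can still pick them. `makeRegClass` and `rotlOrder` are illustrative names, not LLVM functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Member list of a hypothetical class MYREGCLASS = [r0, ..., r(n-1)].
std::vector<std::string> makeRegClass(int n) {
  std::vector<std::string> regs;
  for (int i = 0; i < n; ++i) regs.push_back("r" + std::to_string(i));
  return regs;
}

// Mirrors (rotl Class, N): rotate the member list left by N, so the first N
// members move to the end. Every member stays in the list, so nothing is
// reserved; the rotated-out registers are simply tried last.
std::vector<std::string> rotlOrder(std::vector<std::string> regs, std::size_t n) {
  assert(!regs.empty());
  n %= regs.size();  // this model wraps N around the class size
  std::rotate(regs.begin(), regs.begin() + static_cast<std::ptrdiff_t>(n),
              regs.end());
  return regs;
}
```

With eight registers, `rotlOrder(makeRegClass(8), 2)` gives r2, r3, r4, r5, r6, r7, r0, r1.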

UPDATE 1: Actually, it might be easier to define new register classes, R0ONLY = [r0] and R1ONLY = [r1], use them in the MC definition of MY_OP, and avoid getRegAllocationHints() completely.

UPDATE 2: The approach suggested in UPDATE 1 would break assembler/disassembler support for other legitimate uses of the instruction, e.g. my_op r5, r0, r4.
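Why UPDATE 1 breaks the assembler can be seen with a toy model of operand-class checking. It assumes MY_OP's operands would be declared as R1ONLY (destination), R0ONLY (second result) and R0ONLY (source); the post does not spell out the operand list, and `MyOpOperands`, `allGPRs` and `accepts` are hypothetical names, not the LLVM AsmParser API.

```cpp
#include <cassert>
#include <set>

using RegClass = std::set<int>;  // members as register numbers (r0 = 0, ...)

// Operand classes for "my_op rD, r0, rS".
struct MyOpOperands {
  RegClass dst;   // r(X+1)
  RegClass dst0;  // always r0
  RegClass src;   // r(X)
};

// Hypothetical full GPR class r0..r(count-1).
RegClass allGPRs(int count) {
  RegClass c;
  for (int i = 0; i < count; ++i) c.insert(i);
  return c;
}

// Would an operand combination match this instruction definition?
bool accepts(const MyOpOperands &ops, int dst, int dst0, int src) {
  return ops.dst.count(dst) && ops.dst0.count(dst0) && ops.src.count(src) &&
         dst == src + 1;  // the ISA always writes r(X+1)
}
```

With the narrowed classes `{ {1}, {0}, {0} }`, only `my_op r1, r0, r0` is accepted, so `my_op r5, r0, r4` no longer matches; a definition using a full 32-register class for the destination and source accepts it.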

In case someone runs into similar issues in the future:

  1. In short, we need to support r(X+1), r(0) = OP r(X);

  2. We found that using RA hints to force the op to read r0 and write r0 and r1 has a drawback: when the op is invoked back to back and the first invocation's output is the second one's input, the register allocator may need to spill, which our architecture does not support.

  3. So, we used XXXTargetLowering::finalizeLowering() to rewrite:

    %Out1, %Out2 = OP %In
    

    To:

    r0 = COPY %In
    r1, r0 = OP r0, ...
    %Out1 = COPY r1
    %Out2 = COPY r0
    

    This works functionally, but it requires unnecessary COPYs to and from r0 and r1 when there are back-to-back invocations, which is becoming a bottleneck in our use cases.

  4. Our current solution is to temporarily widen the op to read/write even-odd register pairs between ISel and RA, use constraints to make RA assign the same register pair to %Out1 and %In, and convert back to single registers once RA is done. We touched a few places:

    • XXXDAGToDAGISel::Select(), for inserting the widening ops;
    • XXXTargetLowering::finalizeLowering(), for hard-coding %Out2 to r0;
    • XXXInstrInfo::expandPostRAPseudo(), for rewriting our pseudo instruction into the real op;
    • let hasExtra{Src,Dst}RegAllocReq = 1 on the real op.

    This also gets rid of the hard-coded r1 usage, allowing LLVM to, e.g., read r4 and write r5 and r0 (in our ISA, that op always writes the second result to r0). The downside is that RA does not know our op only partially updates the register pair, so for cases like:

    dead %d1, %a1 = OP %src, ...
    dead %d2, %a2 = OP %src, ...


    With our current implementation, LLVM cannot produce this optimal code:

    r2 = COPY %src
    dead r3, r0 = OP r2, ...
    dead r3, r0 = OP r2, ...
    

    This is because RA only sees something like:

    r2_r3, r0 = OP r2_r3, ...
    

    It does not know that OP preserves the value of r2. Fortunately, none of our current use cases trigger this yet.
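Items 3 and 4 can be illustrated with a small C++ model that works on textual, MIR-like instructions. `Instr`, `pinToR0R1`, `expandPairPseudo` and `countCopies` are hypothetical stand-ins that reproduce the before/after forms quoted in those items; they are not LLVM's MachineInstr API, nor the actual finalizeLowering()/expandPostRAPseudo() code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Tiny stand-in for a machine instruction: opcode, defined operands and used
// operands, all as printable names. Hypothetical; NOT LLVM's MachineInstr.
struct Instr {
  std::string opcode;
  std::vector<std::string> defs;
  std::vector<std::string> uses;
};

// Item 3: pin the op to r0/r1, as the finalizeLowering() rewrite did:
//   %Out1, %Out2 = OP %In
// becomes
//   r0 = COPY %In ; r1, r0 = OP r0 ; %Out1 = COPY r1 ; %Out2 = COPY r0
std::vector<Instr> pinToR0R1(const Instr &op) {
  assert(op.opcode == "OP" && op.defs.size() == 2 && op.uses.size() == 1);
  return {
      {"COPY", {"r0"}, {op.uses[0]}},
      {"OP", {"r1", "r0"}, {"r0"}},
      {"COPY", {op.defs[0]}, {"r1"}},
      {"COPY", {op.defs[1]}, {"r0"}},
  };
}

// Item 4: once RA has assigned an even/odd pair to the widened pseudo
//   r<2k>_r<2k+1>, r0 = OP r<2k>_r<2k+1>
// rewrite it into the real op on single registers: r<2k+1>, r0 = OP r<2k>.
Instr expandPairPseudo(int evenReg) {
  assert(evenReg >= 0 && evenReg % 2 == 0 && "pairs start at an even register");
  const std::string lo = "r" + std::to_string(evenReg);
  const std::string hi = "r" + std::to_string(evenReg + 1);
  return {"OP", {hi, "r0"}, {lo}};
}

// Counts COPY instructions in a sequence (three per op in the item-3 scheme).
int countCopies(const std::vector<Instr> &seq) {
  int n = 0;
  for (const Instr &i : seq) n += (i.opcode == "COPY");
  return n;
}
```

`expandPairPseudo(4)` yields `r5, r0 = OP r4`, the r4/r5 example from item 4, while `pinToR0R1` shows where the three extra COPYs per invocation in item 3 come from.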

To solve this problem “perfectly”, we would need the ability to specify let Constraints = "$src = $dst - 1" in target instructions, which LLVM does not support yet.
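The remaining gap can be phrased as a comparison of def sets. In this hypothetical model, the widened pseudo defines the whole pair, so RA must treat r(X) as clobbered, whereas the real op only defines r(X+1) and r0. `satisfiesOffsetConstraint` spells out the `$src = $dst - 1` check (with the second result fixed to r0) that the post would like to express; it is an illustration, not an existing LLVM feature, and all names here are made up.

```cpp
#include <cassert>
#include <set>

using RegSet = std::set<int>;  // register numbers stand in for r0, r1, ...

// Defs RA sees for the widened pseudo: r<2k>_r<2k+1>, r0 = OP r<2k>_r<2k+1>
RegSet defsSeenWithPair(int evenReg) {
  assert(evenReg % 2 == 0);
  return {evenReg, evenReg + 1, 0};
}

// Defs of the real instruction: r(X+1), r0 = OP r(X)
RegSet defsOfRealOp(int src) { return {src + 1, 0}; }

// The wished-for constraint: $src = $dst - 1, second result in r0.
bool satisfiesOffsetConstraint(int src, int dst, int dst0) {
  return src == dst - 1 && dst0 == 0;
}

// The source value survives the op iff the op does not define that register.
bool preservesSource(const RegSet &defs, int src) {
  return defs.count(src) == 0;
}
```

For X = 2 the real op preserves r2 while the pair form does not, which is exactly why the optimal two-OP sequence above is out of reach. For X = 0 neither form preserves the source, since the op also writes r0.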