Pseudo-instruction that overwrites its input register

Hi,

I'd like to define a pseudo-instruction whose expansion will, as a side effect, overwrite an input register's value: the pseudo-instruction

ldw r1:r2, P

to load 2 bytes from memory address P is to be expanded to

ld r1, P+
ld r2, P

where "ld _, P+" is an instruction that loads a single byte from P, and post-increments P by one.
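To make the side effect concrete, here is a small C++ simulation of the proposed expansion. It is a toy model, not AVR or LLVM code: the Machine struct, the 64 KiB memory and the function names are assumptions made for illustration. It shows that after ldw, P no longer holds the address it was given.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy machine: 64 KiB of byte-addressed memory, two byte
// registers r1 and r2, and a 16-bit pointer register P. Only the two
// instructions quoted in the question are modelled.
struct Machine {
    std::vector<std::uint8_t> mem = std::vector<std::uint8_t>(0x10000, 0);
    std::uint8_t r1 = 0;
    std::uint8_t r2 = 0;
    std::uint16_t P = 0;
};

// "ld Rd, P+": load one byte from address P, then post-increment P by one.
std::uint8_t ldPostInc(Machine &m) {
    std::uint8_t value = m.mem[m.P];
    m.P += 1;
    return value;
}

// "ld Rd, P": load one byte from address P; P is unchanged.
std::uint8_t ld(const Machine &m) { return m.mem[m.P]; }

// Expansion of the pseudo "ldw r1:r2, P":
//   ld r1, P+
//   ld r2, P
// Besides filling r1 and r2, this leaves P one byte past its input value,
// which is the clobber the pseudo-instruction definition has to describe.
void ldwExpanded(Machine &m) {
    m.r1 = ldPostInc(m);
    m.r2 = ld(m);
}
```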

How can I represent this behaviour in LLVM? Currently, I have

   let Constraints = "@earlyclobber $reg" in
   def LDWRdPtr : Pseudo<(outs DREGS:$reg),
                         (ins PTRREGS:$ptrreg),
                         "ldw\t$reg, $ptrreg",
                         [(set i16:$reg, (load i16:$ptrreg))]>,
                  Requires<[HasSRAM]>;

The problem, of course, is that with this definition I end up with code that assumes P holds the same value before and after 'ldw r1:r2, P', i.e. that it makes no difference whether P is saved before or after it. I tried adding "@earlyclobber $ptrreg" as a constraint, but that just leads to an assertion failure during codegen (I assume because @earlyclobber only applies to output operands):

void llvm::MachineOperand::setIsEarlyClobber(bool): Assertion `isReg() && IsDef && "Wrong MachineOperand accessor"' failed.
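The practical consequence can be sketched with a toy model (hypothetical names; a simulation, not compiler output). The DAG dump further down contains two loads, t5 and t17, from the same address t2. If ldw is described as leaving P intact, the compiler is free to issue both loads through P without restoring it in between, and the second load then reads one byte too far:

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical toy state: 64 KiB of memory and one 16-bit pointer register P.
struct State {
    std::vector<std::uint8_t> mem = std::vector<std::uint8_t>(0x10000, 0);
    std::uint16_t P = 0;
};

using Bytes = std::pair<std::uint8_t, std::uint8_t>;  // {first, second} byte

// Expansion from the question: "ld r1, P+" then "ld r2, P".
// Side effect: P ends up one byte past its original value.
Bytes ldw(State &s) {
    std::uint8_t first = s.mem[s.P];
    s.P += 1;
    std::uint8_t second = s.mem[s.P];
    return {first, second};
}

// Two word loads from the "same" pointer, assuming (wrongly) that ldw
// preserves P: the second load starts one byte too far.
std::pair<Bytes, Bytes> twoLoadsAssumingPPreserved(State &s) {
    Bytes a = ldw(s);
    Bytes b = ldw(s);
    return {a, b};
}

// Two word loads where P is treated as clobbered: its value is saved
// before the first ldw and restored before the second.
std::pair<Bytes, Bytes> twoLoadsTreatingPAsClobbered(State &s) {
    const std::uint16_t saved = s.P;
    Bytes a = ldw(s);
    s.P = saved;
    Bytes b = ldw(s);
    return {a, b};
}
```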

Thanks,
   Gergo

You need to express P as both an input and an output operand, and add a constraint that both must be the same register.

David
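What the tied constraint buys can be sketched as a check the register allocator has to satisfy. The C++ below is a hypothetical simplification (InstrDesc, satisfiesTies, the operand name $ptrreg_wb and the register names are made up for illustration; this is not how LLVM stores constraints): an instruction lists its outputs and inputs, and each tie requires one input and one output to end up in the same register.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified operand description (not TableGen's or
// LLVM's data structures): operand names plus "tie" constraints saying an
// input and an output must be assigned the same physical register, in the
// spirit of a "$in = $out" constraint.
struct InstrDesc {
    std::vector<std::string> outs;
    std::vector<std::string> ins;
    std::vector<std::pair<std::string, std::string>> ties;  // {input, output}
};

// Returns true if the operand -> register assignment honours every tie.
// Untied operands may be assigned freely.
bool satisfiesTies(const InstrDesc &desc,
                   const std::map<std::string, std::string> &assignment) {
    for (const auto &tie : desc.ties) {
        auto in = assignment.find(tie.first);
        auto out = assignment.find(tie.second);
        if (in == assignment.end() || out == assignment.end())
            return false;  // a tied operand was left unassigned
        if (in->second != out->second)
            return false;  // tied operands landed in different registers
    }
    return true;
}
```

Because the written-back pointer is a separate output, later users of the old P can no longer assume the input register still holds it.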

OK, but then the pattern will have to include that extra output operand somehow, right? What would the pattern need to be so that during ISel, this LDWRdPtr instruction with the extra output still matches?

This is typically accomplished with something like PPC’s RegConstraint and NoEncode. You can see examples of it that are very similar to what you’re after in PPC’s load/store with update forms (i.e. load a value and update the base register with the effective address - these are used for pre-increment loads/stores).

For example: the definition of LBZU and friends in lib/Target/PowerPC/PPCInstrInfo.td.

For a simpler example of just the RegConstraint usage (as it doesn’t use a compound node like PPC’s address nodes), you can look at all the fused multiply-add such as XSMADDADP in lib/Target/PowerPC/PPCInstrVSX.td.

Hope this helps.

Thanks!

However, none of the NoEncode examples in PPCInstrInfo.td seem to have an isel pattern; and the VSX examples, like XSMADDADP, seem to match on setting a single output:

   let BaseName = "XSMADDADP" in {
   let isCommutable = 1 in
   def XSMADDADP : XX3Form<60, 33,
                           (outs vsfrc:$XT), (ins vsfrc:$XTi, vsfrc:$XA, vsfrc:$XB),
                           "xsmaddadp $XT, $XA, $XB", IIC_VecFP,
                           [(set f64:$XT, (fma f64:$XA, f64:$XB, f64:$XTi))]>,
                           RegConstraint<"$XTi = $XT">, NoEncode<"$XTi">,
                           AltVSXFMARel;

If I'm reading this right, this matches an instruction that updates $XT by taking the current $XT (tied to $XTi) and two extra arguments in $XA and $XB. However, my situation would be something akin to

(set f64:$XC, (fma f64:$XA, f64:$XB, f64:$XTi))

with the extra constraint that $XTi is overwritten in the process.

Is there maybe a way to write a pattern like

(set (tuple f64:$XC, f64:$XT), (fma f64:$XA, f64:$XB, f64:$XTi))

that would match

(set f64:$XC, (fma f64:$XA, f64:$XB, f64:$XTi))

by automatically lifting it to store $XT as well? (of course, with a RegConstraint that $XT = $XTi)

The reason the ones in PPCInstrInfo.td don't have patterns is precisely what makes them more analogous to your problem: tblgen has no way to produce nodes with more than one result. The load-with-update instructions do exactly that: one of the inputs is also an output, but the other output is independent (and necessarily a separate register). The FMA variants have patterns in the .td file because they don't have multiple results; they just have one operand that is both an input and an output.

So the idea is: list all the outputs in the instruction definition's outs, put a RegConstraint on the one that is tied to an input, and finally create these nodes yourself in your ISelDAGToDAG.cpp.

OK, thanks, I now get the basic idea -- but I'm still struggling with the implementation.

In my ISelDAGToDAG, if I match something like

Selecting: t17: i16,ch = load<LD2[%v25](align=1)(dereferenceable)> t16:1, t2, undef:i16

then whatever I return as the machine node will have to have the same result types, i.e. (i16, ch), right? But if I have this extra output port for the updated address register, my output is now (i16, i16, ch). It is unclear to me how to reconcile that with the original abstract node I'm matching on.

In more concrete terms, I tried ignoring this and just copying the address argument and the chain:

     const LoadSDNode *LD = cast<LoadSDNode>(N);
     int Offs = cast<ConstantSDNode>(LD->getOffset())->getSExtValue();
     if (AM == ISD::UNINDEXED && Offs == 0) {
       SDNode* LDW = CurDAG->getMachineNode(
         AVR::LDWRdPtr, SDLoc(N), VT, PtrVT, MVT::Other,
         LD->getBasePtr(), LD->getChain());

       ReplaceNode(N, LDW);
       return true;
     }

but this fails with

/home/cactus/prog/rust/rust-avr/llvm/include/llvm/Support/Casting.h:222: typename std::enable_if<(! llvm::is_simple_type<Y>::value), typename llvm::cast_retty<X, const Y>::ret_type>::type llvm::cast(const Y&)
[with X = llvm::ConstantSDNode;
       Y = llvm::SDValue;
       typename std::enable_if<(!llvm::is_simple_type<Y>::value),
       typename llvm::cast_retty<X, const Y>::ret_type>::type = llvm::ConstantSDNode*]:
Assertion `isa<X>(Val) && "cast<Ty>() argument of incompatible type!"' failed.

Any more hints, please?

Thanks,
   Gergo

Sorry, that was a complete red herring (it was the cast<ConstantSDNode> that failed). Here's the real error message I get with the approach above:

llc: CodeGen/SelectionDAG/SelectionDAG.cpp:6518: void llvm::SelectionDAG::ReplaceAllUsesWith(llvm::SDNode*, llvm::SDNode*): Assertion `(!From->hasAnyUseOfValue(i) || From->getValueType(i) == To->getValueType(i))
&& "Cannot use this version of ReplaceAllUsesWith!"' failed.

which I assume is due to the type difference I mentioned: (i16, i16, ch) vs. (i16, ch) on the output ports.
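The assertion quoted above boils down to a per-result type check, which a few lines of C++ can model. ToyNode and canReplaceAllUsesWith are hypothetical stand-ins, not LLVM's SDNode API; the condition itself mirrors the one in the assertion text. With the load's results (i16, ch) and LDWRdPtr's (i16, i16, ch), result 1 is a chain on one side and an i16 on the other, so the check fails as soon as the load's chain has a user.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an SDNode: only its result value types and
// whether each result has users. Not LLVM's real data structure.
struct ToyNode {
    std::vector<std::string> valueTypes;  // e.g. {"i16", "ch"}
    std::vector<bool> hasUse;             // hasUse[i]: result i is used
};

// Mirrors the quoted condition: for every result i of From, either it is
// unused, or its type equals result i of To.
bool canReplaceAllUsesWith(const ToyNode &from, const ToyNode &to) {
    for (std::size_t i = 0; i < from.valueTypes.size(); ++i) {
        if (!from.hasUse[i])
            continue;  // unused results impose no constraint
        if (i >= to.valueTypes.size() ||
            from.valueTypes[i] != to.valueTypes[i])
            return false;  // positional type mismatch
    }
    return true;
}
```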

OK I managed to make some progress on this, by using a MergeValues node:

       SDNode* LDW = CurDAG->getMachineNode(
         AVR::LDWRdPtr, SDLoc(N), VT, PtrVT, MVT::Other,
         LD->getBasePtr(), LD->getChain());
       SDValue Unpack[] = { SDValue(LDW, 0), SDValue(LDW, 2) };
       SDNode* NN = CurDAG->getMergeValues(Unpack, SDLoc(N)).getNode();

       ReplaceNode(N, NN);

which gets me from

   t17: i16,ch = load<LD2[%v25](align=1)(dereferenceable)> t16:1, t2, undef:i16

to

   t24: i16,i16,ch = LDWRdPtr t2, t16:1

looking good; but then it fails during scheduling with

llc: CodeGen/SelectionDAG/InstrEmitter.cpp:303: unsigned int llvm::InstrEmitter::getVR(
     llvm::SDValue,
     llvm::DenseMap<llvm::SDValue, unsigned int>&):
Assertion `I != VRBaseMap.end() && "Node emitted out of order - late"' failed.
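Leaving the scheduling failure aside, the type bookkeeping done by the MergeValues node can be modelled in isolation. In this toy C++ (mergedTypes is a hypothetical helper, with result types written as strings), picking results 0 and 2 of LDWRdPtr, as the Unpack array does, yields exactly the (i16, ch) signature of the original load; the updated pointer (result 1) is simply not exposed to any user.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model of a merge_values node re-exposing a subset of another node's
// results. Each index in `picked` plays the role of SDValue(LDW, index);
// the merged node's result i has the type of the producer's picked[i].
std::vector<std::string>
mergedTypes(const std::vector<std::string> &producerTypes,
            const std::vector<std::size_t> &picked) {
    std::vector<std::string> out;
    for (std::size_t idx : picked) {
        assert(idx < producerTypes.size());  // must name an existing result
        out.push_back(producerTypes[idx]);
    }
    return out;
}
```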

For reference, here is the full DAG before and after ISel:

SelectionDAG has 22 nodes:
   t0: ch = EntryToken
   t2: i16,ch = CopyFromReg t0, Register:i16 %vreg0
   t5: i16,ch = load<LD2[%v25](align=1)(dereferenceable)> t0, t2, undef:i16
     t9: ch,glue = callseq_start t5:1, TargetConstant:i16<0>
   t11: ch,glue = CopyToReg t9, Register:i16 %R25R24, t5
   t13: ch,glue = CALL t11, TargetGlobalAddress:i16<i8 (i16)* @read_ram> 0, Register:i16 %R25R24, RegisterMask:Untyped, t11:1
   t14: ch,glue = callseq_end t13, TargetConstant:i16<0>, TargetConstant:i16<0>, t13:1
     t16: i8,ch,glue = CopyFromReg t14, Register:i8 %R24, t14:1
   t17: i16,ch = load<LD2[%v25](align=1)(dereferenceable)> t16:1, t2, undef:i16
     t18: ch,glue = callseq_start t17:1, TargetConstant:i16<0>
   t19: ch,glue = CopyToReg t18, Register:i16 %R25R24, t17
   t20: ch,glue = CALL t19, TargetGlobalAddress:i16<i8 (i16)* @read_ram> 0, Register:i16 %R25R24, RegisterMask:Untyped, t19:1
   t21: ch,glue = callseq_end t20, TargetConstant:i16<0>, TargetConstant:i16<0>, t20:1
     t22: i8,ch,glue = CopyFromReg t21, Register:i8 %R24, t21:1
   t23: ch = RET_FLAG t22:1

SelectionDAG has 23 nodes:
   t0: ch = EntryToken
   t2: i16,ch = CopyFromReg t0, Register:i16 %vreg0
     t9: i16,ch,glue = ADJCALLSTACKDOWN TargetConstant:i16<0>, t27:1
   t11: ch,glue = CopyToReg t9:1, Register:i16 %R25R24, t27
   t13: ch,glue = CALLk TargetGlobalAddress:i16<i8 (i16)* @read_ram> 0, Register:i16 %R25R24, RegisterMask:Untyped, t11, t11:1
   t14: i16,ch,glue = ADJCALLSTACKUP TargetConstant:i16<0>, TargetConstant:i16<0>, t13, t13:1
     t18: i16,ch,glue = ADJCALLSTACKDOWN TargetConstant:i16<0>, t25:1
   t19: ch,glue = CopyToReg t18:1, Register:i16 %R25R24, t25
   t20: ch,glue = CALLk TargetGlobalAddress:i16<i8 (i16)* @read_ram> 0, Register:i16 %R25R24, RegisterMask:Untyped, t19, t19:1
   t21: i16,ch,glue = ADJCALLSTACKUP TargetConstant:i16<0>, TargetConstant:i16<0>, t20, t20:1
     t16: i8,ch,glue = CopyFromReg t14:1, Register:i8 %R24, t14:2
   t24: i16,i16,ch = LDWRdPtr t2, t16:1
   t25: i16,ch = merge_values t24, t24:2
   t26: i16,i16,ch = LDWRdPtr t2, t0
   t27: i16,ch = merge_values t26, t26:2
     t22: i8,ch,glue = CopyFromReg t21:1, Register:i8 %R24, t21:2
   t23: ch = RET t22:1
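The assertion text in InstrEmitter::getVR indicates that a value was looked up in VRBaseMap before the node producing it had been emitted. A simplified model of that lookup follows (ToyInstr and firstMissingOperand are hypothetical; the real emitter is considerably more involved). In the post-ISel dump, t18 and t19 read t25, which is a merge_values node rather than a machine node; one possible explanation, not confirmed in this thread, is that such a node never gets its results registered, so its users trip this check. The test encodes exactly that scenario.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// A value is identified by (producing node name, result index).
using Value = std::pair<std::string, unsigned>;

// Hypothetical toy instruction: its name, how many results get a virtual
// register, and the values it reads.
struct ToyInstr {
    std::string name;
    unsigned numResults;
    std::vector<Value> operands;
};

// Walks instructions in emission order, assigning vregs to results as they
// are emitted. Returns the first operand whose vreg is not yet known when
// its user is emitted (the situation "Node emitted out of order - late"
// guards against), or std::nullopt if every lookup succeeds.
std::optional<Value> firstMissingOperand(const std::vector<ToyInstr> &order) {
    std::map<Value, unsigned> vrBaseMap;
    unsigned nextVReg = 0;
    for (const ToyInstr &ins : order) {
        for (const Value &op : ins.operands)
            if (vrBaseMap.find(op) == vrBaseMap.end())
                return op;
        for (unsigned r = 0; r < ins.numResults; ++r)
            vrBaseMap[{ins.name, r}] = nextVReg++;
    }
    return std::nullopt;
}
```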