I am hoping that someone can offer advice on a somewhat unusual issue that I am facing with the SDAG. Namely, I am trying to implement some custom operations that do very specific things on multiple registers at a time. The operations themselves will simply be intrinsics since there are no equivalent operations in IR/SDAG. However, handling the types seems rather tricky.
One approach I tried is to create a register class that has the wide registers with proper subregisters and then tell the SDAG that the correspondingly wide type can go into those registers. While this works, it has a very unfortunate side effect: the type legalizer leaves any node with such a type untouched, so I have to mark all operations on that type as non-legal (mostly Expand).
For example, I could say that the type v8i64 can go into these registers and then I can use the type for my intrinsics. However, the type legalizer will leave all nodes with this result/operand type alone which is not at all what I want.
Then I tried the opposite approach: custom-lower only the specific nodes that have this result type and let the type legalizer handle all the others normally. This works quite well, except if I want to expose those custom instructions through inline asm. The DAG builder complains if I try to assign one of these wide registers to a value with the wide type, because it assumes that the wide value will need to be broken up.
I suppose I could define a new type for the IR/SDAG and use it, but that seems like a very pervasive change.
So either direction I go in seems to have a major drawback.
I’m missing some details on what your constraints are. You have an operation on contiguous v8i64 registers, and not on some number of separate i64 registers? If you really have the vector operation on the vector width, adding the legal type is the most honest strategy and probably your best bet, despite the pain induced by needing to expand all of the vector operations. If you really wanted to trick the legalizer and hack out the type with ReplaceNodeResults, I would expect it’s theoretically possible to hack up the inline asm handling to deal with this, but I probably wouldn’t recommend it. This is the kind of problem that’s avoided in GlobalISel, since the concept of type legalization is gone.
Yeah, the operations are done on either pairs or 4-tuples of consecutive vector registers.
What do you think about the idea of creating separate pair/quad types in the IR and SDAG to represent these? That way the only way such a type would come into existence would be with the intrinsics.
You don’t need to add any new type. You can already use a struct, which will expand to multiple registers.
I’m not sure I understand the difficulty here. Normally, if you have an instruction which has multiple operand/result registers, you just make the SelectionDAG node have multiple operand/result values. If there are weird register allocation constraints, you can handle that in ISelDAGToDAG. (There are a few ARM instructions that expect multiple registers in ascending order, like vtbl and vld4/vst4.)
If you need inline asm operands/results with an illegal type, that’s sort of an independent issue. x86 uses a fake register class to handle the “A” constraint, which refers to the register pair RAX/RDX. (See X86TargetLowering::getRegForInlineAsmConstraint). If that doesn’t work in your case, not sure what I’d do off the top of my head; maybe the code for lowering inline asm could be extended.
Well, frankly the issue is mainly the inline asm.
Say the instruction has the form
RT, RA, RB
Where the register numbers of RT, RA, and RB all have to be multiples of 4. The instruction does a binary operation on RA/RA+1/RA+2/RA+3 and RB/RB+1/RB+2/RB+3. Namely, the operation is performed on 4 vector registers at a time, producing a 4-vector-register result in RT/RT+1/RT+2/RT+3.
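To make the register constraint concrete, here is a minimal, self-contained sketch of it in plain C++. It is only a toy model, not LLVM code: the register file size (`kNumVectorRegs`) and the helper names `expandQuad` and `isValidQuadInstr` are hypothetical and chosen purely for illustration.

```cpp
#include <array>
#include <cassert>
#include <optional>

// Assumption for illustration only: 64 vector registers, numbered 0..63.
constexpr unsigned kNumVectorRegs = 64;
// Each operand names a group of 4 consecutive vector registers.
constexpr unsigned kTupleSize = 4;

// Hypothetical helper: returns the 4 consecutive registers covered by a
// quad operand starting at `base`, or nullopt if `base` is not a multiple
// of 4 or the group would run past the end of the register file.
std::optional<std::array<unsigned, kTupleSize>> expandQuad(unsigned base) {
  if (base % kTupleSize != 0 || base + kTupleSize > kNumVectorRegs)
    return std::nullopt;
  std::array<unsigned, kTupleSize> regs{};
  for (unsigned i = 0; i < kTupleSize; ++i)
    regs[i] = base + i;
  return regs;
}

// Hypothetical helper: an instruction "RT, RA, RB" is encodable only if
// all three operands are valid quad starts.
bool isValidQuadInstr(unsigned rt, unsigned ra, unsigned rb) {
  return expandQuad(rt).has_value() && expandQuad(ra).has_value() &&
         expandQuad(rb).has_value();
}
```

In this model, `isValidQuadInstr(0, 4, 8)` is accepted while `isValidQuadInstr(0, 4, 9)` is rejected, which is the kind of constraint the register class for the wide registers has to express.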
This can be modeled in a rather straightforward way with the right number of operands and results to the SDAG node for the operation.
However, to let the register allocator select a register for an inline asm constraint, I need to say that variable X goes into register R. I have a constraint that says give me one of these registers that are composed of 4 other registers. And the variable has a type that is as wide as 4 vectors (say v8i64). Then when the DAG builder tries to build the INLINEASM node for that directive, it wants to split the illegal type into 4 vectors to create the CopyToReg nodes.
This is really an issue with any type that is wider than the widest register.
If you want to investigate extending inline asm capabilities, I’d start by looking at InstrEmitter::EmitSpecialNode. It already understands inline asm operands that don’t fit in a single register; it just isn’t handling them the way you want it to. Basically, the idea would be that you emit a REG_SEQUENCE pseudo-instruction to merge the register operands into one big register, and EXTRACT_SUBREG to split the register result into multiple registers.
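As a rough illustration of that data flow, the sketch below models it in plain C++ rather than with the real LLVM classes. Everything here is an assumption made for the example: the 128-bit register width (two i64 lanes, implied by v8i64 spanning four registers), the `QuadReg` struct, and the helper names `splitWideValue`, `regSequence`, and `extractSubreg`, which only mimic what the REG_SEQUENCE and EXTRACT_SUBREG pseudo-instructions do conceptually.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed widths: a vector register holds 2 x i64, and a wide register
// is made of 4 such subregisters, so a v8i64 value fills exactly one.
constexpr std::size_t kLanesPerReg = 2;
constexpr std::size_t kRegsPerQuad = 4;

using VecReg = std::array<std::uint64_t, kLanesPerReg>;
using WideValue = std::array<std::uint64_t, kLanesPerReg * kRegsPerQuad>;

// Toy stand-in for one wide register with four subregisters.
struct QuadReg {
  std::array<VecReg, kRegsPerQuad> sub;
};

// What the DAG builder effectively does with the illegal wide type:
// break it into register-sized pieces.
std::array<VecReg, kRegsPerQuad> splitWideValue(const WideValue &v) {
  std::array<VecReg, kRegsPerQuad> parts{};
  for (std::size_t i = 0; i < kRegsPerQuad; ++i)
    for (std::size_t j = 0; j < kLanesPerReg; ++j)
      parts[i][j] = v[i * kLanesPerReg + j];
  return parts;
}

// Conceptual analogue of REG_SEQUENCE: merge four registers into one
// wide register to feed the inline asm operand.
QuadReg regSequence(const VecReg &s0, const VecReg &s1, const VecReg &s2,
                    const VecReg &s3) {
  QuadReg q{};
  q.sub[0] = s0;
  q.sub[1] = s1;
  q.sub[2] = s2;
  q.sub[3] = s3;
  return q;
}

// Conceptual analogue of EXTRACT_SUBREG: pull one piece back out of a
// wide register result.
VecReg extractSubreg(const QuadReg &q, std::size_t idx) {
  assert(idx < kRegsPerQuad && "subregister index out of range");
  return q.sub[idx];
}
```

The point of the model is only the shape of the transformation: the split pieces are recombined into a single wide register before the asm, and any wide result is taken apart again afterwards, so the rest of the DAG never sees the wide register directly.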