Adding restrictions on target independent opcodes in MIR

Hi,

I work on a downstream backend and I am having trouble expressing an architecture requirement in MIR.

Consider:

bb.1
  // branch to bb.2 or bb.3

bb.2
  OpSpecial = …
  %100 Op1 = …
  %101 Op2 = %100…
  BRANCH_UNCONDITIONALLY %bb.3

bb.3

Per the architecture restriction, defining %100 before OpSpecial or using it outside bb.2 is illegal. For example, a transform that hoists the definition of %100 to bb.1 would make the code illegal. To ensure this does not happen:

  1. We have a special register, (SR in the rest of the text), defined to enforce this restriction.
  2. Only certain target specific opcodes can define and use registers with this restriction. They are declared in tablegen to use SR.
  3. We add an implicit definition of SR to OpSpecial (think of it as a class of instructions; every instruction fitting the architecture criteria gets the definition).
    So the restriction is converted to a data dependence.
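
A toy sketch of why steps 1 to 3 work, written in Python rather than against the LLVM API. The `Instr` class and `depends` function are hypothetical and only model register reads and writes; they are not how LLVM represents dependences. The point is that once OpSpecial writes SR and Op1 reads it, a plain read-after-write dependence pins Op1 below OpSpecial.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction model: a name plus the registers it writes and reads."""
    name: str
    defs: frozenset = frozenset()
    uses: frozenset = frozenset()


def depends(earlier, later):
    """True if 'later' must stay after 'earlier' (RAW, WAR or WAW on any register)."""
    raw = earlier.defs & later.uses
    war = earlier.uses & later.defs
    waw = earlier.defs & later.defs
    return bool(raw or war or waw)


# OpSpecial carries the implicit definition of SR (step 3).
op_special = Instr("OpSpecial", defs=frozenset({"SR"}))

# Op1 without and with the implicit use of SR (step 2).
op1_plain = Instr("Op1", defs=frozenset({"%100"}))
op1_with_sr = Instr("Op1", defs=frozenset({"%100"}), uses=frozenset({"SR"}))

print(depends(op_special, op1_plain))    # no shared register, free to reorder
print(depends(op_special, op1_with_sr))  # SR creates a read-after-write edge
```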

While OpSpecial can exist in the IR after instruction selection, imposing these restrictions requires an analysis and a transformation, which are implemented as a pass in addMachineSSAOptimization. We can transform target specific opcodes to corresponding versions that use SR, but we also have target independent opcodes like COPY, PHI and REG_SEQUENCE, which don’t have an implicit use of SR. Moreover, COPY is also created by later passes, and the scheduler and other passes can reorder these instructions, leading to a correctness failure.
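
To make the gap concrete, here is a small Python sketch (not LLVM code) of the kind of check that would expose it. The function `missing_sr_use`, the tuple layout `(name, defs, uses)` and the register %102 are assumptions made for illustration. It flags any instruction that touches a restricted register without reading SR, which is exactly the situation a generic COPY ends up in.

```python
def missing_sr_use(block, restricted):
    """Return names of instructions that touch a restricted register but do not read SR.

    'block' is a list of (name, defs, uses) tuples; 'restricted' is a set of
    register names that fall under the architecture restriction.
    """
    flagged = []
    for name, defs, uses in block:
        touches_restricted = (set(defs) | set(uses)) & restricted
        if touches_restricted and "SR" not in uses:
            flagged.append(name)
    return flagged


# bb.2 from the example, plus a COPY inserted by some later pass.
bb2 = [
    ("OpSpecial", ["SR"], []),
    ("Op1", ["%100"], ["SR"]),
    ("COPY", ["%102"], ["%100"]),      # generic opcode, no implicit SR use
    ("Op2", ["%101"], ["%100", "SR"]),
]

print(missing_sr_use(bb2, {"%100", "%101"}))
```

In this model only the COPY is reported, since nothing ties it to OpSpecial beyond its ordinary data dependence on %100.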

Does any other architecture have a similar restriction? If so, would its maintainers be open to providing pointers on how they solved it?

I think we can override the definitions of target independent opcodes in our target’s tablegen to have an implicit use of SR. However, would that break any assumptions about these target independent opcodes in optimizations? To clarify, if I force COPY to have an implicit use of SR, would all target independent optimizations respect this implicit use or would some optimizations ignore it?

Is there an observer/delegate based solution?

Thanks,
Malay

Hi @msanghi,

Restating what we discussed offline.

Target independent opcodes support implicit operands, i.e., the optimizations should do the right thing (modulo bugs).

(That being said, it is possible that some optimizations give up on these “weird” copies.)

I don’t believe we have an observer/delegate based solution for copy, phi, reg_sequence, etc.

Having a target hook to create such operations may be interesting (like what we do for spills), but at the same time this is trickier because there are a lot of places that create copies and then do something with them (i.e., they expect 2 operands, destination at index 0, source at index 1), as opposed to the spiller, which creates the spill instructions and doesn’t do anything with them.
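
A toy Python illustration of the operand-layout concern, not taken from LLVM. Both helper functions and the string encoding of operands are hypothetical. A consumer that insists on exactly two operands gives up on a copy carrying an extra implicit operand, while one that only inspects the explicit operands still finds destination and source at indices 0 and 1.

```python
def copy_dst_src_strict(operands):
    """Hypothetical consumer that assumes a COPY has exactly two operands."""
    if len(operands) != 2:
        return None  # gives up on the "weird" copy
    return operands[0], operands[1]


def copy_dst_src_tolerant(operands):
    """Hypothetical consumer that ignores implicit operands and reads indices 0 and 1."""
    explicit = [op for op in operands if not op.startswith("implicit ")]
    if len(explicit) != 2:
        return None
    return explicit[0], explicit[1]


plain_copy = ["%102", "%100"]
copy_with_sr = ["%102", "%100", "implicit SR"]

print(copy_dst_src_strict(plain_copy))
print(copy_dst_src_strict(copy_with_sr))
print(copy_dst_src_tolerant(copy_with_sr))
```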

Maybe there’s an opportunity to extend the adoption of isCopyInstr, but at the same time this kind of kills the nice property that copies, phis, etc. are target independent opcodes and we don’t need to do anything special with them.
What do others think?
CC @MatzeB, @aditya

Stepping back a little bit @msanghi:

  • Shouldn’t this kind of issue be modeled with isConvergent?
  • Also, if we’re dealing with true copies (phis, etc.), why do we need to model that?
    Now that I think about it, shouldn’t that constraint appear only after you lowered your copy to your target dependent opcode?

Cheers,
-Quentin

Hi,
Thanks for the response.

Shouldn’t this kind of issue be modeled with isConvergent?

I don’t understand this flag completely. Based on the references I saw, it seems to imply that the number of threads/lanes executing this instruction does not change. Is my understanding correct? If so, considering the above example, what property would be expected on OpSpecial so that the MI defining %100 is not moved above it?

Also, if we’re dealing with true copies (phis, etc.), why do we need to model that

Are you implying that, since there is already a data dependence between the actual definition and the copy, the copy does not run the risk of being scheduled across OpSpecial, so not having the implicit operand until it is lowered should be fine? I think that should work. Let me think some more about it in the context of practical examples.

That’s the intuitive intention behind it, yes.

I don’t see that either, but it might help you model the other requirement that you can’t move Op2 out of the basic block. It’s not a perfect fit, but depending on why you have that rule it may help.

Exactly.
I was assuming you (@msanghi) were adding these constraints to avoid code motion across basic blocks, and isConvergent is the right thing to use for that.

The constraint is more restrictive than a basic block. If you revisit the example, OpSpecial and the definition of %100 are in the same block but movement across OpSpecial is not allowed.
OpSpecial can be anything that changes the number of executing lanes/threads, so moving the definition of %100 above it would be incorrect.
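
The two halves of the rule can be written down as a small Python check, offered as a sketch of the constraint as stated in this thread rather than of any LLVM pass. The function `placement_errors` and the `(name, defs, uses)` tuple layout are hypothetical. It reports a definition that is not preceded by OpSpecial in its own block, and any use outside the defining block.

```python
def placement_errors(blocks, reg, special="OpSpecial"):
    """List violations of the restriction for one register.

    'blocks' maps a block name to a list of (name, defs, uses) tuples in
    program order.
    """
    errors = []
    def_block = None
    for block_name, instrs in blocks.items():
        seen_special = False
        for name, defs, uses in instrs:
            if name == special:
                seen_special = True
            if reg in defs:
                def_block = block_name
                if not seen_special:
                    errors.append(f"{reg} defined before {special} in {block_name}")
    for block_name, instrs in blocks.items():
        for name, defs, uses in instrs:
            if reg in uses and block_name != def_block:
                errors.append(f"{reg} used in {block_name}, outside {def_block}")
    return errors


legal = {
    "bb.1": [],
    "bb.2": [("OpSpecial", [], []), ("Op1", ["%100"], []), ("Op2", ["%101"], ["%100"])],
    "bb.3": [],
}
# The definition hoisted to bb.1: wrong block and not below OpSpecial.
hoisted = {
    "bb.1": [("Op1", ["%100"], [])],
    "bb.2": [("OpSpecial", [], []), ("Op2", ["%101"], ["%100"])],
    "bb.3": [],
}
# Same block, but the definition moved above OpSpecial.
moved_up = {
    "bb.2": [("Op1", ["%100"], []), ("OpSpecial", [], []), ("Op2", ["%101"], ["%100"])],
}

for case in (legal, hoisted, moved_up):
    print(placement_errors(case, "%100"))
```

The `moved_up` case is the one a block-level property alone would not catch: everything stays in bb.2, yet the definition is still misplaced.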

Regarding the question about true copies: I went back and looked at our motivating examples. I think the errors I saw can be classified as the coalescer not respecting the implicit operand. It merges live ranges where some COPYs have an implicit operand and others don’t, so it ended up hoisting a definition to an illegal location.
I do have a workaround in the target code (and there is also a callback in the coalescer).