I need to lower a node into something in the machine that has side effects, i.e. needs a chain. Specifically it’s actually UDIVREM. UDIVREM does not have a chain. I can custom lower UDIVREM into the nodes I want, with chain, I can even chain the new nodes and connect them to entry and root with token factors. But then the new nodes are not chained with respect to other nodes, or not chained with respect to each other, in case there are several UDIVREM.
Hence, can I combine or lower into a node that has a chain from a node that hadn’t and let the DAG construction chain this into the existing chain? I tried and it seemed to me that I can’t.
Or is the only way to code prepare before selection DAG into e.g. a custom intrinsic with side effects, to get the desired chain properties?
Thanks for any help in advance.
Do they really need to be chained with each other or anything else?
The case I know of is when they get lowered to a libcall. That libcall
has effects that mean it needs a chain of some kind, but it doesn't
really matter in any other way where in the basic block it happens.
As long as it's after its inputs have been created and before its
outputs are consumed, everything is fine. And that's handled by the
normal value operands.
Re: Do they really need to be chained with each other or anything else
Yes. For 2 reasons. Our architecture lowers udivmem into something with 1 producer and 2 consumers. Reason 1) neither the producers nor the consumers must get reordered. Reason 2) one of the consumers might be missing (either the div or mod consumer might not be present. Yet we need to keep the consuming instruction with side effects. The only way to achieve is to add it into the chain. Problem here: divmod does not have a chain.
I scanned other architectures, haven’t found an example where somebody creates a chain out of thin air. Do you know any?
Chain doesn’t guarantee that operations on parallel chains don’t get interleaved. So if you need them to be tightly connected together and there’s no data value flowing between them, then you probably want to look at using Glue rather than Chain.
This is the case for all operations expanded as library calls. The call sequence involves a chain. One of the AMDGPU fdiv expansions also introduces a side effecting mode switch with a chain
Chain doesn’t guarantee that operations on parallel chains don’t get interleaved
This would be a sequential chain…
This is the case for all operations expanded as library calls
I think their originating node already has a chain (i.e. mem operand or side effect in llvm-ir). My case is a arithmetic node without ordering constraints (divrem) getting lowered into sth that does have ordering constraints. I first thought it is very straight forward, turns out it is not a common case. My current WA will be to code prepare into intrinsics with side-effects. I was wondering if that’s really necessary…
Thanks for the input
No, non-sideeffecting operations can be legalized as compiler-rt calls
No, non-sideeffecting operations can be legalized as compiler-rt calls
Right, but not as “regular” nodes with side-effects? I guess you could search and analyze the DAG manually but that seems hacky. Maybe something that one day LLVM could support natively.
You can’t add arbitrary chains or glue to the regular nodes, but you can define a custom node you select the same way with your chain/glue. You don’t need to preprocess the IR and can do in the custom lowering. This is what AMDGPU does for FDIV (see AMDGPUISD::FMA_W_CHAIN). GlobalISel avoids these complications by not having nodes or chains, and just instructions with side effects, so in that sense this is a solved problem.
Yea. I think AMD chains the node they’re expanding into, but they don’t chain it into an existing chain. e.g. adding A->B to the DAG is ok. But adding A->B and next C->D with B->C is the problem. I appreciate the input
Still sounds to me as Glue might help (as already proposed by Craig), but maybe I’ve misunderstood something.
Another option is to do a simple lowering into pseudo instructions that you expand after ISel.
(might be easier than doing something before ISel and then having to bother about chains, glue etc)
newbee here. What’s the difference between glue and chain?
Why can’t we add chains to any node we want?
Chains represent a dependency between nodes that can’t be represented by a data dependency. For example a load following a store that might alias with the address of the load. The store must happen before the load. So the load’s chain input is dependent on the store’s chain output either directly or through other intermediate nodes that also have chain inputs and outputs. There can be multiple chains in parallel in the DAG. TokenFactor nodes are used to merge separate chains. The InstrEmitter ensures that the chain dependency is satisfied when emitting the linear instruction sequence after isel. But nothing guarantees that parallel chains won’t be interleaved. After a node is schedule all of the nodes dependent on it either through data or chain are checked to see if they are now ready to schedule. The scheduler will pick from the ready to schedule nodes without any concern for whether they were on the same chain as the last node scheduled.
Glue is stricter, it says that two nodes must be scheduled adjacent to each other in the linear instruction sequence.
I did it by code preparing into an intrinsic that has side effects. Pseudo instruction would work as well. I’m not sure if glue would help, since the nodes A->B, C->D from example above are not necessarily adjacent.
More hooks into the selection DAG builder may be an idea for a LLVM extension. For example in this case, custom allowing for a node to be built with an existing chain would have been helpful.
It sounds like this is effectively the divmod instruction writes to two special-purpose internal registers, which the div_consume and mod_consume instructions read. If those two special-purpose registers could be read/written, then you could actually model it that way, treat them as normal registers written by the first instruction and read by the latter 2. They could be spilled/restored as required. However, it seems probable that there is no instruction to write a value to these internal registers.
In such a case, typically you’d expand the UDIVREM operation into 3 instructions, all attached to each-other with GLUE, so that other things cannot be inserted in between and mess up the secret handshake. You say you don’t want to do that, but didn’t say why. That would surely be the most straightforward implementation.
So first, I did solve the problem by code-preparing. Alternatively, I could have used a pseudo-instruction.
Secondly, I really appreciate everybody’s input nevertheless.
Re: " writes to two special-purpose internal registers"
Not possible on our machine: not spillable, not arbitrarily writable, ordering constraints.
Re: “You say you don’t want to do that, but didn’t say why”
Thinking more about it, I am taking back what I said. I think it would work if there are no surprises. I may even implement and try, it seems better than what I currently have.
Thinking more about it, I am taking back what I said
I guess an example where this would not work would be, though it doesn’t apply to my machine, if those blocks of glued nodes must not be reordered, as glueing does guarantee the nodes to be scheduled together, but there’s no edge between independently glued nodes.