Hi, it looks like the current MLIR LICM pass doesn’t support speculatively hoisting an operation out of a loop. For example, given a program like the one below, hoisting the pointer dereference out of the loop does not seem possible currently.
int f(int *ptr, int n) {
  int s = 0;
  for (int i = 0; i < n; ++i) {
    s += *ptr;
  }
  return s;
}
My gut feeling is that we don’t do loop rotation as a canonical pass, so there is no guarantee that the loop will run at least once by the time LICM kicks in. Am I understanding it correctly?
Also, it seems that LICM is pretty strict about not moving memory ops out of the loop, i.e., isMemoryEffectFree is required, while LLVM is relatively lax about that. Is it also related to speculation?
My gut feeling is that we don’t do loop rotation as a canonical pass, so there is no guarantee that the loop will run at least once by the time LICM kicks in. Am I understanding it correctly?
This depends on the specific loop operation but is true for something like scf.for and for the “before”-region of a scf.while.
In some scenarios you could deduce from the bounds of a scf.for whether it is executed at least once, in which case speculatability would not be required to hoist an operation, only that it has no side effects.
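To make this concrete, here is a hand-written sketch (not actual pass output; `test.use` is a hypothetical op used only for illustration). With constant bounds like these, the loop provably runs at least once, so a memory-effect-free invariant op could be hoisted without any speculation concern:

```mlir
%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index
%c4 = arith.constant 4 : index
// Bounds are constants with 0 < 4, so the body executes at least once.
// Hoisting a side-effect-free, loop-invariant op out of this loop would
// never execute anything the original program could have skipped.
scf.for %i = %c0 to %c4 step %c1 {
  %v = arith.addi %c4, %c4 : index  // loop-invariant and effect-free
  "test.use"(%v) : (index) -> ()
}
```

With dynamic bounds, by contrast, a zero-trip loop is possible and only speculatable ops can be hoisted unconditionally.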
Also, it seems that LICM is pretty strict about not moving memory ops out of the loop, i.e., isMemoryEffectFree is required, while LLVM is relatively lax about that.
Correct. Moving side-effecting operations out of loops is sadly non-trivial as this requires analysis to ensure that the loop does not contain other side-effecting operations that affect the operation that is to be hoisted. LLVM can analyze this with MemorySSA but in general requires more infrastructure such as alias-analysis and more.
Is it also related to speculation?
Not quite. Speculatability is mostly an indicator for whether an operation has wholly defined behaviour or not, which is somewhat orthogonal to side effects, but in practice the two often appear together. Both need to be accounted for when determining whether code motion is legal. See the original discussion that led to the introduction of speculatability here: [RFC] Mark tensor.dim and memref.dim as side effecting
Regarding the memory side-effect analysis, do we have any plan to bring it up in the near future? MemorySSA might be a bit fancy at the moment. A much simpler analysis, e.g., detecting that the loop contains only loads and no stores, should work well for us as a start.
That’s not quite generic enough to assume. For example:
n = 0, the loop doesn’t run, and the pointer is never accessed. Hoisting makes it accessed at least once.
ptr points to a memory-mapped register or similar side-effect read. Reading it zero, once or more times may have different interpretations in hardware.
Unless you have an annotation that reading ptr is side-effect free, you cannot assume that. And unless n is known at compile time to be positive (not just non-zero, because this is a signed int), you cannot assume the loop will run at all.
I believe we’re saying the same thing, but the way you said it sounds (to me) like just that loop in the example could be interpreted as side-effect free. Maybe I got it wrong.
It sounds like the term side effect might be a bit too general here. Is there a way to distinguish the side effect in the following case from other, ordinary memory accesses?
ptr points to a memory-mapped register or similar side-effect read. Reading it zero, once or more times may have different interpretations in hardware.
Side-effect is the precise term here because any type of side-effect would stop LICM, but you’re interested in some effects and not others.
It’s not just volatile. Register-mapped (or I/O-mapped) reads can cause attached hardware to take action. For example, a read from a co-processor can flush the value and push another value into the register, while a write to the same position may make the co-processor send this data to other devices.
Luckily, such wild cases are rare nowadays outside of deeply embedded targets, so they can perhaps be ignored for most MLIR use cases. But still, we should not assume total safety, just “accept” partial safety and be clear about it.
@htyu, this may be completely irrelevant to your case, and the LLVM dialect may not even have semantics for that, so take what I said with a grain of salt. @zero9178’s answer is spot-on and should help you directly.
I have some changes that I will be cleaning up and submitting for review soon, which address the ideas discussed here. When an operation is side-effect free, it will be hoisted normally. However, if it has only read side effects, it will be hoisted under a trip-count check guard. This guard wraps the for loop, similarly to WrapInZeroTripCheck.cpp. I hope the following example is helpful.
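The following is a hand-written sketch of the intended transformation, not the actual pass output (`test.sink` is a hypothetical consumer op used only for illustration):

```mlir
// Before: the loop-invariant read is re-executed every iteration.
func.func @before(%lb: index, %ub: index, %step: index, %mem: memref<i32>) {
  scf.for %i = %lb to %ub step %step {
    %v = "test.always_speculatable_op_with_memread"(%mem)
        : (memref<i32>) -> i32
    "test.sink"(%v) : (i32) -> ()
  }
  return
}

// After: the read is hoisted, and the loop is wrapped in a
// zero-trip-count check so the read only happens if the loop runs.
func.func @after(%lb: index, %ub: index, %step: index, %mem: memref<i32>) {
  %cond = arith.cmpi slt, %lb, %ub : index
  scf.if %cond {
    %v = "test.always_speculatable_op_with_memread"(%mem)
        : (memref<i32>) -> i32
    scf.for %i = %lb to %ub step %step {
      "test.sink"(%v) : (i32) -> ()
    }
  }
  return
}
```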
where test.always_speculatable_op_with_memread is defined as:
def AlwaysSpeculatableOpWithReadSideEffect : TEST_Op<"always_speculatable_op_with_memread",
    [ConditionallySpeculatable, MemoryEffects<[MemRead]>]> {
  let description = [{
    Op used to test LICM conditional speculation. This op can always be
    speculatively executed and has only memory read effect.
  }];

  let arguments = (ins Variadic<AnyType>:$inputs);
  let results = (outs Variadic<AnyType>:$outputs);

  let extraClassDeclaration = [{
    ::mlir::Speculation::Speculatability getSpeculatability();
  }];
  let extraClassDefinition = [{
    ::mlir::Speculation::Speculatability
    AlwaysSpeculatableOpWithReadSideEffect::getSpeculatability() {
      return ::mlir::Speculation::Speculatable;
    }
  }];
}
Please share your feedback and ideas both here and on the code review.
Sorry for the late response. I think one of the downsides of adding extra control flow around the loop is that it is likely to block other optimizations, as the form of the loop is no longer canonical. I wonder if this is a good fit for the existing moveLoopInvariantCode function.
Maybe others have thoughts about that.
An alternative would be to generate something like:
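Something along these lines, perhaps (a sketch only: the guard wraps just the hoisted read, the loop keeps its shape, and the else branch yields a placeholder value, here written with `ub.poison`; `test.sink` is a hypothetical consumer op):

```mlir
%cond = arith.cmpi slt, %lb, %ub : index
// Only the hoisted read is guarded; the loop itself stays at the top level.
%v = scf.if %cond -> (i32) {
  %r = "test.always_speculatable_op_with_memread"(%mem)
      : (memref<i32>) -> i32
  scf.yield %r : i32
} else {
  // Never observed: the loop body only runs when %cond is true.
  %p = ub.poison : i32
  scf.yield %p : i32
}
scf.for %i = %lb to %ub step %step {
  "test.sink"(%v) : (i32) -> ()
}
```

The appeal of this shape is that the loop remains unwrapped and thus still recognizable to downstream loop optimizations.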
Thanks for the response. We thought about that option too, i.e., instead of placing the loop under the speculative if check, only the hoisted code would be placed there. I’m not sure how the optimizations would perform differently, though. If we are only looking at optimizing the loop, perhaps an unconditional out-of-loop value might make it easier? But for optimizations tracking the loop result, wrapping the loop in an if would make the result conditional.
Another thought is how the if could be removed if the hoisted load can be run in a predicated way (not supported in MLIR). Would it make sense to clean up the if in a downstream pass (e.g., a Triton pass)?
The parallel if-and-loop structure Thomas outlined does look more standard and closer to a canonical loop optimizer, where loop unswitching is usually done first.
This implementation is more reasonable, and its impact on optimizations outside of LICM can be minimized. In our downstream LLVM, we have also implemented LICM for operations with read/write effects, and this is exactly how we do it.
But it’s still worth mentioning that, in some cases, it can also affect the precision of the analysis. Consider performing reaching-definitions analysis on the following program:
A = alloc
def A            // op0
for {
  def A.subview0 // op1
  def A.subview1 // op2
}
use A
We can infer that the definition of A reaching the use originates either from op0 alone (if the loop does not run) or from the co-definition of op0/1/2.
If LICM is performed on op1, we get:
A = alloc
def A            // op0
if xx {
  def A.subview0 // op1
}
for {
  def A.subview1 // op2
}
use A
Performing reaching-definitions analysis again on this program, the information that op1/2 must appear together is lost. The analysis result now includes the possibility that A is defined only by op0/1, which does not exist in the original program.
Of course, this can be addressed by further refining the data-flow analysis to obtain more precise results.