Generally speaking, I think there are no guarantees about the execution order of operations in MLIR. There are some guarantees for ops that have side effects.
Any lowering pass could shuffle the operations and bring them into a different order, as long as the side effects are preserved (i.e., looking from the outside, the program is still doing the same thing). Maybe there’s even some fancy backend that can do a load+addf in a single instruction. Also, once the IR ends up in LLVM (as it does for most backends), I wouldn’t be surprised if there’s yet another chance for instructions to be reordered.
If you want things to be executed in a certain order, you could take a look at the async dialect.
I understand your point. To be more precise, I started considering this problem while implementing an ‘in-place’ conversion pass for my own dialect.
%5 = memref.alloc() : memref<8x8xf32>
%6 = memref.alloc() : memref<8x8xf32>
%7 = memref.alloc() : memref<8x8xf32>
%8 = my_dialect.add %5, %6 : memref<8x8xf32>
%9 = my_dialect.mul %5, %7 : memref<8x8xf32>
// If there are no further usages of %6 and %7
// (%6, %8) and (%7,%9) can use same memory
Let’s say ‘my_dialect.add’ and ‘my_dialect.mul’ perform element-wise addition and multiplication on their operands, and each allocates a new memref during lowering to keep its result separate from its operands.
However, if %6 and %7 are not used by any other operations, we can perform my_dialect.add in place, i.e., store its result into the operand’s memory instead of allocating a new buffer.
To do so, I need to find out whether the values %6 and %7 have any usages after my_dialect.add and my_dialect.mul.
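Ignoring control flow, the check I have in mind could be sketched like this (a toy Python model of a straight-line block; the `Op` class and names are purely illustrative, not MLIR’s actual API):

```python
# Toy model of a straight-line block of ops. Each op defines one SSA
# result and reads a list of operands. This is NOT MLIR's API; it only
# illustrates the "does this value have uses after position i?" check.
from dataclasses import dataclass, field

@dataclass
class Op:
    name: str                      # e.g. "my_dialect.add"
    result: str                    # SSA value it defines, e.g. "%8"
    operands: list = field(default_factory=list)

def has_use_after(block, index, value):
    """True if `value` is used as an operand by any op after `index`."""
    return any(value in op.operands for op in block[index + 1:])

block = [
    Op("memref.alloc", "%5"),
    Op("memref.alloc", "%6"),
    Op("memref.alloc", "%7"),
    Op("my_dialect.add", "%8", ["%5", "%6"]),
    Op("my_dialect.mul", "%9", ["%5", "%7"]),
]

# %6's last use is the add at index 3, so its buffer could hold %8;
# %5 is still read by the mul at index 4, so it must not be clobbered.
print(has_use_after(block, 3, "%6"))  # False -> safe to reuse
print(has_use_after(block, 3, "%5"))  # True  -> not safe
```

In real MLIR this corresponds roughly to walking a value’s use list and comparing positions within the block, which is exactly where reordering during later lowering becomes a concern.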
If I do this here, will it cause problems due to operations being reordered or shuffled during further lowering process?
If you have a better idea, could you recommend it to me?
The way I understand it, the question of reusing memory like you outline in your example code is something that would usually be considered during bufferization, i.e., when moving from tensors to memrefs. With tensors, it seems much easier to reason about the uses of a value. Is there a reason why you want to look at this question after lowering to memrefs instead of before?
This is not technically wrong, but it’s not the most intuitive way to phrase it IMO.
In a CFG region, operations are executed in order: those are the rules of the LangRef.
Now, as usual, the compiler transforms the program under the “as-if” rule: that is, a transformation can change the order if it proves the change can’t be observed (from a semantics point of view).
When a transformation changes the order, it is an IR transformation, so you can still argue that at any given point the execution order is what you see in the IR.
There shouldn’t be any reordering issues as long as your my_dialect.add op declares memory side effects correctly (MemRead side effects on its operands and a MemAlloc+MemWrite side effect on the result, see SideEffectInterfaces.td). This prevents certain transformations that would reorder the operations.
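As a rough mental model of why correct effect declarations prevent bad reorderings (plain Python, illustrative names only, not MLIR’s actual interface): two adjacent ops may only be swapped if no buffer written by one is touched by the other.

```python
# Toy model of side-effect-based reordering legality, loosely inspired
# by MemRead/MemWrite declarations in MLIR's SideEffectInterfaces.
# Each op is modeled as a (reads, writes) pair of buffer sets.
def can_swap(op_a, op_b):
    """Adjacent ops may be reordered only if there is no RAW/WAR/WAW
    conflict: nothing written by one is read or written by the other."""
    a_reads, a_writes = op_a
    b_reads, b_writes = op_b
    return not (
        a_writes & (b_reads | b_writes) or
        b_writes & (a_reads | a_writes)
    )

# add reads %5,%6 and writes %8; mul reads %5,%7 and writes %9.
add = ({"%5", "%6"}, {"%8"})
mul = ({"%5", "%7"}, {"%9"})
print(can_swap(add, mul))  # True: they only share a read of %5

# If mul instead wrote into %6 (the in-place reuse case), swapping it
# before the add would clobber one of the add's operands:
mul_into_6 = ({"%5", "%7"}, {"%6"})
print(can_swap(add, mul_into_6))  # False: conflict on %6
```

The second case is exactly why in-place reuse makes effect declarations matter: once %6 is also a result buffer, the ordering constraint must be visible to every transformation.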
The kind of reuse analysis that you describe is what is implemented in the bufferization framework. So if you can write your program in tensor IR, you can bufferize it (-one-shot-bufferize) and get this analysis for free. “Reusing” the same memory is called “in-place bufferization”.
This is actually not an easy problem. In your example above, you say that you can reuse %6 if it is not used later. I guess you chose %6 somewhat arbitrarily. Maybe it would be better to use %5? Also certain uses of %6 may be fine. E.g., the same buffer may be reused later for a different computation; i.e., reinitialized with fresh data without reading from it.
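To make that last point concrete, here is a hedged sketch (plain Python, illustrative names only): a value blocks in-place reuse only if some later op *reads* it; a later op that merely overwrites it with fresh data does not.

```python
# Toy straight-line block where each op records which values it reads
# and which it writes. Illustrative model only, not MLIR's API.
ops = [
    {"reads": {"%5", "%6"}, "writes": {"%8"}},  # my_dialect.add
    {"reads": {"%5", "%7"}, "writes": {"%9"}},  # my_dialect.mul
    {"reads": set(),        "writes": {"%6"}},  # write-only reinit of %6
]

def read_after(index, value):
    """True if any op after `index` reads `value`."""
    return any(value in op["reads"] for op in ops[index + 1:])

# After the add (index 0): %6 is later only written, never read, so
# the add may still store its result into %6's buffer in place.
print(read_after(0, "%6"))  # False -> reuse is fine
# %5, however, is read by the mul, so clobbering it would be a bug.
print(read_after(0, "%5"))  # True  -> reuse is not safe
```

So a plain “has any later use” check is conservative; distinguishing read uses from write-only uses recovers some reuse opportunities, which is part of what makes the full analysis hard.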
Thank you for your answers. I’ve gone through them, and they were very helpful. The reason I’m not using the tensor dialect is that I am developing my own language, which has to support mutable data pointers and offer some customized lowering passes that fit requirements MLIR does not yet cover (I need lower-level control that the tensor dialect does not provide). I’m lowering this directly to the memref and vector dialects.
But at the same time, I need to implement some features that the tensor dialect gets by default, such as -one-shot-bufferize.
I’ll dive into how one-shot-bufferize is implemented in the MLIR source code and try to learn from it.
But to go back to my original question: is there really no clean and intuitive way to determine the execution order of operations in the same region if they don’t have an explicit dependency such as read-after-write?
I wouldn’t care about the ordering in the final IR output, as long as the side effects are preserved. I’m using the memory side effect interface to keep the program from breaking after lowering, and I will have to insert some kind of barriers to protect it from side-effect reordering after conversion. What I’d like to know, ignoring loops, is whether a value has any remaining uses that would execute after the current operation.