How do we compare execution order of two operations?

jwkim98 · May 22, 2023, 5:13am

I'm wondering what the correct way is to determine the execution order of operations in the same region in MLIR.

Let’s assume we don’t have any loops in our code.
For example,

  func.func @example() {
    ...
    %4 = memref.load %0[%2, %3] : memref<8x8xf32>
    %c0_1 = arith.constant 0 : index
    %c0_2 = arith.constant 0 : index
    %5 = affine.apply #map(%c0_1)
    %6 = affine.apply #map(%c0_2)
    %7 = memref.load %1[%5, %6] : memref<8x8xf32>
    %8 = arith.addf %4, %7 : f32
    ...
    return
  }

For example, if we want to determine whether arith.addf executes after memref.load, how can we do that in MLIR?
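In a CFG region, an op's position in its block is its execution order, so within a single block the question reduces to comparing positions; MLIR's C++ API exposes this as Operation::isBeforeInBlock. The following is a minimal Python sketch of that idea, assuming hypothetical Op and Block classes (they are not MLIR's Python bindings):

```python
from typing import List, Optional


class Op:
    """Hypothetical stand-in for an MLIR operation (not the real bindings)."""

    def __init__(self, name: str):
        self.name = name
        self.block: Optional["Block"] = None


class Block:
    """Hypothetical block: an ordered list of ops, as in a CFG region."""

    def __init__(self):
        self.ops: List[Op] = []

    def append(self, op: Op) -> Op:
        op.block = self
        self.ops.append(op)
        return op


def is_before_in_block(a: Op, b: Op) -> bool:
    """True if `a` appears, and therefore executes, before `b` in their shared block."""
    if a.block is None or a.block is not b.block:
        raise ValueError("ops must be in the same block")
    # Linear scan for clarity; a real implementation would cache positions.
    return a.block.ops.index(a) < a.block.ops.index(b)


# Mirrors the first example: the first memref.load is placed before arith.addf.
body = Block()
load = body.append(Op("memref.load"))
body.append(Op("arith.constant"))
body.append(Op("memref.load"))
addf = body.append(Op("arith.addf"))
print(is_before_in_block(load, addf))  # True
```

No data dependency is consulted here: the answer comes purely from the order of the ops in the block.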

In more complicated cases, where regions are nested (but still share the same parent region), what is the best practice for doing this?

  func.func @example() {
    ...
    %4 = memref.load %0[%2, %3] : memref<8x8xf32>
    %c0_1 = arith.constant 0 : index
    %c0_2 = arith.constant 0 : index
    %5 = affine.apply #map(%c0_1)
    %6 = affine.apply #map(%c0_2)
    %7 = memref.load %1[%5, %6] : memref<8x8xf32>
    // %cond : i1 is assumed to be defined in the elided code above
    %10 = scf.if %cond -> (f32) {
      %8 = arith.addf %4, %7 : f32
      scf.yield %8 : f32
    } else {
      scf.yield %7 : f32
    }
    ...
    return
  }

This may be easier in the examples above because there is a read-after-write (RAW) dependency on %7, but if there is no dependency between the two operations we want to compare, is it still possible?
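For nested regions, one approach is to walk each op up to its ancestor that sits directly in the common block and compare those ancestors; in the C++ API this corresponds to Block::findAncestorOpInBlock followed by isBeforeInBlock. Because the order comes from the IR structure rather than from data dependencies, it works even when the two ops share no operands. The sketch below uses hypothetical Python classes; note that it only says which op comes first, not whether a nested op runs at all (the then-branch of scf.if may be skipped).

```python
from typing import List, Optional


class Block:
    """Hypothetical block: ordered ops, optionally owned by a parent op (e.g. an scf.if branch)."""

    def __init__(self, parent_op: Optional["Op"] = None):
        self.ops: List["Op"] = []
        self.parent_op = parent_op

    def add(self, name: str) -> "Op":
        op = Op(name, self)
        self.ops.append(op)
        return op


class Op:
    """Hypothetical operation that may own nested blocks (its regions)."""

    def __init__(self, name: str, block: Block):
        self.name = name
        self.block = block
        self.regions: List[Block] = []

    def add_region(self) -> Block:
        region = Block(parent_op=self)
        self.regions.append(region)
        return region


def find_ancestor_op_in_block(block: Block, op: Op) -> Optional[Op]:
    """Walk up from `op` until reaching an op that lives directly in `block`."""
    current: Optional[Op] = op
    while current is not None:
        if current.block is block:
            return current
        current = current.block.parent_op
    return None


def executes_before(block: Block, a: Op, b: Op) -> bool:
    """Compare two (possibly nested) ops by the positions of their ancestors in `block`."""
    anc_a = find_ancestor_op_in_block(block, a)
    anc_b = find_ancestor_op_in_block(block, b)
    if anc_a is None or anc_b is None:
        raise ValueError("both ops must be nested under `block`")
    if anc_a is anc_b:
        # Both ops live inside the same ancestor: repeat the comparison one level down.
        raise ValueError("same ancestor: compare inside its region instead")
    return block.ops.index(anc_a) < block.ops.index(anc_b)


# Mirrors the second example: arith.addf is nested in the then-branch of scf.if.
func_body = Block()
load = func_body.add("memref.load")
if_op = func_body.add("scf.if")
addf = if_op.add_region().add("arith.addf")
print(executes_before(func_body, load, addf))  # True
```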

Best regards

matthias-springer · May 22, 2023, 8:29am

Generally speaking, I think there are no guarantees about the execution order of operations in MLIR. There are some guarantees for ops that have side effects.

Any lowering pass could shuffle the operations and put them in a different order, as long as the side effects are preserved (i.e., looking from the outside, the program is still doing the same thing). Maybe there’s even some fancy backend that can perform load+addf in a single instruction. Also, once the IR ends up in LLVM (as it does for most backends), I wouldn’t be surprised if there’s yet another chance for instructions to be reordered.

If you want things to be executed in a certain order, you could take a look at the async dialect.

jwkim98 · May 22, 2023, 8:56am

I understand your point. To be more precise, I started thinking about this problem while implementing an ‘in-place’ conversion pass for my own dialect.

For example,

  ...
  %5 = memref.alloc() : memref<8x8xf32>
  %6 = memref.alloc() : memref<8x8xf32>
  %7 = memref.alloc() : memref<8x8xf32>
  ...
  %8 = my_dialect.add %5, %6 : memref<8x8xf32>
  %9 = my_dialect.mul %5, %7 : memref<8x8xf32>
  // If %6 and %7 have no further uses,
  // (%6, %8) and (%7, %9) can share the same memory

Let’s say ‘my_dialect.add’ and ‘my_dialect.mul’ perform element-wise addition and multiplication, and each one allocates a new memref during lowering so that its result is kept separate from its operands.

However, if %6 and %7 are not used by any other operations, we can compute the result of my_dialect.add in place in the memory of its operand %6 (and likewise my_dialect.mul in %7) instead of allocating a new buffer.
To do so, I need to find out whether the values %6 and %7 have any uses after my_dialect.add and my_dialect.mul.

If I do this analysis at this level, will it cause problems if operations are reordered or shuffled later in the lowering process?
If you have a better idea, could you recommend it?
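The check described above (does a value have any use after a given op?) can be sketched for straight-line code as a scan over the ops that follow. This is a simplified model with a hypothetical Op tuple; it ignores aliasing (e.g. views of the same buffer), uses nested inside regions, and loops, all of which a real analysis has to handle.

```python
from typing import List, NamedTuple


class Op(NamedTuple):
    """Hypothetical flattened op: result name, op name, operand names."""
    result: str
    name: str
    operands: List[str]


def has_later_use(ops: List[Op], index: int, value: str) -> bool:
    """True if `value` is an operand of any op after ops[index] in the same block.

    Assumes straight-line code (no loops or branches) and no aliasing between
    buffers, so textual order equals execution order.
    """
    return any(value in op.operands for op in ops[index + 1:])


def reusable_operands(ops: List[Op], index: int) -> List[str]:
    """Operands of ops[index] that are dead afterwards, i.e. in-place candidates."""
    return [v for v in ops[index].operands if not has_later_use(ops, index, v)]


block = [
    Op("%8", "my_dialect.add", ["%5", "%6"]),
    Op("%9", "my_dialect.mul", ["%5", "%7"]),
]
print(reusable_operands(block, 0))  # ['%6']  (%5 is still read by mul)
print(reusable_operands(block, 1))  # ['%5', '%7']
```

The second result shows that %5 also becomes a candidate once mul is its last user, which is why the choice of which buffer to reuse is not unique.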

ubfx · May 22, 2023, 9:19am

The way I understand it, the question of reusing memory as you outline in your example code is something that would usually be considered during bufferization, i.e., when moving from tensors to memrefs. With tensors, it seems much easier to reason about uses of a value. Is there a reason why you want to look at this question after lowering to memrefs instead of before?

tschuett · May 22, 2023, 9:22am

In your example, you can see that %7 is a result of the load and an input of addf. Thus, the addf is executed after the load.

mehdi_amini · May 22, 2023, 9:27am

This is not technically wrong, but that’s not the most intuitive way to phrase it IMO.
In a CFG region, operations are executed in order; those are the rules of the LangRef.
As usual, the compiler transforms the program under the “as-if” rule: a transformation can change the order if it can prove that the change cannot be observed (from a semantics point of view).
When a transformation changes the order, it does so by rewriting the IR, so you can still argue that at any given point the execution order is what you see in the IR.

matthias-springer · May 22, 2023, 9:51am

There shouldn’t be any reordering issues as long as your my_dialect.add op declares memory side effects correctly (MemRead side effects on its operands and a MemAlloc+MemWrite side effect on the result, see SideEffectInterfaces.td). This prevents certain transformations that would reorder the operations.
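To illustrate why declared effects matter, here is a simplified Python model of a conservative reordering check: two adjacent ops may be swapped only if neither consumes the other's result and they never touch the same value unless both merely read it. The effect names mirror the ODS effects mentioned above, but the Op tuple and may_swap function are hypothetical, and the model ignores aliasing and effects on resources other than SSA values.

```python
from enum import Enum
from typing import FrozenSet, NamedTuple, Tuple


class Effect(Enum):
    # Names mirror the ODS effects MemRead/MemWrite/MemAlloc/MemFree.
    READ = "MemRead"
    WRITE = "MemWrite"
    ALLOC = "MemAlloc"
    FREE = "MemFree"


class Op(NamedTuple):
    """Hypothetical op summary: its result, SSA operands, and memory effects."""
    result: str
    operands: FrozenSet[str]
    effects: FrozenSet[Tuple[Effect, str]]  # (effect kind, affected value)


def may_swap(a: Op, b: Op) -> bool:
    """Conservative check that two adjacent ops can be reordered."""
    # Use-def dependency: one op consumes the other's result.
    if a.result in b.operands or b.result in a.operands:
        return False
    # Any shared value where at least one effect is not a plain read is a conflict.
    for kind_a, val_a in a.effects:
        for kind_b, val_b in b.effects:
            if val_a == val_b and not (kind_a is Effect.READ and kind_b is Effect.READ):
                return False
    return True


add = Op("%8", frozenset({"%5", "%6"}),
         frozenset({(Effect.READ, "%5"), (Effect.READ, "%6"),
                    (Effect.ALLOC, "%8"), (Effect.WRITE, "%8")}))
mul = Op("%9", frozenset({"%5", "%7"}),
         frozenset({(Effect.READ, "%5"), (Effect.READ, "%7"),
                    (Effect.ALLOC, "%9"), (Effect.WRITE, "%9")}))
print(may_swap(add, mul))  # True: they only share a read of %5

# An in-place add writes into %6, so it cannot move past a later reader of %6.
add_in_place = Op("%8", frozenset({"%5", "%6"}),
                  frozenset({(Effect.READ, "%5"), (Effect.READ, "%6"),
                             (Effect.WRITE, "%6")}))
reader = Op("%10", frozenset({"%6"}), frozenset({(Effect.READ, "%6")}))
print(may_swap(add_in_place, reader))  # False
```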

The kind of reuse analysis that you describe is what is implemented in the bufferization framework. So if you can write your program in tensor IR, you can bufferize it (-one-shot-bufferize) and get this analysis for free. “Reusing” the same memory is called “in-place bufferization”.

This is actually not an easy problem. In your example above, you say that you can reuse %6 if it is not used later. I guess you chose %6 somewhat arbitrarily. Maybe it would be better to use %5? Also, certain later uses of %6 may be fine. E.g., the same buffer may be reused later for a different computation; i.e., reinitialized with fresh data without reading from it.
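The refinement in the last point (a later use is harmless if it overwrites the buffer before anything reads it) can be sketched by looking only at the first access to the buffer after the candidate op. This hypothetical Python helper assumes straight-line code, no aliasing, and that every "write" overwrites the whole buffer; a partial write would not be enough.

```python
from typing import List, Tuple

# Hypothetical encoding of one memory access: ("read" | "write", buffer name).
Access = Tuple[str, str]


def can_reuse_after(accesses: List[Access], buffer: str) -> bool:
    """True if `buffer`'s current contents are never read again.

    `accesses` lists the accesses that follow the candidate op, in execution
    order. The buffer is reusable if its next access is a (full) overwrite or
    if it is never accessed again.
    """
    for kind, name in accesses:
        if name != buffer:
            continue
        return kind == "write"  # the first later access decides
    return True


later = [("read", "%5"), ("write", "%6"), ("read", "%6")]
print(can_reuse_after(later, "%6"))  # True: %6 is overwritten before it is read
print(can_reuse_after(later, "%5"))  # False: %5 is read again
print(can_reuse_after(later, "%7"))  # True: never touched again
```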

jwkim98 · May 22, 2023, 12:13pm

Thank you for your answers. I’ve gone through them, and they were very helpful. I’m not using the tensor dialect because I am developing my own language, which must support mutable data pointers and offer customized lowering passes that fit requirements MLIR does not yet cover (where I need lower-level control than the tensor dialect provides). I’m lowering it directly to the memref and vector dialects.
At the same time, I need to implement some features that the tensor dialect provides by default, such as -one-shot-bufferize.
I’ll dive into how one-shot-bufferize is implemented in the MLIR source code and try to learn from it.

But going back to my original question: is there really no clean and intuitive way to determine the execution order of operations in the same region when they don’t have an explicit dependency such as read-after-write?

matthias-springer · May 22, 2023, 12:22pm

Do you actually care about the order of operations? Or just about the order of side effects? Side effects are usually not changed or reordered; see “Side Effects & Speculation” in the MLIR documentation.

jwkim98 · May 22, 2023, 12:38pm

I don’t care about the ordering in the final IR output, as long as no unintended side effects are introduced. I’m using the memory side effect interface to keep the program from crashing after lowering, and I will have to put some kind of barrier in place to protect it from unintended side effects after conversion. What I would like to know is whether a value has any more uses that execute after the current operation, assuming there are no loops.

mehdi_amini · May 22, 2023, 11:05pm

I’m not sure what you’re looking for here, isn’t the following enough?

In a CFG region, operations are executed in order; those are the rules of the LangRef.

So the order you see is exactly the order of execution for any given piece of IR.

jwkim98 · May 23, 2023, 12:10am

Thank you for your clarification. So I can assume operations execute in order within the same region. I just wanted to make sure I’ve got the right idea.
