[RFC][MLIR] Introduce hoist-pure-ops pass

Abstract

I hope to add a hoist-pure-ops pass to upstream MLIR. The hoist-pure-ops pass hoists Pure ops in order to create more optimization opportunities.

Motivation

The following is the initial idea.

func.func @hoist_cast_pos_alloc(%arg: i1) -> (memref<?xf32>) {
  %alloc = memref.alloc() : memref<10xf32>
  cf.cond_br %arg, ^bb1, ^bb2
^bb1:
  %cast = memref.cast %alloc : memref<10xf32> to memref<?xf32>
  return %cast : memref<?xf32>
^bb2:
  %cast1 = memref.cast %alloc : memref<10xf32> to memref<?xf32>
  return %cast1 : memref<?xf32> 
}

I hope it transforms into the following IR. However, the CSE pass cannot solve this issue, because %cast does not dominate %cast1. memref.cast is a pure op, so perhaps we can reorder it based on SSA dominance. We have had some discussions on this issue before: will ops without side effects be reordered when running the pass?

func.func @hoist_cast_pos_alloc(%arg: i1) -> (memref<?xf32>) {
  %alloc = memref.alloc() : memref<10xf32>
  %cast = memref.cast %alloc : memref<10xf32> to memref<?xf32>
  cf.cond_br %arg, ^bb1, ^bb2
^bb1:
  return %cast : memref<?xf32>
^bb2:
  return %cast : memref<?xf32> 
}

Based on the above discussion, we can write a pattern for memref.cast and adjust the position of the cast. Now the CSE pass can take over.

func.func @hoist_cast_pos_alloc(%arg: i1) -> (memref<?xf32>) {
  %alloc = memref.alloc() : memref<10xf32>
  %cast = memref.cast %alloc : memref<10xf32> to memref<?xf32>
  %cast1 = memref.cast %alloc : memref<10xf32> to memref<?xf32>
  cf.cond_br %arg, ^bb1, ^bb2
^bb1:
  return %cast : memref<?xf32>
^bb2:
  return %cast1 : memref<?xf32> 
}

Thanks @mehdi_amini for asking me this question: “What makes cast “special” with this property? Why not other ops?” I think that if an op is a Pure op, we have the opportunity to hoist it based on SSA dominance, so perhaps we can write a more generic pass to hoist Pure ops.
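As a toy illustration of the idea (plain Python, not MLIR code — the block/op encoding and the function name are invented for this sketch), a pure op whose operands are all defined in a dominating block can be moved up next to its producer:

```python
# Toy model of the proposed hoisting, not the actual MLIR implementation.
# A function is a dict: block name -> list of ops; each op is a tuple
# (result, opname, operands). In this hypothetical CFG, "entry" dominates
# every other block, standing in for real SSA dominance info.

def hoist_pure_ops(blocks, dominator, pure_opnames):
    """Move pure ops whose operands are all defined in the dominating
    block up into that block."""
    defined_in_dom = {res for (res, _, _) in blocks[dominator]}
    for name, ops in blocks.items():
        if name == dominator:
            continue
        hoisted = [op for op in ops
                   if op[1] in pure_opnames and set(op[2]) <= defined_in_dom]
        blocks[dominator].extend(hoisted)
        blocks[name] = [op for op in ops if op not in hoisted]
    return blocks

blocks = {
    "entry": [("%alloc", "memref.alloc", [])],
    "bb1": [("%cast", "memref.cast", ["%alloc"])],
    "bb2": [("%cast1", "memref.cast", ["%alloc"])],
}
hoist_pure_ops(blocks, "entry", {"memref.cast"})
# Both casts now live in the entry block, where ordinary CSE can merge
# them because one cast dominates every former use of the other.
```

This mirrors the RFC example: after hoisting, the two casts become CSE-able duplicates in a single block.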

Code implementation

I have made a simple implementation of the idea above: [mlir] add hoist-pure-ops to mlir by linuxlonelyeagle · Pull Request #168715 · llvm/llvm-project · GitHub. If you have any questions, do let me know. Thanks. :face_blowing_a_kiss:


This looks similar to loop-invariant code motion but for conditionals. Can the two be generalized to something that works across control flow operations?


I looked into this earlier. The LICM pass is a bit simpler because it directly matches loop-like ops. In particular, it does not need DominanceInfo. So implementation-wise I’m not sure… But maybe it makes sense to put both implementations in the same file and to expose both hoistings via a single pass (with pass options).

Perhaps we could use hoist-pure-ops instead of loop-invariant code motion. :thinking:

They’re not doing the same thing: LICM is intended to reduce the (dynamic) repetition of these operations by moving them out of loops.
What you’re proposing (moving them close to their definition) possibly makes them execute when they wouldn’t have before (maybe even moving them into a loop when they were outside it before).

maybe even potentially moving them into a loop when they were outside before

I’m not entirely sure whether such a situation actually exists.

Here is a trivial example (just a twist on your example) that demonstrates it:

func.func @hoist_cast_pos_alloc(%arg: i1) -> (memref<?xf32>) {
  cf.br ^entry
^entry:
  %get_memref = "test.get_memref"() : () -> memref<10xf32>
  %cond = "test.loop_cond"() : () -> i1
  cf.cond_br %cond, ^entry, ^exit
^exit:
  %cast = memref.cast %get_memref : memref<10xf32> to memref<?xf32>
  return %cast : memref<?xf32>
}

Hmm, I read the proposal as only hoisting if both edges of the conditional have the operation…


The example in the RFC was misleading: the implementation just checks whether an operand is in a different block than its producer and relocates the operation next to the producer.


I think this idea makes sense. Introduce a new constraint: when a block has multiple successor blocks, and two or more of these successors contain the same pure operation, we hoist it into the common predecessor. :thinking:

If the same pure op appears on two or more edges, we hoist it to a position that dominates both. What do you think of doing this? @ftynse @mehdi_amini @matthias-springer :thinking:
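That constraint can be sketched in the same toy encoding (a hypothetical Python model, not the actual pass — ops here are (name, operand-tuple) pairs with result names omitted so that “the same pure operation” is easy to compare):

```python
from collections import Counter

def hoist_common_pure_ops(pred, succs, pure_opnames):
    """Hoist a pure op into `pred` only when an identical copy
    (same op name and operands) appears in two or more successors."""
    counts = Counter()
    for ops in succs:
        # Count each distinct pure op at most once per successor block.
        counts.update({op for op in ops if op[0] in pure_opnames})
    common = {op for op, n in counts.items() if n >= 2}
    pred.extend(sorted(common))
    for i, ops in enumerate(succs):
        succs[i] = [op for op in ops if op not in common]
    return pred, succs

pred = [("memref.alloc", ())]
succs = [[("memref.cast", ("%alloc",))],
         [("memref.cast", ("%alloc",))]]
hoist_common_pure_ops(pred, succs, {"memref.cast"})
# The single hoisted cast now dominates both of its former positions,
# and an op present on only one edge is left untouched.
```

Unlike the unconditional hoisting above, this never makes an op execute on a path where it did not execute before, which addresses the LICM-style concern raised earlier in the thread.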

I would think CSE should be the transformation that does that; it would be interesting to see why it does not.


I have studied the CSE code.

  • The implementation of CSE uses llvm::ScopedHashTable.
    If the CFG is compared to a tree, a leaf node can access the SSA values on the path from the root, but it should not be able to access the SSA values of other leaf nodes.
    This is because CSE performs a pre-order traversal, creating a new llvm::ScopedHashTableScope when moving from one CFG edge to another.
    At a leaf node, it searches for an op identical to one already seen on the path from the root to that leaf, and replaces it with the existing op. :thinking:

  • The reason for doing this can be seen from the IR below:
    %cast cannot be seen from %cast1’s block, and
    %cast1 cannot be seen from %cast’s block.

func.func @hoist_cast_pos_alloc(%arg: i1) -> (memref<?xf32>) {
  %alloc = memref.alloc() : memref<10xf32>
  cf.cond_br %arg, ^bb1, ^bb2
^bb1:
  %cast = memref.cast %alloc : memref<10xf32> to memref<?xf32>
  return %cast : memref<?xf32>
^bb2:
  %cast1 = memref.cast %alloc : memref<10xf32> to memref<?xf32>
  return %cast1 : memref<?xf32> 
}
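To see the scoping effect concretely, here is a toy Python model of that traversal (it mimics the idea of llvm::ScopedHashTable, not the real data structure): each sibling block pushes and pops its own scope, so an op recorded in one sibling is invisible in the other:

```python
# Toy scoped-map CSE, mirroring the scoping idea in llvm::ScopedHashTable.
class ScopedMap:
    def __init__(self):
        self.scopes = [{}]
    def push(self): self.scopes.append({})
    def pop(self): self.scopes.pop()
    def lookup(self, key):
        for scope in reversed(self.scopes):
            if key in scope:
                return scope[key]
        return None
    def insert(self, key, value):
        self.scopes[-1][key] = value

def cse_block(ops, table):
    """Replace an op with an earlier identical one if it is visible."""
    out = []
    for res, opname, operands in ops:
        key = (opname, tuple(operands))
        existing = table.lookup(key)
        if existing is not None:
            out.append((res, "replaced_by", [existing]))
        else:
            table.insert(key, res)
            out.append((res, opname, operands))
    return out

table = ScopedMap()
entry = cse_block([("%alloc", "memref.alloc", [])], table)

# bb1 and bb2 are siblings: each gets its own scope.
table.push()
bb1 = cse_block([("%cast", "memref.cast", ["%alloc"])], table)
table.pop()          # %cast's entry is gone before bb2 is visited
table.push()
bb2 = cse_block([("%cast1", "memref.cast", ["%alloc"])], table)
table.pop()
# bb2's cast is NOT replaced: %cast was only visible inside bb1's scope.

# By contrast, an op recorded in a dominating scope is found:
table.push()
again = cse_block([("%alloc2", "memref.alloc", [])], table)
table.pop()
# %alloc2 is replaced by %alloc, which lives in the outer (entry) scope.
```

This is exactly why the RFC example survives CSE: the duplicate casts sit in sibling scopes, not on a common dominating path.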

I would like to hear more of your thoughts. :smiling_face_with_three_hearts:

Does LLVM CSE have the same limitation? Do you think it is intrinsic to CSE, or just something to improve in the MLIR CSE implementation?


I will do some research on it, which may take some time (as I do not have experience with LLVM IR). But it’s quite interesting; this question is turning from a small one into an interesting one. :thinking:

I have run the experiment. As you can see, LLVM’s behavior is consistent with MLIR’s. :thinking:

define i32 @test_cse(i1 %cond, i32 %arg0, i32 %arg1) {
entry:
  br i1 %cond, label %if.true, label %if.false

if.true:                                          ; preds = %entry
  %a = add i32 %arg0, %arg1
  %b = mul i32 %a, %arg0
  ret i32 %b

if.false:                                         ; preds = %entry
  %c = add i32 %arg0, %arg1
  %d = mul i32 %c, %arg0
  ret i32 %d
}

; opt -S test.ll -passes=early-cse
define i32 @test_cse(i1 %cond, i32 %arg0, i32 %arg1) {
entry:
  br i1 %cond, label %if.true, label %if.false

if.true:                                          ; preds = %entry
  %a = add i32 %arg0, %arg1
  %b = mul i32 %a, %arg0
  ret i32 %b

if.false:                                         ; preds = %entry
  %c = add i32 %arg0, %arg1
  %d = mul i32 %c, %arg0
  ret i32 %d
}

If we rewrite the input like this:

define i32 @test_cse(i1 %cond, i32 %arg0, i32 %arg1) {
entry:
  %a = add i32 %arg0, %arg1
  %b = mul i32 %a, %arg0
  br i1 %cond, label %if.true, label %if.false
if.true:                                          ; preds = %entry
  ret i32 %b

if.false:                                         ; preds = %entry
  %c = add i32 %arg0, %arg1
  %d = mul i32 %c, %arg0
  ret i32 %d
}

; opt -S test.ll -passes=early-cse
define i32 @test_cse(i1 %cond, i32 %arg0, i32 %arg1) {
entry:
  %a = add i32 %arg0, %arg1
  %b = mul i32 %a, %arg0
  br i1 %cond, label %if.true, label %if.false

if.true:                                          ; preds = %entry
  ret i32 %b

if.false:                                         ; preds = %entry
  ret i32 %b
}