Hello,
I wonder whether the MemAlloc effect is supposed to allow or disallow reordering by definition. For example, Flang uses MemoryEffects on a special DebuggingResource to preserve the nesting structure of fir.dummy_scope operations (see flang/include/flang/Optimizer/Dialect/FIROps.td at commit e53acac in llvm/llvm-project on GitHub).
We currently use the MemWrite effect, but it will certainly block some analyses/optimizations. May I use the MemAlloc effect (which is generally more optimizable) instead and still be sure that two fir.dummy_scope operations won’t be reordered by some optimization?
Alternatively, does it make sense to add a core MLIR “debugging” resource that some analyses could handle in a special way (e.g. mlir::affine::isLoopMemoryParallel may assume that MemWrite effects on such a resource do not block parallelization)?
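For reference, the FIR setup mentioned above looks roughly like this in TableGen (paraphrased from FIROps.td; the exact trait list and op body are elided, so details may differ):

```tablegen
// A synthetic resource used only to express ordering constraints for
// debugging-related operations; no real memory is attached to it.
def DebuggingResource : Resource<"::fir::DebuggingResource">;

// fir.dummy_scope writes to DebuggingResource so that two scopes are
// never reordered or merged, e.g. across MLIR inlining.
def fir_DummyScopeOp : fir_Op<"dummy_scope",
    [MemoryEffects<[MemWrite<DebuggingResource>]>]> {
  // ...
}
```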
No, you can’t: two allocs can be reordered.
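To illustrate (a toy model I sketched, not MLIR’s actual effect machinery): an Alloc effect produces fresh, distinct state each time, so two allocs on the same resource commute, while two writes to the same resource do not. That is exactly why MemAlloc cannot pin the relative order of two ops:

```cpp
#include <cassert>

// Toy model of effect-based reordering legality. In MLIR terms:
// Alloc = MemAlloc, Read = MemRead, Write = MemWrite.
enum class EffectKind { Alloc, Read, Write };

// May two adjacent ops, each carrying the given single effect on the
// SAME resource, be swapped without changing observable behavior?
bool mayReorder(EffectKind a, EffectKind b) {
  // Two allocations never conflict: each yields a distinct object.
  if (a == EffectKind::Alloc && b == EffectKind::Alloc)
    return true;
  // Two reads of the same resource commute as well.
  if (a == EffectKind::Read && b == EffectKind::Read)
    return true;
  // Conservatively keep the order for any other pairing, in
  // particular for two writes to the same resource.
  return false;
}
```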
That does seem quite ad-hoc to me actually. Can you elaborate with some IR snippet showing what you’re trying to achieve?
Thanks!
I am investigating the feasibility of using the Affine dialect and transformations in Flang. One aspect is the ability to generate debug and TBAA information for Fortran programs, which is currently done quite late in the Flang pass pipeline. To preserve the source-level information, Flang uses certain FIR operations like fir.declare and fir.dummy_scope. If I want to apply the Affine transformations “in the middle” of the Flang pass pipeline, I may end up with MLIR like this:
// RUN: fir-opt %s -allow-unregistered-dialect -affine-parallelize
func.func @_QPtest1(%arg0 : memref<10xf32>) {
  %cst = arith.constant 1.000000e+00 : f32
  affine.for %arg2 = 0 to 10 {
    %16 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg2)
    %alloca_0 = memref.alloca() : memref<f32>
    %17 = fir.convert %alloca_0 : (memref<f32>) -> !fir.ref<f32>
    %18 = fir.dummy_scope : !fir.dscope
    %20 = fir.declare %17 dummy_scope %18 arg 1 {uniq_name = "_QFtestFinnerEy"} : (!fir.ref<f32>, !fir.dscope) -> !fir.ref<f32>
    %21 = fir.convert %20 : (!fir.ref<f32>) -> memref<f32>
    affine.store %cst, %21[] : memref<f32>
    %22 = affine.load %21[] : memref<f32>
    affine.store %22, %arg0[%16 - 1] : memref<10xf32>
  }
  return
}
func.func @_QPtest2(%arg0 : memref<10xf32>) {
  %cst = arith.constant 1.000000e+00 : f32
  affine.for %arg2 = 0 to 10 {
    %16 = affine.apply affine_map<(d0) -> (d0 + 1)>(%arg2)
    %alloca_0 = memref.alloca() : memref<f32>
    %17 = fir.convert %alloca_0 : (memref<f32>) -> !fir.ref<f32>
    %20 = fir.declare %17 {uniq_name = "_QFtestFinnerEy"} : (!fir.ref<f32>) -> !fir.ref<f32>
    %21 = fir.convert %20 : (!fir.ref<f32>) -> memref<f32>
    affine.store %cst, %21[] : memref<f32>
    %22 = affine.load %21[] : memref<f32>
    affine.store %22, %arg0[%16 - 1] : memref<10xf32>
  }
  return
}
In test1 I show potential MLIR mixing FIR and Affine dialect operations; note the fir.dummy_scope in this example. Such code may appear due to MLIR inlining, due to early materialization of OpenACC private variables in the Flang frontend, or for other reasons.
In test2 I manually removed fir.dummy_scope (i.e. I lost some source-level information).
I tested them using my modified fir-opt tool (with registered Affine passes, and ViewLikeOpInterface attached to the fir.declare operation): -affine-parallelize can parallelize the loop in test2 but not in test1, because fir.dummy_scope has a MemWrite effect on FIR’s DebuggingResource. As I said before, DebuggingResource is used to guarantee fir.dummy_scope nesting (in the case of MLIR inlining), but it is just an artificial “metadata” resource and should not restrict parallelization in any way.
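Conceptually, the special handling I am asking about would let a parallelism check skip effects on such a metadata-only resource. A hypothetical sketch (not the real mlir::affine::isLoopMemoryParallel logic; the resource-name string is purely illustrative):

```cpp
#include <cassert>
#include <string>
#include <vector>

// A simplified stand-in for an op's (effect kind, resource) pair.
struct Effect {
  bool isWrite;
  std::string resource;
};

// Does any effect in the loop body block parallelization? Writes to the
// artificial debugging resource carry no real loop-carried dependence,
// so they are ignored here.
bool blocksParallelization(const std::vector<Effect> &effects) {
  for (const Effect &e : effects)
    if (e.isWrite && e.resource != "DebuggingResource")
      return true;
  return false;
}
```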
Output MLIR with my modified fir-opt:
#map = affine_map<(d0) -> (d0 + 1)>
module {
  func.func @_QPtest1(%arg0: memref<10xf32>) {
    %cst = arith.constant 1.000000e+00 : f32
    affine.for %arg1 = 0 to 10 {
      %0 = affine.apply #map(%arg1)
      %alloca = memref.alloca() : memref<f32>
      %1 = fir.convert %alloca : (memref<f32>) -> !fir.ref<f32>
      %2 = fir.dummy_scope : !fir.dscope
      %3 = fir.declare %1 dummy_scope %2 arg 1 {uniq_name = "_QFtestFinnerEy"} : (!fir.ref<f32>, !fir.dscope) -> !fir.ref<f32>
      %4 = fir.convert %3 : (!fir.ref<f32>) -> memref<f32>
      affine.store %cst, %4[] : memref<f32>
      %5 = affine.load %4[] : memref<f32>
      affine.store %5, %arg0[%0 - 1] : memref<10xf32>
    }
    return
  }
  func.func @_QPtest2(%arg0: memref<10xf32>) {
    %cst = arith.constant 1.000000e+00 : f32
    affine.parallel (%arg1) = (0) to (10) {
      %0 = affine.apply #map(%arg1)
      %alloca = memref.alloca() : memref<f32>
      %1 = fir.convert %alloca : (memref<f32>) -> !fir.ref<f32>
      %2 = fir.declare %1 {uniq_name = "_QFtestFinnerEy"} : (!fir.ref<f32>) -> !fir.ref<f32>
      %3 = fir.convert %2 : (!fir.ref<f32>) -> memref<f32>
      affine.store %cst, %3[] : memref<f32>
      %4 = affine.load %3[] : memref<f32>
      affine.store %4, %arg0[%0 - 1] : memref<10xf32>
    }
    return
  }
}
To add to that, I think other MLIR dialects may also use a special “debugging” resource to allow some optimizations, e.g. the llvm.dbg.declare intrinsic should probably not block all optimizations due to its conservative side effects.
I agree with you that it may look quite ad-hoc. What might be the other options?
I made an attempt to resolve some issues in MLIR optimizations and alias analysis for operations that access such synthetic resources.
Please feel free to leave comments on the pull request: “[RFC][mlir] Introduced unit SideEffects::Resource.” by vzakhari (llvm/llvm-project PR #178291 on GitHub).