Facing issues with bufferization cloneOp

Hello, I am running into some issues with the bufferization.clone op, specifically when trying to optimize deallocations.

The following is the initial module:

module {
  memref.global "private" constant @shape : memref<2xindex> = dense<[1, 2]>
  func.func @forward(%arg0: memref<2x2xf32>, %arg1: memref<2x2xf32>) -> memref<2xf32> {
    %cst = arith.constant 0.000000e+00 : f32
    %alloc = memref.alloc() : memref<2x2xf32>
    linalg.fill ins(%cst : f32) outs(%alloc : memref<2x2xf32>)
    linalg.matmul ins(%arg0, %arg1 : memref<2x2xf32>, memref<2x2xf32>) outs(%alloc : memref<2x2xf32>)
    %alloc_0 = memref.alloc() : memref<2xf32>
    linalg.fill ins(%cst : f32) outs(%alloc_0 : memref<2xf32>)
    linalg.reduce ins(%alloc : memref<2x2xf32>) outs(%alloc_0 : memref<2xf32>) dimensions = [0] 
      (%in: f32, %init: f32) {
        %1 = arith.addf %in, %init : f32
        linalg.yield %1 : f32
      }
    %0 = memref.get_global @shape : memref<2xindex>
    %reshape = memref.reshape %alloc_0(%0) : (memref<2xf32>, memref<2xindex>) -> memref<1x2xf32>
    %collapse_shape = memref.collapse_shape %reshape [[0, 1]] : memref<1x2xf32> into memref<2xf32>
    return %collapse_shape : memref<2xf32>
  }
}

I run the following passes, in this order, on the module above:

  • mlir::bufferization::createOwnershipBasedBufferDeallocationPass
  • mlir::createCanonicalizerPass
  • mlir::bufferization::createBufferDeallocationSimplificationPass

This is the resulting module:

module {
  memref.global "private" constant @shape : memref<2xindex> = dense<[1, 2]>
  func.func @forward(%arg0: memref<2x2xf32>, %arg1: memref<2x2xf32>) -> memref<2xf32> {
    %true = arith.constant true
    %cst = arith.constant 0.000000e+00 : f32
    %alloc = memref.alloc() : memref<2x2xf32>
    linalg.fill ins(%cst : f32) outs(%alloc : memref<2x2xf32>)
    linalg.matmul ins(%arg0, %arg1 : memref<2x2xf32>, memref<2x2xf32>) outs(%alloc : memref<2x2xf32>)
    %alloc_0 = memref.alloc() : memref<2xf32>
    linalg.fill ins(%cst : f32) outs(%alloc_0 : memref<2xf32>)
    linalg.reduce ins(%alloc : memref<2x2xf32>) outs(%alloc_0 : memref<2xf32>) dimensions = [0] 
      (%in: f32, %init: f32) {
        %2 = arith.addf %in, %init : f32
        linalg.yield %2 : f32
      }
    %0 = memref.get_global @shape : memref<2xindex>
    %reshape = memref.reshape %alloc_0(%0) : (memref<2xf32>, memref<2xindex>) -> memref<1x2xf32>
    %collapse_shape = memref.collapse_shape %reshape [[0, 1]] : memref<1x2xf32> into memref<2xf32>
    %1 = bufferization.clone %collapse_shape : memref<2xf32> to memref<2xf32>
    bufferization.dealloc (%alloc : memref<2x2xf32>) if (%true)
    bufferization.dealloc (%alloc_0 : memref<2xf32>) if (%true)
    return %1 : memref<2xf32>
  }
}

The issue I am facing is with the bufferization.clone operation. After running these passes, I want to optimize the deallocs to reduce peak memory usage for larger IR, i.e. move each dealloc as early in the program as possible instead of leaving them all at the very end. According to its definition, bufferization.clone may either copy its operand or alias it, and this ambiguity is what makes the dealloc optimization difficult: if the result aliases the source, the source cannot be freed while the result is still in use. Is there a better way to distinguish when bufferization.clone performs a copy and when it aliases?
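
To make the goal concrete, here is a hand-edited sketch of the placement being asked about. It is not the output of any pass; it is the final module above with one dealloc moved by hand. It assumes that linalg.reduce is the last use of %alloc, which holds in this small example.

```mlir
module {
  memref.global "private" constant @shape : memref<2xindex> = dense<[1, 2]>
  func.func @forward(%arg0: memref<2x2xf32>, %arg1: memref<2x2xf32>) -> memref<2xf32> {
    %true = arith.constant true
    %cst = arith.constant 0.000000e+00 : f32
    %alloc = memref.alloc() : memref<2x2xf32>
    linalg.fill ins(%cst : f32) outs(%alloc : memref<2x2xf32>)
    linalg.matmul ins(%arg0, %arg1 : memref<2x2xf32>, memref<2x2xf32>) outs(%alloc : memref<2x2xf32>)
    %alloc_0 = memref.alloc() : memref<2xf32>
    linalg.fill ins(%cst : f32) outs(%alloc_0 : memref<2xf32>)
    linalg.reduce ins(%alloc : memref<2x2xf32>) outs(%alloc_0 : memref<2xf32>) dimensions = [0]
      (%in: f32, %init: f32) {
        %2 = arith.addf %in, %init : f32
        linalg.yield %2 : f32
      }
    // Moved by hand: the linalg.reduce above is the last use of %alloc.
    bufferization.dealloc (%alloc : memref<2x2xf32>) if (%true)
    %0 = memref.get_global @shape : memref<2xindex>
    %reshape = memref.reshape %alloc_0(%0) : (memref<2xf32>, memref<2xindex>) -> memref<1x2xf32>
    %collapse_shape = memref.collapse_shape %reshape [[0, 1]] : memref<1x2xf32> into memref<2xf32>
    %1 = bufferization.clone %collapse_shape : memref<2xf32> to memref<2xf32>
    // Left in place: %collapse_shape is a view of %alloc_0, so this dealloc
    // cannot move above the clone. Whether it is safe at all while %1 is
    // live depends on whether the clone copied or aliased its operand.
    bufferization.dealloc (%alloc_0 : memref<2xf32>) if (%true)
    return %1 : memref<2xf32>
  }
}
```

The first dealloc is straightforward to hoist. The second one is where the copy-versus-alias question matters.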

Thank you!
