Strange behaviour of `-buffer-deallocation-simplification` pass

Hi all, I am trying to work on [mlir] `mergeIdenticalBlocks` is blowing up on a conditional to identical CFGs · Issue #63230 · llvm/llvm-project · GitHub.

The optimization can be stated as: “remove a block argument if all the predecessors are passing the same value for that argument”

I implemented it, but I get some issues in -buffer-deallocation-simplification. I can give more details, but at some point the IR looks like the following:

module {
  func.func @condBranchDynamicTypeNested(%arg0: i1, %arg1: memref<?xf32>, %arg2: memref<?xf32>, %arg3: index) {
    %true = arith.constant true
    %false = arith.constant false
    %0 = arith.select %arg0, %false, %false : i1
    %1 = arith.select %arg0, %false, %false : i1
    cf.cond_br %arg0, ^bb1, ^bb2
  ^bb1:  // pred: ^bb0
    cf.br ^bb5(%arg1, %true : memref<?xf32>, i1)
  ^bb2:  // pred: ^bb0
    %alloc = memref.alloc(%arg3) : memref<?xf32>
// If I remove the following two lines the code works
    cf.br ^bb4(%true : i1)
  ^bb4(%10: i1):  // pred: ^bb3
    cf.br ^bb5(%alloc, %true : memref<?xf32>, i1)
  ^bb5(%11: memref<?xf32>, %12: i1):  // 2 preds: ^bb1, ^bb4
    %base_buffer_8, %offset_9, %sizes_10, %strides_11 = memref.extract_strided_metadata %arg2 : memref<?xf32> -> memref<f32>, index, index, index
    %base_buffer_12, %offset_13, %sizes_14, %strides_15 = memref.extract_strided_metadata %11 : memref<?xf32> -> memref<f32>, index, index, index
    %13:2 = bufferization.dealloc (%base_buffer_8, %base_buffer_12 : memref<f32>, memref<f32>) if (%1, %12) retain (%11, %arg2 : memref<?xf32>, memref<?xf32>)
    cf.br ^bb6(%true : i1)
  ^bb6(%14: i1):  // pred: ^bb5
    return
  }
}


And when I run: mlir-opt -buffer-deallocation-simplification bug.mlir I get:

mlir/lib/Dialect/Bufferization/Transforms/BufferViewFlowAnalysis.cpp:287: auto mlir::BufferOriginAnalysis::isSameAllocation(mlir::Value, mlir::Value)::(anonymous class)::operator()(const SmallPtrSet<mlir::Value, 16> &, SmallPtrSet<mlir::Value, 16> &, bool &, bool &) const: Assertion `!terminal.empty() && "expected non-empty terminal set"' failed.

The weird thing (for my inexperienced eyes) is that if I remove those two lines:

    cf.br ^bb4(%true : i1) 
  ^bb4(%10: i1):  // pred: ^bb3

The pass does not complain anymore. Is there someone that can help me understand what is going on?

I can show a more complete IR, if what I am providing is not enough.

Thanks in advance,
Giuseppe

EDIT: I copied/pasted the wrong IR. This should be the right example (cc @matthias-springer )

There is something wrong with this example.

a.mlir:17:3: error: redefinition of block '^bb4'
  ^bb4:  // pred: ^bb3

When I rename the block and branch to bb5, the pass does not complain.

Also note that unstructured control flow loops are not supported in the ownership-based buffer deallocation pass at the moment.
From invalid-buffer-deallocation.mlir:

// Test Case: explicit control-flow loop with a dynamically allocated buffer.
// The BufferDeallocation transformation should fail on this explicit
// control-flow loop since they are not supported.

// expected-error@+1 {{Only structured control-flow loops are supported}}

I don’t quite remember what’s missing, but I think it could be supported without too much work. @merhart

That being said, the failed assertion that you are seeing is not from the buffer deallocation pass, but from an analysis that it is utilizing and I would be curious to see a test case that triggers the bug.

Hi @matthias-springer ,
Thanks for the reply. Sorry, I messed up when I copied pasted the IR. I edited the original post and pasted the correct IR.

About this:

Also note that unstructured control flow loops are not supported in the ownership-based buffer deallocation pass at the moment.

Does it mean that the pass would fail for something like I showed (where for instance arg3 is read directly from bb0 and not passed via the block arguments)?

That being said, the failed assertion that you are seeing is not from the buffer deallocation pass, but from an analysis that it is utilizing and I would be curious to see a test case that triggers the bug.

I was getting the failure in of the test cases in mlir/test/Dialect/Bufferization/Transforms/OwnershipBasedBufferDeallocation/dealloc-branchop-interface.mlir . In particular, this is the specific test that is giving me the assertion:

func.func @condBranchDynamicTypeNested(
  %arg0: i1,
  %arg1: memref<?xf32>,
  %arg2: memref<?xf32>,
  %arg3: index) {
  cf.cond_br %arg0, ^bb1, ^bb2(%arg3: index)
^bb1:
  cf.br ^bb6(%arg1 : memref<?xf32>)
^bb2(%0: index):
  %1 = memref.alloc(%0) : memref<?xf32>
  test.buffer_based in(%arg1: memref<?xf32>) out(%1: memref<?xf32>)
  cf.cond_br %arg0, ^bb3, ^bb4
^bb3:
  cf.br ^bb5(%1 : memref<?xf32>)
^bb4:
  cf.br ^bb5(%1 : memref<?xf32>)
^bb5(%2: memref<?xf32>):
  cf.br ^bb6(%2 : memref<?xf32>)
^bb6(%3: memref<?xf32>):
  cf.br ^bb7(%3 : memref<?xf32>)
^bb7(%4: memref<?xf32>):
  test.copy(%4, %arg2) : (memref<?xf32>, memref<?xf32>)
  return
}

For instance, my optimization finds that %arg3 does not have to be passed to bb2, but can be read directly from bb0.

If I run the specific example with (and with my optimization on):

mlir-opt  -ownership-based-buffer-deallocation original_example.mlir

I get an IR similar to the one I showed in my original post (i.e., bug.mlir is a simplified version of the problematic IR).Applying -buffer-deallocation-simplification on that IR gives me the failure

I tried to reproduce this using your example from the first post on my local MLIR installation (commit 3beb232fb4fd), but don’t get an error/assertion. Was something changed upstream in the meantime that triggers this issue?

If I remember correctly, it was working out-of-the-box, but produced far from optimal IR: there were a lot of unnecessary dynamic checks that the deallocation simplification pass also couldn’t optimize away. Here is my WIP commit from almost a year ago: [mlir][bufferization] BufferDeallocation: support unstructured contro… · maerhart/llvm-project@5563e54 · GitHub

No, in that case you would still not have a loop. If there is a backedge in the CFG, the deallocation pass will terminate with an error “Only structured control-flow loops are supported.”

Ok, you are right. It seems that the bug is only triggered by a combination of my optimization and the deallocation-simplification.

I will try to either push my PR, or find an IR that decouples the issue from my optimization.

Hi @merhart , @matthias-springer ,
I dug a bit into it, and I “think” that what is happening is the following:

  • The BufferDeallocationSimplificationPass is using a BufferViewFlowAnalysis to determine positions to place allocs/deallocs
  • The analysis is run as first thing, before running any of the rewrites
  • All the rewrites execute a region simplification step (if it is not disabled in the rewrite config)
  • The region simplification step includes my optimization to remove redundant block arguments. I think this invalidates the analysis (which is used to determine terminal values, I think)

Is that a possible reason? In this case, should we disable (or set it to normal) the region simplification step when running BufferDeallocationSimplificationPass. I think that those “simplifications” are clashing with each other.

Also, if this is the real reason behind the issue I am having, there could also be an hidden bug. Because I would imagine that also block merging might invalidate the BufferViewFlowAnalysis analysis (it’s just that we don’t test this case).

Thanks,
Giuseppe