[One-Shot Bufferizer] Bufferization fails in the presence of `bufferization.to_memref` and `bufferization.to_tensor`

Hi,

it seems that One-Shot Bufferize does not handle `bufferization.to_tensor` ops in the IR to be bufferized, e.g., as in the following IR:

```mlir
module {
  func private @some_func_operating_on_memref(memref<2xf32>) -> ()

  func @main(%arg0: tensor<2xf32>) -> () {
    %m = bufferization.to_memref %arg0 : memref<2xf32>
    call @some_func_operating_on_memref(%m) : (memref<2xf32>) -> ()
    %t = bufferization.to_tensor %m : memref<2xf32>
    // Some more uses of %t
    return
  }
}
```

Invoking One-Shot Bufferize with `mlir-opt --one-shot-bufferize` triggers an assertion:

```
void mlir::bufferization::BufferizationAliasInfo::bufferizeOutOfPlace(mlir::OpOperand&): Assertion `!inplaceBufferized.contains(&operand) && "OpOperand was already decided to bufferize inplace"' failed.
```

Is this on purpose? Or should the presence of `bufferization.to_tensor` and `bufferization.to_memref` be allowed?

Thanks,
Andi

CC @matthias-springer

Yes, this is on purpose. `to_tensor` and `to_memref` are tricky because we cannot analyze tensor SSA use-def chains through these ops. Therefore, input IR that contains these ops currently does not bufferize.

If possible, you could run One-Shot Bufferize first, then any other bufferization that you need afterwards. You can also specify a filter in `BufferizationOptions` to exclude certain ops from One-Shot Bufferize.

Alternatively, you could run One-Shot Bufferize without an analysis. However, that would mean that every op that is writing to a buffer will first make a copy of the buffer (alloc+copy), which is probably not what you want.

```cpp
/// Bufferize `op` and its nested ops that implement `BufferizableOpInterface`.
/// Buffers are duplicated and copied before any tensor use that bufferizes to
/// a memory write.
///
/// Note: This function bufferizes ops without utilizing analysis results. It
/// can be used to implement partial bufferization passes.
LogicalResult bufferization::bufferizeOp(
    Operation *op, const BufferizationOptions &options);
```

We could also try to extend the analysis of One-Shot Bufferize to support `to_tensor`/`to_memref`, but it would likely have to be quite conservative and insert copies in many places. We did not have use cases for this until now, so I did not look into this much further.

Based on your IR example, it looks like you are bufferizing function boundaries with a different bufferization. Are you using `--func-bufferize` by any chance? If so, you could also try One-Shot Bufferize with the `bufferize-function-boundaries` option. There are currently some limitations with this option (e.g., recursive/cyclic function calls are not supported), but it may be good enough for your use case.
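For reference, this is roughly what `bufferize-function-boundaries` does to the function signature in your example. This is only a sketch: by default the memref argument gets a fully dynamic layout map, which is elided here for readability.

```mlir
// Sketch: with bufferize-function-boundaries, tensor function arguments
// become memref arguments, so no to_memref op is needed inside the body.
func @main(%arg0: memref<2xf32>) {
  call @some_func_operating_on_memref(%arg0) : (memref<2xf32>) -> ()
  return
}
```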

Thanks for the quick reply.

> Yes, this is on purpose. `to_tensor` and `to_memref` are tricky because we cannot analyze tensor SSA use-def chains through these ops. Therefore, input IR that contains these ops currently does not bufferize.

That’s what I suspected. Thanks for clarifying!

> If possible, you could run One-Shot Bufferize first, then any other bufferization that you need afterwards.

This is probably the only way forward for now: instead of calling external functions, preserve tensor semantics using some proxy operation, which then gets lowered to a function call with a memref upon bufferization.

> You can also specify a filter in `BufferizationOptions` to exclude certain ops from One-Shot Bufferize.

I fear this won’t be applicable, since in our use cases buffers may be used by operations that are subject to One-Shot Bufferize afterwards.

> Alternatively, you could run One-Shot Bufferize without an analysis. However, that would mean that every op that is writing to a buffer will first make a copy of the buffer (alloc+copy), which is probably not what you want.

One goal of including One-Shot Bufferize in the pipeline is to reduce copies. Not using the analysis will probably end up worse than, or at best on par with, our current naive bufferization.

> We could also try to extend the analysis of One-Shot Bufferize to support `to_tensor`/`to_memref`, but it would likely have to be quite conservative and insert copies in many places. We did not have use cases for this until now, so I did not look into this much further.

This would be a quick win from our perspective, but I understand that this might be far from trivial for the general case.

> Based on your IR example, it looks like you are bufferizing function boundaries with a different bufferization. Are you using `--func-bufferize` by any chance?

No, the IR contains calls to external functions which operate on memrefs (or rather bare pointers extracted from the memrefs). These functions are implemented in Rust and are compiled completely separately, and therefore remain opaque to the IR.

> This is probably the only way forward for now: instead of calling external functions, preserve tensor semantics using some proxy operation, which then gets lowered to a function call with a memref upon bufferization.

I think this should work. In the implementation of `BufferizableOpInterface` you would then rewrite this op with the original `CallOp` on memrefs.
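To illustrate the proxy-op idea (the op name `mydialect.extern_call` and its syntax are hypothetical, not an existing dialect):

```mlir
// Before bufferization: a hypothetical proxy op with tensor semantics, which
// the One-Shot analysis can treat like any other bufferizable op.
%t1 = mydialect.extern_call @some_func_operating_on_memref(%t0)
    : (tensor<2xf32>) -> tensor<2xf32>

// After bufferization: the BufferizableOpInterface implementation rewrites
// the proxy op to the original call on memrefs.
%m2 = call @some_func_operating_on_memref(%m)
    : (memref<2xf32>) -> memref<2xf32>
```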

> We could also try to extend the analysis of One-Shot Bufferize to support `to_tensor`/`to_memref`, but it would likely have to be quite conservative and insert copies in many places. We did not have use cases for this until now, so I did not look into this much further.

> This would be a quick win from our perspective, but I understand that this might be far from trivial for the general case.

To be useful, we would have to extend the `to_tensor`/`to_memref` ops a bit. E.g., let’s extend your example from above:

```mlir
module {
  func private @some_func_operating_on_memref(memref<2xf32>) -> (memref<2xf32>)

  func @main() -> () {
    // ...
    %m = bufferization.to_memref %t0 : memref<2xf32>
    %m2 = call @some_func_operating_on_memref(%m) : (memref<2xf32>) -> (memref<2xf32>)
    %t1 = bufferization.to_tensor %m2 : memref<2xf32>
    // Some more uses of %t1
    return
  }
}
```

The bufferization has to know whether `%m` and `%m2` may alias. Put differently: does `@some_func_operating_on_memref` return the same memref or a brand-new one? This can affect the bufferization of ops that use `%t0`. These are the kinds of problems we run into.
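One conceivable extension (a sketch only, not existing functionality as of this discussion) would be a unit attribute on `to_tensor` through which the user asserts no-alias information, similar in spirit to C's `restrict`, so the analysis does not have to fall back to conservative copies:

```mlir
// Hypothetical: the user asserts that %m2 does not alias any other buffer
// visible to the analysis, so uses of %t1 can be analyzed as if %t1 were
// the result of a fresh allocation.
%t1 = bufferization.to_tensor %m2 {restrict} : memref<2xf32>
```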