Conservative folding of to_tensor(to_memref)

I have to_tensor(to_memref) pairs in my IR that are not removed by the canonicalizer because they don’t appear one right after the other.
I see the fold function is purposely conservative.

OpFoldResult ToTensorOp::fold(FoldAdaptor) {
  if (auto toMemref = getMemref().getDefiningOp<ToMemrefOp>())
    // Approximate alias analysis by conservatively folding only when there
    // is no interleaved operation.
    if (toMemref->getBlock() == this->getOperation()->getBlock() &&
        toMemref->getNextNode() == this->getOperation())
      return toMemref.getTensor();
  return {};
}

I am trying to understand why.
Does someone have an example that emphasizes this constraint?
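
To make the adjacency condition concrete, here is a minimal sketch of the same check on a toy model of a single block. It does not use the MLIR API: ToyOp and foldAdjacentOnly are hypothetical names introduced only for illustration, and SSA values are represented by plain strings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for an operation inside one block. NOT the MLIR API; every
// name in this sketch is hypothetical.
struct ToyOp {
  std::string name;                  // e.g. "to_memref", "to_tensor"
  std::vector<std::string> operands; // names of the SSA values the op uses
  std::string result;                // name of the value it defines ("" if none)
};

// Mirrors the shape of the conservative check: fold only when the op directly
// before the to_tensor is the to_memref that defines its operand.
// Returns the name of the tensor to fold to, or "" when no fold applies.
std::string foldAdjacentOnly(const std::vector<ToyOp> &block,
                             std::size_t toTensorIdx) {
  assert(toTensorIdx < block.size() && "index must name an op in the block");
  const ToyOp &toTensor = block[toTensorIdx];
  if (toTensor.name != "to_tensor" || toTensor.operands.size() != 1)
    return "";
  if (toTensorIdx == 0)
    return ""; // nothing precedes the to_tensor
  const ToyOp &prev = block[toTensorIdx - 1];
  bool definesOperand = prev.name == "to_memref" &&
                        prev.operands.size() == 1 &&
                        prev.result == toTensor.operands[0];
  return definesOperand ? prev.operands[0] : std::string();
}
```

In this model, a to_memref immediately followed by its to_tensor folds to the original tensor, while any op placed between the two blocks the fold, even one that never touches the memref. That second case is the situation described in the question.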

Since the result of a to_memref op can’t be mutated, I’m not sure what the reason is either.

That said, when I wonder why some line of code exists in a software project, I remove it and run the tests 🙂
In this case there are quite a number of failures:

  MLIR :: Dialect/SCF/canonicalize.mlir
  MLIR :: Dialect/SparseTensor/dense.mlir
  MLIR :: Dialect/SparseTensor/sorted_coo.mlir
  MLIR :: Dialect/SparseTensor/sparse_1d.mlir
  MLIR :: Dialect/SparseTensor/sparse_2d.mlir
  MLIR :: Dialect/SparseTensor/sparse_3d.mlir
  MLIR :: Dialect/SparseTensor/sparse_affine.mlir
  MLIR :: Dialect/SparseTensor/sparse_fp_ops.mlir
  MLIR :: Dialect/SparseTensor/sparse_int_ops.mlir
  MLIR :: Dialect/SparseTensor/sparse_kernels.mlir
  MLIR :: Dialect/SparseTensor/sparse_lower.mlir
  MLIR :: Dialect/SparseTensor/sparse_lower_col.mlir
  MLIR :: Dialect/SparseTensor/sparse_lower_inplace.mlir
  MLIR :: Dialect/SparseTensor/sparse_nd.mlir
  MLIR :: Dialect/SparseTensor/sparse_outbuf.mlir
  MLIR :: Dialect/SparseTensor/sparse_parallel_reduce.mlir
  MLIR :: Dialect/SparseTensor/sparse_perm.mlir
  MLIR :: Dialect/SparseTensor/sparse_perm_lower.mlir
  MLIR :: Dialect/SparseTensor/sparse_scalars.mlir
  MLIR :: Dialect/SparseTensor/sparse_sddmm.mlir
  MLIR :: Dialect/SparseTensor/sparse_vector_chain.mlir
  MLIR :: Dialect/SparseTensor/sparse_vector_index.mlir
  MLIR :: Dialect/SparseTensor/vectorize_reduction.mlir

Now it may just be a matter of updating the tests…

@pifon2a ?

I would assume the reason is that one wants to be careful about modifications to the data between the two operations. For example (this might not be valid IR, as I copied pieces of IR and simplified the displayed memref type):

%mem = bufferization.to_memref %tensor0 : memref<4x?xf32>
memref.store %val, %mem[0, 1023] : memref<4x?xf32>
%tensor1 = bufferization.to_tensor %mem : memref<4x?xf32>

I would naively expect %tensor1 to hold the updated version of %tensor0, which would not happen with the fold. However, look at the message before mine: Mehdi beat me to explaining why that reasoning doesn’t hold.
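
The difference between the two readings can be sketched with a toy model. This is only an illustration of the naive expectation, not a statement of the actual bufferization semantics: Tensor, Buffer, Outcome and storeBetween are hypothetical names, and the model assumes that to_tensor observes whatever the buffer holds at that point.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy model, not MLIR: a "tensor" is an immutable snapshot and a
// "memref" is a mutable buffer.
using Tensor = std::vector<float>;
using Buffer = std::vector<float>;

struct Outcome {
  float withoutFold; // element read through to_tensor after the store
  float withFold;    // element read if to_tensor(to_memref(t)) becomes t
};

// Models: to_memref, then a store into the buffer, then to_tensor.
Outcome storeBetween(const Tensor &tensor0, std::size_t index, float val) {
  Buffer mem = tensor0;  // to_memref: buffer holding tensor0's data
  mem.at(index) = val;   // interleaved store into the buffer
  Tensor tensor1 = mem;  // to_tensor: snapshot of the buffer as it is now
  return {tensor1.at(index), tensor0.at(index)};
}
```

With tensor0 = {1, 2}, index 1 and val 9, the unfolded sequence reads 9 while the folded one reads 2. Under this assumed model, that gap is what the adjacency check guards against.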

Most likely this was due to uncertainty about exact design choices, a change in design over time, or fear that people like me would not expect that the result of to_memref cannot be modified (this last possibility would lead to a larger debate over whether that’s a valid reason, so I’m not saying it’s a good reason; merely a possibility).

Note, that mutating the result of the to_memref operation leads to undefined behavior.

This documentation is indeed misleading. The bufferization may generate to_tensor/to_memref ops. The result of a to_memref may be fed into some op that writes to this location. That is fine because the entire code was generated by the bufferization.

The IR that Tres showed could be the result of the bufferization.

What is forbidden: Modifying the IR after bufferization such that there is a new write into the result of a bufferization.to_memref (or an alias thereof, e.g., the result of memref.subview(bufferization.to_memref)). This could violate some assumptions that were made by the analysis.

to_memref/to_tensor were originally needed for the dialect conversion based bufferization passes. We don’t need them anymore in One-Shot Bufferize. They could be replaced with unrealized_conversion_cast.

Note that these ops are visible only if there are ops that could not be bufferized. By default, One-Shot Bufferize fails when it sees an unknown tensor op (unless allow-unknown-ops is set). A not-yet-bufferized op could later be bufferized as follows:

  • If a tensor operand bufferizes to a memory write, a buffer copy must be inserted.
  • If the op has a tensor result, a buffer copy must be inserted for each tensor operand.

When we bufferize like this, we can just fold unrealized_conversion_casts. No special rules such as “mutating the result of ... is undefined behavior” are needed.

Thank you all for answering!
I want to make sure I get it right.
The confusing cases are the ones where the ops between to_memref and to_tensor are modifying the returned memref.

  1. The simple case where the result of to_memref has no uses (besides the to_tensor/other to_tensors) can be folded.

  2. The case where the ops between them are not writing to memory can also be folded.

  3. The confusing case which I need to fully understand:

“The IR that Tres showed could be the result of the bufferization”

In that case, does it mean the following to_tensor refers to the original %tensor0 or to the updated value?

Thanks for clarifying. Can you update the doc to reflect this?

  %0 = bufferization.to_memref %arg0 : memref<i64>
  memref.store %c0, %0[] : memref<i64>
  %tensor = bufferization.to_tensor %0 : memref<i64>

Here is an example: we can’t fold the to_tensor because the memref was written to and modified.

Thanks, your answers helped me understand the different cases and how the folding can be extended.
I uploaded a patch that folds to_tensor(to_memref) if there are no interleaved users of the memref: D142195 [mlir][Bufferization] Extend the folding of to_tensor(to_memref)
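
As a rough illustration of the "no interleaved users" condition, here is a sketch on a toy model of a single block. It is not the code from the patch and does not use the MLIR API: BlockOp, usesValue and noInterleavedUsers are hypothetical names, SSA values are plain strings, and aliases of the memref (for example a subview of it) are deliberately not tracked.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for an operation inside one block. NOT the MLIR API.
struct BlockOp {
  std::string name;                  // e.g. "to_memref", "memref.store"
  std::vector<std::string> operands; // names of the SSA values the op uses
  std::string result;                // name of the value it defines ("" if none)
};

// True when `op` takes `value` as one of its operands.
bool usesValue(const BlockOp &op, const std::string &value) {
  return std::find(op.operands.begin(), op.operands.end(), value) !=
         op.operands.end();
}

// True when no op strictly between the to_memref and the to_tensor uses the
// memref directly. Assumption: only direct uses count; aliases are ignored.
bool noInterleavedUsers(const std::vector<BlockOp> &block,
                        std::size_t toMemrefIdx, std::size_t toTensorIdx) {
  assert(toMemrefIdx < toTensorIdx && toTensorIdx < block.size());
  const std::string &memref = block[toMemrefIdx].result;
  for (std::size_t i = toMemrefIdx + 1; i < toTensorIdx; ++i)
    if (usesValue(block[i], memref))
      return false; // something touched the memref in between
  return true;
}
```

In this model an unrelated op between the pair no longer blocks the fold, while a store into the memref still does. A real implementation would also have to decide how to treat aliases and ops in other blocks, which this sketch leaves out.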
