[RFC] split the `tensor` dialect from `std`

We currently have a canonicalization that converts load(tensor_to_memref(x)) -> tensor.extract(x) (link).
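For concreteness, a minimal sketch of the IR involved (op spellings as of the current std/tensor split; `%t` and `%i` are placeholder values):

```mlir
// Before the canonicalization: the tensor is cast to a memref just to read one element.
%m = tensor_to_memref %t : memref<4xf32>
%v = load %m[%i] : memref<4xf32>

// After the canonicalization: the element is read directly from the tensor.
%v = tensor.extract %t[%i] : tensor<4xf32>
```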

Personally I think that’s a buggy pattern since it “fights” with bufferization and I would support just deleting it. We discussed specifically not adding “anti-bufferization” canonicalizations in D90709, but somehow it snuck in and I don’t have the energy to unwind it.

In my view, this does not fight with bufferization. The pattern is very useful for propagating the extraction of scalar values across a partial bufferization barrier. As the extracted result is a single element value, no bufferization is “undone”, either.

We need to define more clearly which patterns we allow and what we consider “undoing bufferization”. I don’t think a discussion in a single code review suffices for that. In pipelines that optimize code in partially bufferized form (as kernel generator does, where shape computations are not bufferized but their uses might be), patterns like the one mentioned here are important so that other canonicalizations can apply.

I am happy for tensor_load and tensor_store to live in the memref dialect. However, I would not use them for bufferization and would instead add the required cast operations to a separate dialect. That would also give us a home for the canonicalization patterns we want in the context of partial bufferization.

I have not watched the rest of the generic cast discussion yet, but specific canonicalizations might be another reason to use custom cast operations for bufferization.

I also don’t think that this approach fundamentally “fights with bufferization”. As @herhut outlined above, it might be particularly interesting for “partially bufferized environments” in order to apply further canonicalizations afterwards. Furthermore, I think we need to be much more specific when talking about “undoing bufferization” in general.

My definition of “undoing”: If I run foo-bufferize and then some transformation X, do I need to re-run foo-bufferize to finish bufferizing the program? If I do, then the transformation X “undoes” bufferization.

If we bufferize the tensor dialect, tensor.extract turns into tensor_to_memref + load. Then the canonicalization we are discussing undoes that by reintroducing tensor.extract. This then requires bufferizing the tensor dialect again. It is therefore “undoing” bufferization by the definition above.
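As a rough sketch of that cycle (the exact IR that tensor-bufferize produces may differ slightly):

```mlir
// Input:
%v = tensor.extract %t[%i] : tensor<4xf32>

// After tensor-bufferize, the extract is lowered to a memref load:
%m = tensor_to_memref %t : memref<4xf32>
%v = load %m[%i] : memref<4xf32>

// After the canonicalization under discussion we are back at the input,
// so tensor-bufferize has to run again:
%v = tensor.extract %t[%i] : tensor<4xf32>
```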

I’m not debating the goal that this pattern is currently being used to achieve. But surely we all agree that it undoes bufferization according to the definition above (which seems like a reasonable definition).

It seems to me that the solution for downstream users is to just run tensor-bufferize later in their pipeline, instead of selectively undoing it somewhere in the middle of their pipeline. Another option would be to use populate*Patterns and set up legality such that tensor.extract doesn’t get bufferized in the first place.

I had a different definition in mind where undoing bufferization means removing allocations, i.e., turning producers of memrefs back into producers of tensors and hence changing the allocation profile of the program. With your definition, I agree this undoes bufferization. Which definition is more useful is up for debate.

This would be the case if the load(tensor_to_memref(v)) pattern only arose from bufferizing the tensor dialect. It does not. It also arises when bufferizing and lowering any dialect that reads from memrefs whose inputs are not yet fully bufferized. In our use case, this is the lmhlo dialect reading from shape values. Running tensor-bufferize later does not solve this.
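A hypothetical sketch of that situation (names and shapes made up for illustration): the shape value stays a tensor because shape computations are not bufferized, while the already-bufferized consumer reads its extents through a memref:

```mlir
// %shape : tensor<2xindex> comes from an unbufferized shape computation.
%shape_m = tensor_to_memref %shape : memref<2xindex>
%extent0 = load %shape_m[%c0] : memref<2xindex>

// The pattern folds this to a direct extract on the tensor, so tensor-level
// canonicalizations of the shape computation can keep applying:
%extent0 = tensor.extract %shape[%c0] : tensor<2xindex>
```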

We can of course take these patterns downstream.

Hey folks, there have been some backchannel conversations about this that I wanted to summarize:

  1. The current approach of having std depend on tensor was a well-intentioned, but ultimately undesirable, attempt to keep tensor “lean and mean” and exactly how it should look in the future. In fact, having tensor depend on std seems to make more sense in practice (this is the approach that was taken for memref, and it seems to be the desirable intermediate state dependency-direction-wise).
  2. The fact that subtensor/subtensor_insert are not in tensor is an artifact of 1., as those ops have various associated code that creates scalar constants and thus needs std.constant in today’s world. Another aspect, IIRC, was the lack of a place to share code with memref.

I’m 100% supportive of reversing that dependency edge and making tensor depend on std. That avoids the current brittle state until int.constant and float.constant can be factored out for associated folding/code in tensor.