Bufferization.to_tensor/to_memref

Hi, folks. I’m looking for some advice on how to incorporate tensor encodings into bufferization’s to_tensor and to_memref ops. These ops do not let me specify an encoding on the resulting tensor or the input tensor, respectively.

I’d like to be able to do something like:

%1 = tensor.empty() : tensor<shape, encoding>
%2 = bufferization.to_memref %1 : tensor<shape, encoding> -> memref<shape>
%3 = bufferization.to_tensor %2 : memref<shape> -> tensor<shape, encoding>

Currently I’m seeing to_memref fail to verify if the tensor has an encoding, and to_tensor doesn’t have a way to specify the output (tensor) type.

I’m starting to look into whether updating the bufferization dialect is the right answer.

Thoughts?

There is currently no way to store encodings on a memref. What kind of information are you storing in the tensor encoding? And what’s the benefit of having a tensor encoding on bufferization.to_memref/to_tensor? The encoding would just be ignored by the bufferization framework. (Also, the bufferization sometimes creates new to_tensor ops, and these would always be generated without an encoding in the result type.)

What kind of information are you storing in the tensor encoding?

My compiler is storing some information that will later become part of the memref’s memory space.
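To illustrate the intent (the #my_target attribute name and the concrete memory-space value are hypothetical, not part of any upstream dialect), the lowering I have in mind looks roughly like:

```mlir
// Before bufferization: the encoding carries target placement info.
// #my_target is a made-up attribute standing in for my compiler's encoding.
%0 = tensor.empty() : tensor<4x8xf32, #my_target>

// After bufferization: the encoding has been translated into the
// memref's memory space (here an arbitrary integer space, 2).
%alloc = memref.alloc() : memref<4x8xf32, 2>
```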

And what’s the benefit of having a tensor encoding on bufferization.to_memref /to_tensor ?

I’m not trying to do anything fancy here; I would just like these operations to honor the existing type graph. I have no issues with later passes inserting new conversions to/from tensors. My goal is to propagate information through the type graph.

I believe I can accomplish this by allowing the to/from tensor operations to specify the input and output types explicitly and allowing the types to match even if the buffer type contains an encoding. Would this change be welcomed upstream?

One problem is that the tensor encoding is currently used for sparse tensors. In such a case, a bufferization.to_memref may have to return multiple memrefs to make sense. I think that’s why we currently disallow tensor encodings.
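For example, a sparse tensor with a CSR encoding (sketch below, using the sparse_tensor dialect’s syntax) is backed by several buffers after bufferization — positions, coordinates, and values — so a single-result to_memref isn’t well-defined for it:

```mlir
// CSR: outer dimension dense, inner dimension compressed.
#CSR = #sparse_tensor.encoding<{
  map = (d0, d1) -> (d0 : dense, d1 : compressed)
}>

// A tensor with this encoding does not bufferize to one memref;
// it lowers to multiple underlying buffers.
%t = tensor.empty() : tensor<8x8xf64, #CSR>
```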

I think we could allow the tensor encoding on these two ops. But we can’t really verify that the tensor encoding and the memref type are consistent.

Can you describe a bit how you are using to_tensor and to_memref?