RFC: Introduce a PyTACO-MLIR Bridge for integration testing

As announced at the end of the Sparse Tensors in MLIR thread, we have many exciting new sparse tensor features and improvements planned for 2022. This is the first topical post on such a feature: adding a PyTACO-MLIR bridge for integration testing.

A lot of the new sparse tensor support in MLIR borrows heavily from the great foundation laid by the TACO project. One nifty feature of TACO is PyTACO, a Python API that provides easy access to sparsity annotations by means of tensor index notation. For example, a kernel that multiplies a sparse matrix in CSR format with a dense vector can be expressed in PyTACO as follows.

import pytaco as pt
...
csr = pt.format([pt.dense, pt.compressed])   # CSR: dense rows, compressed columns
A = pt.read("pwtk.mtx", csr)                 # read sparse matrix from a Matrix Market file
...
i, j = pt.get_index_vars(2)
...
y[i] = A[i, j] * x[j] + z[i]                 # y = A * x + z in tensor index notation

Thanks to excellent work by Bixia Zheng, we can now run PyTACO programs using MLIR by essentially just changing the import line to

import mlir_pytaco_api as pt

The new bridge then lowers PyTACO expressions into Linalg expressions and, through the sparse compiler in MLIR, into runnable code. JIT compilation via LLVM IR takes care of evaluating the tensor index expression and returning the result to the Python environment.
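
For illustration, here is a minimal end-to-end sketch of the SpMV kernel above run through the bridge. This is only a sketch: it assumes the bridge mirrors the PyTACO API surface (pt.format, pt.read, pt.tensor, pt.get_index_vars, pt.write), and the output file name is just a placeholder.

import mlir_pytaco_api as pt                 # bridge module instead of pytaco

csr = pt.format([pt.dense, pt.compressed])   # CSR matrix format
dv  = pt.format([pt.dense])                  # dense vector format

A = pt.read("pwtk.mtx", csr)                 # sparse matrix from a Matrix Market file
x = pt.tensor([A.shape[1]], dv)              # input vectors (values elided here)
z = pt.tensor([A.shape[0]], dv)
y = pt.tensor([A.shape[0]], dv)              # output vector

i, j = pt.get_index_vars(2)
y[i] = A[i, j] * x[j] + z[i]                 # builds the tensor index expression

pt.write("y.tns", y)                         # triggers lowering to Linalg, sparse
                                             # compilation, and JIT execution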

Since the bridge is currently intended merely to simplify integration testing (writing Python is so much easier than writing MLIR directly), and is not “production ready” yet, all code related to the new bridge will reside in the following directory.

mlir/test/Integration/Dialect/SparseTensor/taco

Please let us know your thoughts on this new functionality. We will have a revision up for review on Phabricator soon.

Here is Bixia’s revision: D117260 Upstream MLIR PyTACO implementation

It’s pretty nice to see what we can already achieve with MLIR without writing a single line of C++! Well done @bixia1 :)

I like that this patch is not intrusive and can live in just the “integration testing” directory for now, which makes it really not load-bearing: this LGTM.

I think everyone is silently excited and waiting for this to land; it seems like we can move forward with it at this point!

Hey folks,

I am thinking of reusing some functions defined in test/Integration/Dialect/SparseTensor/taco/tools, in particular pt.from_tensor and pt.from_array, in benchmarks. My objective is to compare the MLIR sparse tensor implementation against NumPy and PyTACO.
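
To make the intent concrete, here is a rough sketch of the kind of correctness check that would precede the timing comparison. It assumes pt.from_array, pt.tensor, and to_array are all available through the bridge; the sizes and tolerance are just placeholders.

import numpy as np
import mlir_pytaco_api as pt                 # helpers from the taco tools directory

a = np.random.rand(1024, 1024)
b = np.random.rand(1024)

expected = a.dot(b)                          # NumPy reference result

A = pt.from_array(a)                         # wrap NumPy arrays as (dense) tensors
x = pt.from_array(b)
y = pt.tensor([a.shape[0]], pt.format([pt.dense]))

i, j = pt.get_index_vars(2)
y[i] = A[i, j] * x[j]                        # same computation via the MLIR bridge

np.testing.assert_allclose(y.to_array(), expected, rtol=1e-6)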

Would it make sense to move the tools directory somewhere accessible to both tests and benchmarks? Or should I create my own utilities under benchmarks and use those?

Thanks,
Saurabh

My first reaction is of course always that we should avoid code duplication as much as possible and move shared functionality to a single common place.

However, in this case, pt.from_array, for example, is part of the PyTACO API, and I don’t think we should move such constructs out, since I would like the SparseTensor/taco directory to contain a self-contained implementation of PyTACO. So I am more inclined to suggest making a new tools directory and determining later what functionality could be shared at a better place.

But others may have different opinions. For example, @bixia1, do you have any preference?

I agree with @aartbik that pt.from_array is part of the PyTACO implementation, and you can’t use it without using the rest of the PyTACO source code. I am fine with moving test/Integration/Dialect/SparseTensor/taco/tools/ up to make it obvious that it can be shared by both tests and benchmarks. Would mlir/tools/taco be a proper place?

That sounds good. I can create a patch to move these files around, unless anyone has any objections.

Given what is already in mlir/tools/*, I doubt this is the right place to put Python support methods, even when placed inside a subdirectory. I think it is better if you proceed with the original suggestion of making a tools directory under benchmarks and implement the methods you really need there first, even if there is some code overlap. Then, if we find that there is strong commonality, we can put forward a formal RFC and find a better landing place inside the mlir tree.

Sounds good! I’ll do that.