I have been looking into something that troubles me about the level types used for sparse tensors during sparsification of linalg.generic operations.
Specifically, I am attempting to implement the COO format in MLIR, which is represented in the TACO compiler like this:
taco "A(i,j)=B(i,k)*C(k,j)" -f=A:dd:0,1 -f=B:uq:0,1 -f=C:dd:0,1
Note that both A and C are dense/dense, while B has the level types sparse non-unique/singleton. TACO generates appropriate code for iterating over the COO format: namely, it groups entries by their first-level index, then iterates over each group using the second-level index for the lookup into tensor C.
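To make that concrete, the loop structure TACO produces for this expression looks roughly like the sketch below (written as plain C++; the buffer names B1_pos, B1_crd, B2_crd, and B_vals are illustrative and not TACO's verbatim output):

  // Hedged sketch of the COO traversal for A(i,j) = B(i,k) * C(k,j), with B stored
  // as (sparse non-unique, singleton). A and C are dense row-major with J columns.
  void spmm_coo(double *A, const int *B1_pos, const int *B1_crd, const int *B2_crd,
                const double *B_vals, const double *C, int J) {
    for (int p = B1_pos[0]; p < B1_pos[1];) {
      int i = B1_crd[p];
      // Group the entries that share the first-level coordinate i (the non-unique part).
      int seg_end = p + 1;
      while (seg_end < B1_pos[1] && B1_crd[seg_end] == i)
        ++seg_end;
      for (int q = p; q < seg_end; ++q) {
        int k = B2_crd[q];                        // singleton level: one coordinate per entry
        for (int j = 0; j < J; ++j)
          A[i * J + j] += B_vals[q] * C[k * J + j];
      }
      p = seg_end;
    }
  }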
Doing the same thing in MLIR does not seem to work. Specifically, if I use the following IR:
#Tcs = #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "singleton" ] }>

#trait2 = {
  indexing_maps = [
    affine_map<(i,j) -> (i,j)>,  // A
    affine_map<(i,j) -> (i,j)>,  // B
    affine_map<(i,j) -> (i,j)>   // X (out)
  ],
  iterator_types = ["parallel", "parallel"],
  doc = "X(i,j) = A(i,j) OP B(i,j)"
}

func.func @mul_cs(%arga: tensor<32x16xf32, #Tcs>, %argb: tensor<32x16xf32>, %argx: tensor<32x16xf32>) -> tensor<32x16xf32> {
  %0 = linalg.generic #trait2
    ins(%arga, %argb: tensor<32x16xf32, #Tcs>, tensor<32x16xf32>)
    outs(%argx: tensor<32x16xf32>) {
      ^bb(%a: f32, %b: f32, %x: f32):
        %0 = arith.mulf %a, %b : f32
        linalg.yield %0 : f32
  } -> tensor<32x16xf32>
  return %0 : tensor<32x16xf32>
}
I can crash mlir-opt simply by running:
mlir-opt -sparsification coo-test.mlir
The crash happens because the genInit function handles the kSparse level type but assumes everything else is dense; it ignores the kSingle level type. This causes genSubscript to assert, because the pidxs entry for that level is a nullptr. Clearly the singleton level type is not implemented yet.
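For reference, once the singleton level is handled, the traversal one would expect the sparsifier to emit for the kernel above is quite simple. Here is a hedged sketch in plain C++ rather than the generated scf dialect; A1_pos, A1_crd, A2_crd, and A_vals are illustrative buffer names:

  // Sketch of X(i,j) = A(i,j) * B(i,j) with A stored as compressed/singleton.
  // B and X are dense 32x16 row-major buffers; X is only updated at A's stored entries.
  void mul_cs(const int *A1_pos, const int *A1_crd, const int *A2_crd,
              const float *A_vals, const float *B, float *X) {
    for (int p = A1_pos[0]; p < A1_pos[1]; ++p) {
      int i = A1_crd[p];  // coordinate from the compressed level
      int j = A2_crd[p];  // coordinate from the singleton level (one per position)
      X[i * 16 + j] = A_vals[p] * B[i * 16 + j];
    }
  }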
The naïve solution here is to add an implementation of the singleton level type. But this raises a question for me: what happens when a new level type is added? For example, what if we want to add a block-sparse level type? Right now TACO's solution is to break the level in two and iterate first from block to block, and then with a nested iterator inside each block. That does not seem feasible for MLIR sparsification without complicated maps, if at all.
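To illustrate that split-level iteration, here is a hedged sketch (plain C++, illustrative names, BSR-style storage with BxB blocks) of what the approach amounts to for one block row of a sparse matrix-vector product:

  // Outer loop walks from block to block; inner loops iterate densely inside each block.
  void bsr_row_block(double *y, const int *blk_pos, const int *blk_crd,
                     const double *vals, const double *x, int bi, int B) {
    for (int p = blk_pos[bi]; p < blk_pos[bi + 1]; ++p) {  // block-to-block iteration
      int bj = blk_crd[p];                                 // column index of this block
      for (int ii = 0; ii < B; ++ii)                       // nested iteration within the block
        for (int jj = 0; jj < B; ++jj)
          y[bi * B + ii] += vals[(p * B + ii) * B + jj] * x[bj * B + jj];
    }
  }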
Might we consider generalizing the iterators into a dialect that allows for lookup functions? These functions would be specified through attributes on the iterator operation that give the name of the function to use. This way the IR writer could specify how each level is iterated.
Defaults could be provided for the dense and compressed level types to keep the current code generation output unchanged. With this generalization the MLIR code could remain generic enough to handle whatever madness is thrown at it, with (hopefully) no modification to the MLIR source.
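To picture what such lookup functions might compute, here is a purely hypothetical sketch (none of these names exist anywhere); the defaults below would correspond to today's dense and compressed code generation:

  #include <cstdint>
  #include <utility>

  // Half-open [begin, end) position range of a level's entries under parent position p.
  std::pair<int64_t, int64_t> dense_range(int64_t p, int64_t size) {
    return {p * size, (p + 1) * size};
  }
  std::pair<int64_t, int64_t> compressed_range(int64_t p, const int32_t *pos) {
    return {pos[p], pos[p + 1]};
  }

  // Coordinate stored at position q within that range.
  int64_t dense_coord(int64_t q, int64_t size) { return q % size; }
  int64_t compressed_coord(int64_t q, const int32_t *crd) { return crd[q]; }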
This of course makes transformations difficult if not impossible, but we’re talking sparse data structures. Haven’t we abandoned the hope of determinism?
What are your thoughts? Is something already in the works to remedy this issue? I have ideas that I’d like to implement but I want to check here first before I proceed.
Thanks for listening.