Overcoming Sparsification Limitation on Level Types

I am looking into something that has troubled me regarding the level types for sparse tensors during sparsification of linalg.generic operations.

Specifically, I am attempting to implement the COO format in MLIR, which is represented in the TACO compiler like this:

taco "A(i,j)=B(i,k)*C(k,j)" -f=A:dd:0,1 -f=B:uq:0,1 -f=C:dd:0,1

Note that both A and C are dense/dense, and B has the level types sparse non-unique/singleton. This generates appropriate code for iterating over the COO format: namely, group the first-rank indices, then iterate over each group using the second-rank indices for the lookup into tensor C.
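
For reference, the iteration TACO ends up generating for this kernel looks roughly like the following hand-written C sketch (the array names are mine, not the literal TACO output; B is stored as compressed-nonunique/singleton, so B_pos bounds the nonzeros and B_crd0/B_crd1 hold the row/column coordinates):

// Hand-written sketch of A(i,j) = B(i,k) * C(k,j) with B stored as COO.
void spmm_coo(int N, const int *B_pos, const int *B_crd0, const int *B_crd1,
              const float *B_val, const float *C, float *A) {
  for (int p = B_pos[0]; p < B_pos[1];) {
    int i = B_crd0[p];                      // first-rank coordinate (row of B)
    int seg = p;
    while (seg < B_pos[1] && B_crd0[seg] == i)
      seg++;                                // group entries that share row i
    for (int q = p; q < seg; q++) {         // singleton level: one coordinate per entry
      int k = B_crd1[q];                    // second-rank coordinate, used to index C
      for (int j = 0; j < N; j++)
        A[i * N + j] += B_val[q] * C[k * N + j];
    }
    p = seg;                                // advance to the next row group
  }
}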

Doing the same thing in MLIR does not seem to work. Specifically, if I use the following IR:

#Tcs = #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "singleton"  ] }>

#trait2 = {
  indexing_maps = [
    affine_map<(i,j) -> (i,j)>,  // A
    affine_map<(i,j) -> (i,j)>,  // B
    affine_map<(i,j) -> (i,j)>   // X (out)
  ],
  iterator_types = ["parallel", "parallel"],
  doc = "X(i,j) = A(i,j) OP B(i,j)"
}

func.func @mul_cs(%arga: tensor<32x16xf32, #Tcs>, %argb: tensor<32x16xf32>, %argx: tensor<32x16xf32>) -> tensor<32x16xf32> {
  %0 = linalg.generic #trait2
     ins(%arga, %argb: tensor<32x16xf32, #Tcs>, tensor<32x16xf32>)
    outs(%argx: tensor<32x16xf32>) {
      ^bb(%a: f32, %b: f32, %x: f32):
        %0 = arith.mulf %a, %b : f32
        linalg.yield %0 : f32
  } -> tensor<32x16xf32>
  return %0 : tensor<32x16xf32>
}

I manage to crash mlir-opt simply by running:

mlir-opt -sparsification coo-test.mlir

The crash happens because the genInit function handles the kSparse level type, but assumes everything else is dense. It ignores the kSingle level type. This causes genSubscript to assert because the pidxs of the level is a nullptr. Clearly the singleton level type is not implemented yet.

The naïve solution here is to add an implementation of the singleton level type. But this raises a question for me: what happens when a new level type is added? For example, what if we want to add a block-sparse level type? Right now the TACO solution is to break the level into two and iterate first from block to block, then use a nested iterator inside each block. This doesn't seem feasible for MLIR sparsification without complicated maps, if at all.
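
For illustration, a BSR-style format makes this concrete: the dimensions are split into a block level and a within-block level, and the generated loops iterate block to block, then over the dense block contents. A hand-written C sketch (made-up names and block size, not actual TACO or MLIR output):

#define BS 4  // illustrative block size
// y += A * x, with A stored BSR-style: a compressed level over column blocks
// per block row, each entry pointing at a dense BS x BS block of values.
void bsr_spmv(int num_block_rows, const int *blk_pos, const int *blk_crd,
              const float *blk_val, const float *x, float *y) {
  for (int ib = 0; ib < num_block_rows; ib++) {           // iterate block to block
    for (int b = blk_pos[ib]; b < blk_pos[ib + 1]; b++) {
      int jb = blk_crd[b];                                // column-block coordinate
      const float *blk = &blk_val[b * BS * BS];
      for (int ii = 0; ii < BS; ii++)                     // nested iterator inside the block
        for (int jj = 0; jj < BS; jj++)
          y[ib * BS + ii] += blk[ii * BS + jj] * x[jb * BS + jj];
    }
  }
}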

Might we consider generalizing the iterators into a dialect that allows for lookup functions? These functions would be specified through attributes on the iterator operation that name the function to use. This way the IR writer could specify how each level is iterated.

Defaults could be provided for the dense and compressed level types to maintain the same code generation output that currently exists. With this generalization the MLIR code could remain generic enough to handle whatever madness is thrown at it with (hopefully) no modification to the MLIR source.
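
To make this concrete, here is a rough C sketch of the kind of interface I have in mind; nothing here exists in MLIR (or TACO), and the names are entirely made up:

// Hypothetical level-lookup interface: each level supplies its own bounds and
// coordinate lookup, and the generated loops call through it instead of
// hard-coding the dense/compressed cases.
typedef struct {
  // Map a parent position to the [begin, end) position range of this level.
  void (*bounds)(const void *level_data, int parent_pos, int *begin, int *end);
  // Return the coordinate stored at position p of this level.
  int (*coord)(const void *level_data, int p);
  const void *level_data;  // e.g. the pos/crd arrays of a compressed level
} LevelIterator;

// A possible default for the compressed level type, reproducing today's behavior.
typedef struct { const int *pos; const int *crd; } CompressedData;

static void compressed_bounds(const void *d, int pp, int *begin, int *end) {
  const CompressedData *c = (const CompressedData *)d;
  *begin = c->pos[pp];
  *end = c->pos[pp + 1];
}
static int compressed_coord(const void *d, int p) {
  return ((const CompressedData *)d)->crd[p];
}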

This of course makes transformations difficult if not impossible, but we’re talking sparse data structures. Haven’t we abandoned the hope of determinism?

What are your thoughts? Is something already in the works to remedy this issue? I have ideas that I’d like to implement but I want to check here first before I proceed.

Thanks for listening.

This is a known limitation of the current implementation. Though your program ought to generate a validation error (or at the very least an assertion failure) rather than actually crashing!

Singleton support isn’t too hard to add, but there hasn’t been much call for it (since it isn’t really used anywhere except for COO) so it keeps getting preempted by other work. Fwiw, the runtime library does have a bespoke implementation of COO which is more efficient than the TACO implementation; however, that class is only used as an intermediate representation when constructing/converting things, so it can’t be used directly as a sparse tensor<> itself.

As for other level-types, we are actively working on adding support for block sparsity and other things where a single semantic dimension of the tensor gets mapped to multiple levels in the storage representation. The approach I’m pursuing would allow a certain degree of other sorts of “reordering”. Though bear in mind that the iteration order is specified by the storage representation itself, since that’s required in order to generate efficient code; the “reordering” is just a question of how the semantic indices get mapped onto storage indices.
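
To illustrate what that mapping means (using a made-up block size; this is not our actual syntax), a single semantic index i splits into an outer block index and an inner within-block index, and the loops follow the storage levels:

#define BS 4  // illustrative block size
// One semantic index i maps onto two storage-level indices; it can always be
// reconstructed as block_of(i) * BS + within(i).
static inline int block_of(int i) { return i / BS; }  // outer storage level
static inline int within(int i)   { return i % BS; }  // inner storage level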

We are also considering generalizing the level types to allow users to define their own, similar to how TACO has a general level-function interface with the various level types as classes implementing that interface. However, this is lower priority than several other things we're working on.

As Wren already explained, only the dimension level types dense and compressed are supported at the moment (and technically only the unique variant of the latter). We obviously always planned to extend the dimension level types with many others proposed in the literature, including singleton.

In the meantime, I think it may be better to remove the singleton tag altogether for now, since it indeed falsely suggests we support it. I will send out a revision with this today. We can easily add it back once we are ready for the more general dimension level types.

This revision D131002 removes the singleton dimension level type for now, to avoid the false impression that it is supported. We can add it back later if we truly want to support this (and other) dimension level type(s).

As a follow-up, even though singleton is still not lowered to code, revision D132897 brings the singleton dimension level type back, this time properly, with the additional properties ordered (default)/not-ordered and unique (default)/not-unique.

For now, we use a suffix to associate a non-default property with a dimension level type (e.g. compressed-nu for compressed, sorted, and not-unique). This is similar to what TACO currently uses. In the long run, when we start supporting more dimension level types and properties (see [Chou2018] for a full list of suggested extensions), we may separate the two somehow.

But at least we can now define a few more sparse storage schemes, for example the two COO variants shown below.

  // Vanilla COO
  #COO = #sparse_tensor.encoding<{
    dimLevelType = [ "compressed-nu-no", "singleton-no" ]
  }>
  // Sorted Coordinate Scheme.
  #SortedCOO = #sparse_tensor.encoding<{
    dimLevelType = [ "compressed-nu", "singleton" ]
  }>

  .. tensor<?x?xf64, #COO> ..
  .. tensor<?x?xf64, #SortedCOO> ..