[RFC] Introduce a sparse tensor type to core MLIR

The proposal CL above would allow defining a CSR scheme with 64-bit integers for pointers and indices, using a named alias for the sparse attribute value, like this:

#CSR = {
  dim_order  = affine_map<(i,j) -> (i,j)>,
  dim_format = [ "D", "S" ],
  pointers = i64,
  indices = i64
}

with sparse tensor type

%a : tensor<?x?xf64, #CSR>

Likewise, a CSC scheme with 8-bit pointers and 16-bit indices would look like

#CSC = {
  dim_order  = affine_map<(i,j) -> (j,i)>,
  dim_format = [ "D", "S" ],
  pointers = i8,
  indices = i16
}

and

%b : tensor<?x?xf64, #CSC>

With the proposal above, trying to assign an %a value to a %b value would automatically be rejected by the type system of MLIR. Assigning such incompatible sparse tensor types to one another would require explicit “cast” operators, which eventually lower to sparse storage scheme conversion code.
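For illustration, converting between the two might be spelled roughly like the sketch below; the op name “sparse_cast” is purely a placeholder here, since the actual cast operators are not part of this proposal yet:

%b = sparse_cast %a : tensor<?x?xf64, #CSR> to tensor<?x?xf64, #CSC>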

There is some debate about what the field name of the generic attribute on the tensor type should be. Prior suggestions were “layout” and “format”, but others prefer less specific names like “attr”, “metadata”, or “properties”. If you have a strong opinion on this, please let us know now!

And just to be very clear, this is only about the field name; the actual value of the attribute will be something like a dictionary attribute whose key/value names and semantics are defined by a type/attribute interface (and can be extended in the future).

My two cents: I believe something generic is better than something too specific. The same “type attribute” strategy can be used to deal with things like quantized types or maybe even scalability (TBD), and something like “layout” is the wrong description for all those cases. Worst case scenario, how hard would it be to change it in the future?

Absolutely. The idea is that, although the first use case will be sparsity annotations, it may very well grow beyond that and contain all sorts of information.

Nothing is impossible, but the more code is added on top of this, the harder it becomes. So a generic name that we can all agree on now would be good. :slight_smile:

Fair enough :slight_smile: Probably “properties” is generic enough but not too generic. “Metadata” is also a good one.

One thing to consider here is that this attribute must be conservatively handled. So “metadata” tends (at least for LLVM folks) to suggest it is something ignorable, which is not what we want.

The reason I (very mildly – just trying to stimulate discussion here) don’t like “properties” is that it sounds like something that can be ignored if I don’t know about it. (At least “attr” sounds like an opaque thing I have to be conservative around.)

Looking in a thesaurus, “disposition” and “distinction” seem like possible candidates.

@clattner tends to have a good intuition for these naming things. Chris, what do you think the name of the attribute attached to tensors should be?

Based on Chris’ feedback, the current proposal is “encoding”. Going once, going twice, …

… Sold!

Next up: interfaces

I’m following this with interest because I’m currently working on sparse implementations in JAX, which hopefully will someday be able to lower to these MLIR backends. I appreciate the example encodings corresponding to CSR and CSC; in general all 1D and 2D encodings seem pretty intuitive. But I’m having trouble understanding what some 3D encodings might imply in terms of the underlying storage implementation.

For example, what would the storage scheme look like for a DDS encoding?

%a : tensor<?x?x?xf32, #DDS>

#DDS = {
  dim_order  = affine_map<(i,j,k) -> (i,j,k)>,
  dim_format = [ "D", "D", "S" ],
  pointers = i32,
  indices = i32
}

It seems like it must be some kind of second-order version of CSR, but I’m unsure. Thanks!

Yes, this requests dense storage in the first two dimensions and compressed storage in the last (third, innermost) dimension. Sometimes it is easier to look at an example.

Consider the 3-D tensor.

# extended FROSTT format
3 7
3 3 4
1 1 1  1.0
1 1 4  2.0
1 2 1  3.0
1 2 2  4.0
3 1 2  5.0
3 2 3  6.0
3 2 4  7.0

Then the annotation [ “D”, “D”, “S” ] would result in the following storage.

positions[2] = [ 0 2 4 4 4 4 4 5 7 7 ]
indices[2]   = [ 0 3 0 1 1 2 3 ]
values       = [ 1.000000 2.000000 3.000000 4.000000 5.000000 6.000000 7.000000 ]

And a very simple kernel like “x += A(i,j,k)” would be lowered to sparse code as follows (here, i and j are simple “dense” loops, but k iterates over a compressed format).

%0 = call @sparseDimSize(%arg0, %c0)     // size of dimension i
%1 = call @sparseDimSize(%arg0, %c1)     // size of dimension j
%2 = call @sparsePointers64(%arg0, %c2)  // pointers (positions) of compressed dimension k
%3 = call @sparseValuesF32(%arg0)        // nonzero values
%4 = memref.buffer_cast %arg1 : memref<f32>
scf.for %arg2 = %c0 to %0 step %c1 {      // dense loop over i
  scf.for %arg3 = %c0 to %1 step %c1 {    // dense loop over j
    %6 = muli %1, %arg2 : index
    %7 = addi %6, %arg3 : index           // flattened position i * size(j) + j
    %8 = memref.load %2[%7] : memref<?xindex>
    %9 = addi %7, %c1 : index
    %10 = memref.load %2[%9] : memref<?xindex>
    %11 = memref.load %4[] : memref<f32>
    %12 = scf.for %arg4 = %8 to %10 step %c1  // sparse loop over k
          iter_args(%arg5 = %11) -> (f32) {
      %13 = memref.load %3[%arg4] : memref<?xf32>
      %14 = addf %arg5, %13 : f32
      scf.yield %14 : f32
    }
    memref.store %12, %4[] : memref<f32>
  }
}
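For reference, the “x += A(i,j,k)” kernel can be expressed at the tensor level roughly as the linalg.generic sketched below before lowering (the trait and function names are mine, and the exact syntax in the actual tests may differ):

#trait_sum = {
  indexing_maps = [
    affine_map<(i,j,k) -> (i,j,k)>,  // A
    affine_map<(i,j,k) -> ()>        // x (scalar output)
  ],
  iterator_types = ["reduction", "reduction", "reduction"]
}

func @sum(%arga: tensor<?x?x?xf32, #DDS>, %argx: tensor<f32>) -> tensor<f32> {
  %0 = linalg.generic #trait_sum
      ins(%arga: tensor<?x?x?xf32, #DDS>)
      outs(%argx: tensor<f32>) {
    ^bb(%a: f32, %x: f32):
      %1 = addf %x, %a : f32
      linalg.yield %1 : f32
  } -> tensor<f32>
  return %0 : tensor<f32>
}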

Hope that gives some more intuition.

Thank you - that’s helpful! If I’m reading this correctly, it appears that the storage scheme for the (3, 3, 4) tensor in this case is basically that of a standard CSR representation of the tensor flattened into a (9, 4) matrix. Is that a fair summary?
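To spell out my reading: with zero-based coordinates and flattened rows r = 3*i + j, the nonzeros would sit in the (9, 4) matrix like this (my own sketch):

row 0: 1.0  .   .  2.0
row 1: 3.0 4.0  .   .
row 2:  .   .   .   .
row 3:  .   .   .   .
row 4:  .   .   .   .
row 5:  .   .   .   .
row 6:  .  5.0  .   .
row 7:  .   .  6.0 7.0
row 8:  .   .   .   .

and the positions/indices arrays you showed would then be exactly its CSR representation.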

Searching through the forum history, I’m also finding your answer here helpful: MLIR Support for Sparse Tensors - #22 by aartbik

Yes.

But of course, the power of a sparse compiler comes from generating all sorts of sparse storage schemes with “the push of a button” so to speak. If you don’t like the flattened-sparse form above, you can try [ “S”, “S”, “S” ], which will give you

positions[0] = [ 0 2 ]
indices[0] = [ 0 2 ]
positions[1] = [ 0 2 4 ]
indices[1] = [ 0 1 0 1 ]
positions[2] = [ 0 2 4 5 7 ]
indices[2] = [ 0 3 0 1 1 2 3 ]
values = [1.000000 2.000000 3.000000 4.000000 5.000000 6.000000 7.000000 ]

and the code

%0 = call @sparsePointers64(%arg0, %c0)
%1 = call @sparsePointers64(%arg0, %c1)
%2 = call @sparsePointers64(%arg0, %c2)
%3 = call @sparseValuesF32(%arg0)
%4 = memref.buffer_cast %arg1 : memref<f32>
%5 = memref.load %0[%c0] : memref<?xindex>
%6 = memref.load %0[%c1] : memref<?xindex>
scf.for %arg2 = %5 to %6 step %c1 {
  %8 = memref.load %1[%arg2] : memref<?xindex>
  %9 = addi %arg2, %c1 : index
  %10 = memref.load %1[%9] : memref<?xindex>
  scf.for %arg3 = %8 to %10 step %c1 {
    %11 = memref.load %2[%arg3] : memref<?xindex>
    %12 = addi %arg3, %c1 : index
    %13 = memref.load %2[%12] : memref<?xindex>
    %14 = memref.load %4[] : memref<f32>
    %15 = scf.for %arg4 = %11 to %13 step %c1 iter_args(%arg5 = %14) -> (f32) {
      %16 = memref.load %3[%arg4] : memref<?xf32>
      %17 = addf %arg5, %16 : f32
      scf.yield %17 : f32
    }
    memref.store %15, %4[] : memref<f32>
  }
}

Still not happy? Let’s try ["D", "S", "S"] or some other combination. Once you are happy with the results, you can start fine-tuning the overhead storage (the bit widths of pointers and indices can be controlled individually). Once you are happy with that, you can fine-tune the vectorization and parallelization strategies.
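For example, in the notation used earlier in this thread, shrinking the overhead storage could look like the sketch below (the alias name #DSS and these particular widths are mine, purely for illustration; whether narrow widths are safe depends on the tensor dimensions and the number of nonzeros):

#DSS = {
  dim_order  = affine_map<(i,j,k) -> (i,j,k)>,
  dim_format = [ "D", "S", "S" ],
  pointers = i32,
  indices = i16
}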

So, in a nutshell, sparse tensor code development is no longer tedious, but fun!

Indeed, I borrowed heavily from my own previous example in this answer :wink:

I think I’m wrapping my head around it… one last question: ["S", "D", "S"], for example, seems like it would require a sequence of positions[1] & indices[1] arrays, because the sparsity structure may differ for each indices[0] value. Is my mental model for that correct?

Dense dimensions never have positions or indices arrays associated with them, because the compiler knows the “stride” size of combined, consecutive dense dimensions. So for the example tensor, [ "S", "D", "S" ] would be stored as follows:

  positions[0] : [ 0 2 ]
  indices[0]   : [ 0 2 ]
  positions[2] : [ 0 2 4 4 5 7 7 ]
  indices[2]   : [ 0 3 0 1 1 2 3 ]
  values       : [ 1.000000 2.000000 3.000000 4.000000 5.000000 6.000000 7.000000 ]

With the following code.

%1 = call @sparseDimSize(%arg0, %c1)
...
scf.for %arg2 = %5 to %6 step %c1 {
  scf.for %arg3 = %c0 to %1 step %c1 {
    %8 = muli %1, %arg2 : index
    %9 = addi %8, %arg3 : index 
    %10 = memref.load %2[%9] : memref<?xindex>
    %11 = addi %9, %c1 : index
    %12 = memref.load %2[%11] : memref<?xindex>
    %13 = memref.load %4[] : memref<f32>
    %14 = scf.for %arg4 = %10 to %12 step %c1 iter_args(%arg5 = %13) -> (f32) {
      %15 = memref.load %3[%arg4] : memref<?xf32>
      %16 = addf %arg5, %15 : f32
      scf.yield %16 : f32
    }
    memref.store %14, %4[] : memref<f32>
  }
}

Ah, I see it now. Thanks so much for the clear responses!

The next revision will introduce a TensorType interface with the following method declarations:

bool TensorType::isSparse()
  // returns true if the tensor is annotated as sparse

DimLevelType TensorType::getDimLevelType(unsigned dim)
  // returns one of DimLevelType::dense
  //                DimLevelType::compressed
  //                DimLevelType::singleton
  // denoting the dimension level type of the `dim`-th dimension

AffineMap TensorType::getDimOrdering()
  // returns the ordering on indices (identity permutation by default)

unsigned TensorType::getPointerBitWidth() / getIndexBitWidth()
  // returns the bit width of pointers / indices in the
  // overhead storage of the sparse tensor

In addition, the revision provides a concrete implementation of this interface on RankedTensorType, using the new “encoding” field on ranked tensors with a dictionary attribute as follows:

#CSR = {
  sparseDimLevelType = [ "dense", "compressed" ],
  sparseDimOrdering = affine_map<(i,j) -> (i,j)>,
  sparsePointerBitWidth = 64,
  sparseIndexBitWidth = 64
}

tensor<?x?xf64, #CSR>

Please let us know if that looks good or if you think other properties should be part of this sparse interface (note that we can also add more tensor “semantics” later, including properties not related to sparsity).

We have a TensorType already (and I don’t think these should be members on that TensorType, as it privileges sparse over other uses of encoding). Why not introduce a SparseTensorType interface, so that one uses isa/dyn_cast the same way as elsewhere instead of adding an isSparse member function?

I’m still trying to wrap my head around the interface uses. I would think that a TensorTypeInterface exists so that types other than the builtin tensor can be handled generically by analyses.
We could expose the properties of the builtin tensor through such an interface, and I think some downstream folks have indeed asked for that.
On the other hand, there is the extensibility of the builtin tensor type itself, which is the encoding attribute you added.
Since this is an opaque attribute, we could have multiple encoding interfaces that this attribute could provide, including a “sparse” one. But it looks to me like an “attribute interface” rather than a “type interface”.