[RFC] Introduce a sparse tensor type to core MLIR

The proposal CL above would allow defining a CSR scheme with 64-bit integers for pointers and indices, using a named alias for the sparse attribute value, like this:

#CSR = {
  dim_order  = affine_map<(i,j) -> (i,j)>,
  dim_format = [ "D", "S" ],
  pointers = i64,
  indices = i64
}

with sparse tensor type

%a : tensor<?x?xf64, #CSR>

Likewise, a CSC scheme with 8-bit pointers and 16-bit indices would look like

#CSC = {
  dim_order  = affine_map<(i,j) -> (j,i)>,
  dim_format = [ "D", "S" ],
  pointers = i8,
  indices = i16
}

and

%b : tensor<?x?xf64, #CSC>

With the proposal above, trying to assign an %a value to a %b value would automatically be rejected by the type system of MLIR. Assigning such incompatible sparse tensor types to one another would require explicit “cast” operators, which eventually lower to sparse storage scheme conversion code.
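For illustration, converting between the two might be spelled roughly like the sketch below; the op name “sparse_cast” is purely a placeholder here, since the actual cast operators are not part of this proposal yet:

%b = sparse_cast %a : tensor<?x?xf64, #CSR> to tensor<?x?xf64, #CSC>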

There is some debate about what the field name of the generic attribute on the tensor type should be. Prior suggestions were “layout” and “format”, but others prefer less specific names like “attr”, “metadata”, or “properties”. If you have a strong opinion on this, please let us know now!

And just to be very clear, this is only about the field name; the actual value of the attribute will be something like a dictionary attribute whose key/value names and semantics are defined by a type/attribute interface (and can be extended in the future).

My two cents: I believe something generic is better than something too specific. The same “type attribute” strategy can be used to deal with things like quantized types or maybe even scalability (TBD), and something like “layout” is the wrong description for all those cases. Worst case scenario, how hard would it be to change it in the future?

Absolutely. The idea is that, although the first use case will be sparsity annotations, it may very well grow beyond that and contain all sorts of information.

Nothing is impossible, but the more code is added on top of this, the harder it becomes. So a generic name that we can all agree on now would be good. :slight_smile:

Fair enough :slight_smile: Probably “properties” is generic enough but not too generic. “Metadata” is also a good one.

One thing to consider here is that this attribute must be conservatively handled. So “metadata” tends (at least for LLVM folks) to suggest it is something ignorable, which is not what we want.

The reason I (very mildly – just trying to stimulate discussion here) don’t like “properties” is that it sounds like something that can be ignored if I don’t know about it. (At least “attr” sounds like an opaque thing I have to be conservative around.)

Looking in a thesaurus, “disposition” and “distinction” seem like possible candidates.

@clattner tends to have a good intuition for these naming things. Chris, what do you think the name of the attribute attached to tensors should be?

Based on Chris’ feedback, the current proposal is “encoding”. Going once, going twice, …

… Sold!

Next up: interfaces

I’m following this with interest because I’m currently working on sparse implementations in JAX, which hopefully will someday be able to lower to these MLIR backends. I appreciate the example encodings corresponding to CSR and CSC; in general all 1D and 2D encodings seem pretty intuitive. But I’m having trouble understanding what some 3D encodings might imply in terms of the underlying storage implementation.

For example, what would the storage scheme look like for a DDS encoding?

%a : tensor<?x?x?xf32, #DDS>

#DDS = {
  dim_order  = affine_map<(i,j,k) -> (i,j,k)>,
  dim_format = [ "D", "D", "S" ],
  pointers = i32,
  indices = i32
}

It seems like it must be some kind of second-order version of CSR, but I’m unsure. Thanks!

Yes, this requests dense storage in the first two dimensions and compressed storage in the last (third, innermost) dimension. Sometimes it is easier to look at an example.

Consider the 3-D tensor.

# extended FROSTT format
3 7
3 3 4
1 1 1  1.0
1 1 4  2.0
1 2 1  3.0
1 2 2  4.0
3 1 2  5.0
3 2 3  6.0
3 2 4  7.0

Then the annotation [ “D”, “D”, “S” ] would result in the following storage.

positions[2] = [ 0 2 4 4 4 4 4 5 7 7 ]
indices[2]   = [ 0 3 0 1 1 2 3 ]
values       = [ 1.000000 2.000000 3.000000 4.000000 5.000000 6.000000 7.000000 ]

And a very simple kernel like “x += A(i,j,k)” would be lowered to sparse code as follows (here, i and j are simple “dense” loops, but k iterates over a compressed format).

%0 = call @sparseDimSize(%arg0, %c0)     // size of dimension i
%1 = call @sparseDimSize(%arg0, %c1)     // size of dimension j
%2 = call @sparsePointers64(%arg0, %c2)  // pointers (positions) of compressed dimension k
%3 = call @sparseValuesF32(%arg0)        // nonzero values
%4 = memref.buffer_cast %arg1 : memref<f32>
scf.for %arg2 = %c0 to %0 step %c1 {      // dense loop over i
  scf.for %arg3 = %c0 to %1 step %c1 {    // dense loop over j
    %6 = muli %1, %arg2 : index
    %7 = addi %6, %arg3 : index           // flattened position i * size(j) + j
    %8 = memref.load %2[%7] : memref<?xindex>
    %9 = addi %7, %c1 : index
    %10 = memref.load %2[%9] : memref<?xindex>
    %11 = memref.load %4[] : memref<f32>
    %12 = scf.for %arg4 = %8 to %10 step %c1  // sparse loop over k
          iter_args(%arg5 = %11) -> (f32) {
      %13 = memref.load %3[%arg4] : memref<?xf32>
      %14 = addf %arg5, %13 : f32
      scf.yield %14 : f32
    }
    memref.store %12, %4[] : memref<f32>
  }
}
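For reference, the “x += A(i,j,k)” kernel can be expressed at the tensor level roughly as the linalg.generic sketched below before lowering (the trait and function names are mine, and the exact syntax in the actual tests may differ):

#trait_sum = {
  indexing_maps = [
    affine_map<(i,j,k) -> (i,j,k)>,  // A
    affine_map<(i,j,k) -> ()>        // x (scalar output)
  ],
  iterator_types = ["reduction", "reduction", "reduction"]
}

func @sum(%arga: tensor<?x?x?xf32, #DDS>, %argx: tensor<f32>) -> tensor<f32> {
  %0 = linalg.generic #trait_sum
      ins(%arga: tensor<?x?x?xf32, #DDS>)
      outs(%argx: tensor<f32>) {
    ^bb(%a: f32, %x: f32):
      %1 = addf %x, %a : f32
      linalg.yield %1 : f32
  } -> tensor<f32>
  return %0 : tensor<f32>
}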

Hope that gives some more intuition.

Thank you - that’s helpful! If I’m reading this correctly, it appears that the storage scheme for the (3, 3, 4) tensor in this case is basically that of a standard CSR representation of the tensor flattened into a (9, 4) matrix. Is that a fair summary?
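To spell out my reading: with zero-based coordinates and flattened rows r = 3*i + j, the nonzeros would sit in the (9, 4) matrix like this (my own sketch):

row 0: 1.0  .   .  2.0
row 1: 3.0 4.0  .   .
row 2:  .   .   .   .
row 3:  .   .   .   .
row 4:  .   .   .   .
row 5:  .   .   .   .
row 6:  .  5.0  .   .
row 7:  .   .  6.0 7.0
row 8:  .   .   .   .

and the positions/indices arrays you showed would then be exactly its CSR representation.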

Searching through the forum history, I’m also finding your answer here helpful: MLIR Support for Sparse Tensors - #22 by aartbik

Yes.

But of course, the power of a sparse compiler comes from generating all sorts of sparse storage schemes with “the push of a button” so to speak. If you don’t like the flattened-sparse form above, you can try [ “S”, “S”, “S” ], which will give you

positions[0] = [ 0 2 ]
indices[0] = [ 0 2 ]
positions[1] = [ 0 2 4 ]
indices[1] = [ 0 1 0 1 ]
positions[2] = [ 0 2 4 5 7 ]
indices[2] = [ 0 3 0 1 1 2 3 ]
values = [1.000000 2.000000 3.000000 4.000000 5.000000 6.000000 7.000000 ]

and the code

%0 = call @sparsePointers64(%arg0, %c0)
%1 = call @sparsePointers64(%arg0, %c1)
%2 = call @sparsePointers64(%arg0, %c2)
%3 = call @sparseValuesF32(%arg0)
%4 = memref.buffer_cast %arg1 : memref<f32>
%5 = memref.load %0[%c0] : memref<?xindex>
%6 = memref.load %0[%c1] : memref<?xindex>
scf.for %arg2 = %5 to %6 step %c1 {
  %8 = memref.load %1[%arg2] : memref<?xindex>
  %9 = addi %arg2, %c1 : index
  %10 = memref.load %1[%9] : memref<?xindex>
  scf.for %arg3 = %8 to %10 step %c1 {
    %11 = memref.load %2[%arg3] : memref<?xindex>
    %12 = addi %arg3, %c1 : index
    %13 = memref.load %2[%12] : memref<?xindex>
    %14 = memref.load %4[] : memref<f32>
    %15 = scf.for %arg4 = %11 to %13 step %c1 iter_args(%arg5 = %14) -> (f32) {
      %16 = memref.load %3[%arg4] : memref<?xf32>
      %17 = addf %arg5, %16 : f32
      scf.yield %17 : f32
    }
    memref.store %15, %4[] : memref<f32>
  }
}

Still not happy? Let’s try ["D", "S", "S"] or some other combination. Once you are happy with the results, you can start fine-tuning the overhead storage (the bit widths of pointers and indices can be controlled individually). Once you are happy with that, you can fine-tune the vectorization and parallelization strategies.
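For example, in the notation used earlier in this thread, shrinking the overhead storage could look like the sketch below (the alias name #DSS and these particular widths are mine, purely for illustration; whether narrow widths are safe depends on the tensor dimensions and the number of nonzeros):

#DSS = {
  dim_order  = affine_map<(i,j,k) -> (i,j,k)>,
  dim_format = [ "D", "S", "S" ],
  pointers = i32,
  indices = i16
}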

So, in a nutshell, sparse tensor code development is no longer tedious, but fun!

Indeed, I borrowed heavily from my own previous example in this answer :wink:

I think I’m wrapping my head around it… one last question: ["S", "D", "S"], for example, seems like it would require a sequence of positions[1] & indices[1] arrays, because the sparsity structure may differ for each indices[0] value. Is my mental model for that correct?

Dense dimensions never have positions or indices arrays associated with them, because the compiler knows the “stride” size of combined, consecutive dense dimensions. So for the example tensor, [ "S", "D", "S" ] would be stored as follows:

  positions[0] : [ 0 2 ]
  indices[0]   : [ 0 2 ]
  positions[2] : [ 0 2 4 4 5 7 7 ]
  indices[2]   : [ 0 3 0 1 1 2 3 ]
  values       : [ 1.000000 2.000000 3.000000 4.000000 5.000000 6.000000 7.000000 ]

With the following code.

%1 = call @sparseDimSize(%arg0, %c1)
...
scf.for %arg2 = %5 to %6 step %c1 {
  scf.for %arg3 = %c0 to %1 step %c1 {
    %8 = muli %1, %arg2 : index
    %9 = addi %8, %arg3 : index 
    %10 = memref.load %2[%9] : memref<?xindex>
    %11 = addi %9, %c1 : index
    %12 = memref.load %2[%11] : memref<?xindex>
    %13 = memref.load %4[] : memref<f32>
    %14 = scf.for %arg4 = %10 to %12 step %c1 iter_args(%arg5 = %13) -> (f32) {
      %15 = memref.load %3[%arg4] : memref<?xf32>
      %16 = addf %arg5, %15 : f32
      scf.yield %16 : f32
    }
    memref.store %14, %4[] : memref<f32>
  }
}

Ah, I see it now. Thanks so much for the clear responses!

The next revision will introduce a TensorType interface with the following method declarations:

bool TensorType::isSparse()
  // returns true if the tensor is annotated as sparse

DimLevelType TensorType::getDimLevelType(unsigned dim)
  // returns one of DimLevelType::dense
  //                DimLevelType::compressed
  //                DimLevelType::singleton
  // denoting the dimension level type of the `dim`-th dimension

AffineMap TensorType::getDimOrdering()
  // returns the ordering on indices (identity permutation by default)

unsigned TensorType::getPointerBitWidth() / getIndexBitWidth()
  // returns the bit width of pointers / indices in the
  // overhead storage of the sparse tensor

In addition, the revision provides a concrete implementation of this interface on RankedTensorType, using the new “encoding” field on ranked tensors with a dictionary attribute as follows:

#CSR = {
  sparseDimLevelType = [ "dense", "compressed" ],
  sparseDimOrdering = affine_map<(i,j) -> (i,j)>,
  sparsePointerBitWidth = 64,
  sparseIndexBitWidth = 64
}

tensor<?x?xf64, #CSR>

Please let us know if that looks good or if you think other properties should be part of this sparse interface (note that we can also add more tensor “semantics” later, including properties not related to sparsity).

We have a TensorType already (and I don’t think these should be members on that TensorType, as it privileges sparse over other uses of encoding). Why not introduce a SparseTensorType interface, so that one uses isa/dyn_cast the same way as elsewhere instead of adding an isSparse member function?

I’m still trying to wrap my head around the interface uses. I would think that a TensorTypeInterface exists so that types other than the builtin tensor can be handled generically by analyses.
We could expose the properties of the builtin tensor through such an interface, and I think some downstream folks have indeed asked for that.
On the other hand, there is the extensibility of the builtin tensor type itself, which is the encoding attribute you added.
Since this is an opaque attribute, we could have multiple encoding interfaces that this attribute could provide, including a “sparse” one. But it looks to me like an “attribute interface” rather than a “type interface”.