I’ve got it in my mind that I want to create a pass which uses some heuristic to promote some parts of a tensor program to sparse and then experiment with the codegen (since for deployment, we often have constant weights and can make some easy decisions based on that). I’m kind of treating this as an intro project to the sparse side of the world.
Since all of this infra is brand new and has never been connected end to end, I thought I’d play it forward step by step. Maybe the journey will be useful as a sample/documentation in the end.
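To make “some heuristic” slightly less hand-wavy, the kind of decision I have in mind is nothing fancier than the following (an entirely hypothetical helper, just to illustrate; no such pass exists yet):

```python
# Hypothetical promotion heuristic, purely illustrative: promote a
# constant weight tensor to a sparse encoding when it is mostly zeros.
def should_promote_to_sparse(weight_values, zero_fraction_threshold=0.5):
  num_zeros = sum(1 for v in weight_values if v == 0.0)
  return num_zeros / len(weight_values) > zero_fraction_threshold
```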
To get started, I wrote the following little sample program:
```python
from mlir.ir import *
from mlir.dialects.builtin import *
from mlir.dialects.tosa import *
from mlir.passmanager import *
import mlir.dialects.sparse_tensor as st
import mlir.conversions


def sparse_tensor(shape, levels=None, ordering=None, dtype=None):
  # Builds a ranked tensor type carrying a sparse_tensor.encoding.
  # Defaults: all dimensions compressed, identity dimension ordering,
  # and 32-bit pointer/index bit widths.
  rank = len(shape)
  if not levels:
    levels = [st.DimLevelType.compressed] * rank
  if not ordering:
    ordering = AffineMap.get_identity(rank)
  encoding = st.EncodingAttr.get(levels, ordering, 32, 32)
  return RankedTensorType.get(
      shape, dtype if dtype else F32Type.get(), encoding=encoding)


def dense_tensor(shape, dtype=None):
  return RankedTensorType.get(shape, dtype if dtype else F32Type.get())


def create_sample_fc_module():
  # A single tosa.fully_connected where only the weights are sparse.
  m = Module.create()
  with InsertionPoint(m.body):
    @FuncOp.from_py_func(
        dense_tensor([256, 1024]),
        sparse_tensor([64, 1024]),
        dense_tensor([64]))
    def fc(inputs, weights, bias):
      d0 = RankedTensorType(inputs.type).get_dim_size(0)
      d1 = RankedTensorType(weights.type).get_dim_size(0)
      result_type = dense_tensor([d0, d1])
      return FullyConnectedOp(
          result_type,
          input=inputs, weight=weights, bias=bias,
          quantization_info=None).result
  return m


with Context() as ctx, Location.unknown():
  m = create_sample_fc_module()
  print("// Input module")
  print(m)

  # Lower TOSA to linalg on tensors.
  pm = PassManager.parse("func(tosa-to-linalg-on-tensors)")
  pm.run(m)
  print("\n\n// Post linalg conversion")
  print(m)
```
Which dutifully prints:
```mlir
// Input module
module {
  func @fc(%arg0: tensor<256x1024xf32>, %arg1: tensor<64x1024xf32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ], dimOrdering = affine_map<(d0, d1) -> (d0, d1)>, pointerBitWidth = 32, indexBitWidth = 32 }>>, %arg2: tensor<64xf32>) -> tensor<256x64xf32> {
    %0 = "tosa.fully_connected"(%arg0, %arg1, %arg2) : (tensor<256x1024xf32>, tensor<64x1024xf32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ], dimOrdering = affine_map<(d0, d1) -> (d0, d1)>, pointerBitWidth = 32, indexBitWidth = 32 }>>, tensor<64xf32>) -> tensor<256x64xf32>
    return %0 : tensor<256x64xf32>
  }
}


// Post linalg conversion
#map0 = affine_map<(d0, d1) -> (d1)>
#map1 = affine_map<(d0, d1) -> (d0, d1)>
#map2 = affine_map<(d0, d1) -> (d1, d0)>
module {
  func @fc(%arg0: tensor<256x1024xf32>, %arg1: tensor<64x1024xf32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ], dimOrdering = affine_map<(d0, d1) -> (d0, d1)>, pointerBitWidth = 32, indexBitWidth = 32 }>>, %arg2: tensor<64xf32>) -> tensor<256x64xf32> {
    %0 = linalg.init_tensor [256, 64] : tensor<256x64xf32>
    %1 = linalg.generic {indexing_maps = [#map0, #map1], iterator_types = ["parallel", "parallel"]} ins(%arg2 : tensor<64xf32>) outs(%0 : tensor<256x64xf32>) {
    ^bb0(%arg3: f32, %arg4: f32):  // no predecessors
      linalg.yield %arg3 : f32
    } -> tensor<256x64xf32>
    %cst = constant dense<[1, 0]> : tensor<2xi64>
    %2 = linalg.init_tensor [1024, 64] : tensor<1024x64xf32>
    %3 = linalg.generic {indexing_maps = [#map2, #map1], iterator_types = ["parallel", "parallel"]} ins(%arg1 : tensor<64x1024xf32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ], dimOrdering = affine_map<(d0, d1) -> (d0, d1)>, pointerBitWidth = 32, indexBitWidth = 32 }>>) outs(%2 : tensor<1024x64xf32>) {
    ^bb0(%arg3: f32, %arg4: f32):  // no predecessors
      linalg.yield %arg3 : f32
    } -> tensor<1024x64xf32>
    %4 = linalg.matmul ins(%arg0, %3 : tensor<256x1024xf32>, tensor<1024x64xf32>) outs(%1 : tensor<256x64xf32>) -> tensor<256x64xf32>
    return %4 : tensor<256x64xf32>
  }
}
```
First off, this has a couple of problems:
- The lowering from `tosa.fully_connected` looks wrong to my eyes (at a minimum, I would have expected to see an `addf` somewhere for the bias vector; see the sketch after this list).
- The conversions do not do any propagation of the tensor encoding, which may or may not be what we want (but is almost certainly not thought through for this case).
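To make the first point concrete, here is a hand-written sketch of the shape I was expecting, with the bias applied via an explicit `addf` after the matmul. This is not the output of any existing pass, and it glosses over the fact that the matmul’s `outs` operand would need to be zero-filled first; it reuses `#map0`/`#map1` and the SSA names from the module above.

```mlir
// Hand-written sketch, not actual pass output. Assumes %0 has been
// zero-filled before the matmul accumulates into it.
%mm = linalg.matmul ins(%arg0, %3 : tensor<256x1024xf32>, tensor<1024x64xf32>)
                    outs(%0 : tensor<256x64xf32>) -> tensor<256x64xf32>
%biased = linalg.generic {indexing_maps = [#map1, #map0, #map1],
                          iterator_types = ["parallel", "parallel"]}
    ins(%mm, %arg2 : tensor<256x64xf32>, tensor<64xf32>)
    outs(%0 : tensor<256x64xf32>) {
^bb0(%acc: f32, %bias: f32, %out: f32):
  %sum = addf %acc, %bias : f32
  linalg.yield %sum : f32
} -> tensor<256x64xf32>
```

(It is possible the existing lowering gets away without the `addf` by broadcasting the bias into `%1` and letting `linalg.matmul` accumulate into it, but if so, that subtlety deserves a comment somewhere.)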
And some style nits:
- It would be really nice if the `tensor` encoding were pulled up as an attribute alias like the affine maps are. It is quite hard to read as-is (see the mock-up after this list).
- I can see half a dozen things that should/could be better in the Python API.
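For the first nit, a hand-edited mock-up of how the post-conversion module could read with the encoding hoisted to an alias (the `#sparse` name is mine; the printer does not do this today):

```mlir
#sparse = #sparse_tensor.encoding<{
  dimLevelType = [ "compressed", "compressed" ],
  dimOrdering = affine_map<(d0, d1) -> (d0, d1)>,
  pointerBitWidth = 32,
  indexBitWidth = 32
}>

module {
  func @fc(%arg0: tensor<256x1024xf32>,
           %arg1: tensor<64x1024xf32, #sparse>,
           %arg2: tensor<64xf32>) -> tensor<256x64xf32> {
    ...
  }
}
```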
Would love it if folks who have worked on this could help me choose my own adventure here and discuss/highlight next steps. Any of you all interested in collaborating towards a worked example here? @aartbik @rsuderman @sjarus