Sparse Tensor I/O

Some more 2022 news, as announced at the end of the Sparse Tensors in MLIR thread. After revision D117850, MLIR’s sparse compiler reference implementation now supports full sparse tensor I/O for proper testing and comparison (on a target with a file system this means file I/O, but the ops are kept a bit more general so that other kinds of I/O can be supported in the future).

Consider, for example, the following PyTACO program.

import pytaco as pt
from pytaco import compressed, dense

# CSR: dense rows, compressed columns.
csr = pt.format([dense, compressed], [0, 1])
...
A = pt.read('A.mtx', csr)
B = pt.read('B.mtx', csr)
C = pt.tensor((A.shape[0], B.shape[1]), csr)
i, j, k = pt.get_index_vars(3)
C[i, j] = A[i, k] * B[k, j]
pt.write('C.tns', C)

Now, this can be represented in MLIR as follows.

%A = sparse_tensor.new %srcA : !Filename to tensor<?x?xf64, #CSR>
%B = sparse_tensor.new %srcB : !Filename to tensor<?x?xf64, #CSR>
// %C0 is the initialized output tensor (materialization elided here).
%C = linalg.matmul ins(%A, %B : tensor<?x?xf64, #CSR>,
                                tensor<?x?xf64, #CSR>)
                  outs(%C0 : tensor<?x?xf64, #CSR>) -> tensor<?x?xf64, #CSR>
sparse_tensor.out %C, %destC : tensor<?x?xf64, #CSR>, !Filename
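
Here, %srcA, %srcB, and %destC hold the file names, and !Filename and #CSR are defined elsewhere. For completeness, a minimal sketch of those definitions, assuming the !llvm.ptr<i8> alias used by the sparse integration tests (the exact spelling of the encoding attribute may differ across MLIR revisions):

// Sketch: type alias for file names and the CSR sparse encoding.
!Filename = type !llvm.ptr<i8>
#CSR = #sparse_tensor.encoding<{
  dimLevelType = [ "dense", "compressed" ]
}>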

With both versions in hand, we can compare the result computed by the TACO compiler with the result computed by the MLIR sparse compiler.

For example, given input “sparse” matrices

    [ 1 2 4 ]        [ 10 11 12 ]
A = [ 4 5 6 ]    B = [ 13 14 15 ]
    [ 7 8 9 ]        [ 16 17 18 ]

Both PyTACO and MLIR then generate the following output file “C.tns” (except that MLIR uses the extended FROSTT format: after the comment line, the header states the rank and the number of nonzeros, then the dimension sizes, followed by one 1-based coordinate/value line per nonzero).

; extended FROSTT format
2 9
3 3
1 1 100
1 2 107
1 3 114
2 1 201
2 2 216
2 3 231
3 1 318
3 2 342
3 3 366
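
As a quick sanity check, the nonzero values above can be reproduced with plain NumPy, independent of both TACO and MLIR:

import numpy as np

# Dense sanity check of the expected result.
A = np.array([[1, 2, 4], [4, 5, 6], [7, 8, 9]], dtype=np.float64)
B = np.array([[10, 11, 12], [13, 14, 15], [16, 17, 18]], dtype=np.float64)
print(A @ B)  # matches the coordinate/value lines in "C.tns"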

Happy testing!


And, in case you are wondering, tensor repositories such as MatrixMarket (extension “.mtx”: a repository of sparse matrices from a variety of applications, along with matrix generation tools) and FROSTT (extension “.tns”: a collection of publicly available sparse tensor datasets and tools) use well-defined external file formats. A while back, I proposed extending the FROSTT file format with some metadata in the header that simplifies reading in tensors, as is done in the first few lines of the output tensor above.
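
To illustrate why that header metadata helps, here is a minimal Python sketch of a reader for the extended format shown above (the helper name read_extended_frostt is hypothetical; knowing the rank, number of nonzeros, and dimension sizes up front means storage can be sized before the coordinates are scanned):

# Hypothetical helper, not part of MLIR or TACO: reads a tensor stored
# in the extended FROSTT format shown above.
def read_extended_frostt(filename):
    with open(filename) as f:
        lines = [s for s in (line.strip() for line in f)
                 if s and not s.startswith(';')]   # skip comment lines
    rank, nnz = map(int, lines[0].split())         # e.g. "2 9"
    dims = tuple(map(int, lines[1].split()))       # e.g. "3 3"
    assert len(dims) == rank
    coords, values = [], []
    for line in lines[2:2 + nnz]:
        parts = line.split()
        # FROSTT coordinates are 1-based; convert to 0-based.
        coords.append(tuple(int(p) - 1 for p in parts[:rank]))
        values.append(float(parts[rank]))
    return dims, coords, values

dims, coords, values = read_extended_frostt('C.tns')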