Sparse Matrix Market Exchange Format support in MLIR

With the sparse work ramping up, I am making some support I have been implementing more broadly available.

Because MLIR is not a general-purpose programming language, setting up test inputs for benchmarks and integration tests can be tedious. For sparse matrices in particular, reading an external file is much preferable to programmatically encoding all the nonzeros of a test matrix.

To simplify benchmarking and testing sparse computations, I have written a very small runtime support library to read sparse matrices in the Matrix Market Exchange Format, since this has become a popular format for sharing and distributing sparse matrices. With this runtime support library, filling a sparse storage scheme is as simple as running the following MLIR code. The part at the dots consists of inserting the nonzeros of the test matrix into the sparse storage scheme used by the benchmark or integration test.

call @openMatrix("matrix.mtx", %m, %n, %nnz)
    : (!llvm.ptr<i8>, memref<index>,
       memref<index>, memref<index>) -> ()
// .... get ready for an m x n matrix A with nnz nonzeros ....
%u = load %nnz[] : memref<index>
scf.for %k = %c0 to %u step %c1 {
  call @readMatrixItem(%i, %j, %d)
      : (memref<index>, memref<index>, memref<f64>) -> ()
  // .... process nonzero A[%i,%j] = %d ....
}
call @closeMatrix() : () -> ()

I will send out the CL with this library and a sample integration shortly. Note that this library is not part of core MLIR. It is merely a convenience library to simplify sparse matrix setup in tests and benchmarking.
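To make the reading logic concrete, here is a minimal Python sketch of what a Matrix Market coordinate reader does. The function name and structure are hypothetical; this only illustrates the file format, not the actual implementation of the support library:

```python
def read_matrix_market(path):
    """Minimal reader for a Matrix Market coordinate file (hypothetical helper).

    Returns the matrix size (m, n) and a list of (i, j, value) nonzeros,
    with the 1-based indices in the file converted to 0-based."""
    with open(path) as f:
        # The banner identifies the format variant.
        banner = f.readline()
        assert banner.startswith("%%MatrixMarket matrix coordinate")
        # Skip any comment lines starting with '%'.
        line = f.readline()
        while line.startswith("%"):
            line = f.readline()
        # Header line: number of rows, columns, and nonzeros.
        m, n, nnz = map(int, line.split())
        entries = []
        for _ in range(nnz):
            i, j, v = f.readline().split()
            entries.append((int(i) - 1, int(j) - 1, float(v)))
    return m, n, entries
```

The openMatrix/readMatrixItem split in the MLIR API above corresponds to reading the header first and then streaming the entries one by one, rather than returning them all at once.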

A first version of a lightweight convenience library has been submitted. Please feel free to make suggestions and contributions. Keep in mind, however, that this is just a convenience wrapper to make testing, debugging, and benchmarking sparse code in MLIR easier; it should not evolve into an elaborate sparse runtime package (that should probably be done elsewhere).

A sample program for illustration purposes has been submitted as an integration test, and I briefly walk through parts of the code below.

To read the header of a matrix in the (sparse) Matrix Market Exchange Format, simply run the following code to obtain the size (m x n) and the number of nonzeros (nnz) of the sparse matrix. The integer arguments are passed as memrefs.

%file = call @getMatrix(%c0) : (index) -> (!llvm.ptr<i8>)
call @openMatrix(%file, %m, %n, %nnz)
    : (!llvm.ptr<i8>, memref<index>,
       memref<index>, memref<index>) -> ()
%M = load %m[]   : memref<index>
%N = load %n[]   : memref<index>
%Z = load %nnz[] : memref<index>

Note that string manipulation is not easy in MLIR. Therefore, a convenience method getMatrix() is provided to read a filename from the environment (variable MATRIX<id>). This allows the test to pass in the full path to the test matrix test.mtx by letting the test framework do its substitution magic.

// RUN: MATRIX0="%mlir_integration_test_dir/data/test.mtx" \
// RUN: mlir-cpu-runner ....
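The environment lookup itself is straightforward; as a Python sketch of the same idea (the function name is hypothetical, mirroring the getMatrix() convenience call):

```python
import os

def get_matrix(idx):
    """Look up the filename of test matrix <idx> in the environment
    variable MATRIX<idx> (hypothetical sketch of getMatrix())."""
    name = os.environ.get("MATRIX%d" % idx)
    if name is None:
        raise RuntimeError("environment variable MATRIX%d is not set" % idx)
    return name
```

This keeps all path handling on the test-framework side, so the MLIR code never has to construct strings itself.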

Rather than reading in the full matrix at once, control passes back to the MLIR program after the header has been read, so that the client code can allocate and prepare a proper sparse storage scheme for an m x n matrix with nnz nonzero elements. (This separation also avoids the complications that would arise from mixing C allocation in the support library with MLIR allocation in the client code.)

For simplicity of presentation, the sample program simply sets up a dense matrix (however, keep in mind that typical client code will set up a much more elaborate data structure here).

%a = alloc(%M, %N) : memref<?x?xf64>
scf.for %ii = %c0 to %M step %c1 {
  scf.for %jj = %c0 to %N step %c1 {
    store %d0, %a[%ii, %jj] : memref<?x?xf64>
  }
}

At this point, we are ready to read the nonzeros of the sparse matrix one by one. Typical clients would insert these nonzeros into the sparse storage scheme somehow. Our sample code simply inserts them into the dense array.

scf.for %k = %c0 to %Z step %c1 {
  call @readMatrixItem(%i, %j, %d)
      : (memref<index>, memref<index>, memref<f64>) -> ()
  %I = load %i[] : memref<index>
  %J = load %j[] : memref<index>
  %D = load %d[] : memref<f64>
  store %D, %a[%I, %J] : memref<?x?xf64>  // insert in dense (not typical)
}

Then, make sure to close the file (the convenience library is currently not thread-safe or re-entrant, so closing promptly is good practice).

call @closeMatrix() : () -> ()

I hope this support is useful. Please share your feedback if you have ideas for improvement.


I generalized the support in the sparse runtime support library to tensors in FROSTT format, and added the ability to sort indices lexicographically before passing the values back to the client, since many sparse storage formats are easier to set up that way (see ⚙ D94852 [mlir][sparse] improved sparse runtime support library).
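The lexicographic sort is simple to state. As a purely illustrative Python sketch, sorting (index-tuple, value) pairs by the index tuple yields the order in which many row-major sparse storage formats are easiest to fill:

```python
def sort_coo(entries):
    """Sort (index-tuple, value) pairs lexicographically by index tuple,
    the order in which row-major sparse formats are easiest to construct.
    Python's tuple comparison is already lexicographic, so the index
    tuple itself can serve as the sort key."""
    return sorted(entries, key=lambda e: e[0])
```

With the entries pre-sorted this way, the client can append nonzeros to its storage scheme in a single pass.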

I found the FROSTT format (see FROSTT) a bit too concise, since it only includes the nonzero elements with indices (so that the rank and number of nonzeros must be inferred from the file; the exact dimension sizes remain unknown if not all dimensions are filled). An “extended format” with a bit more metadata, as follows, seems easier.

# extended FROSTT file format
<rank> <nnz>
<dimension sizes (one per rank)>
<index-tuple and value (one line per nonzero)>

So, a 2x3 dense matrix would then look as follows:

# this is a comment
2 6
2 3
1 1 1.1
1 2 1.2
1 3 1.3
2 1 2.1
2 2 2.2
2 3 2.3
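A reader for this extended format is easy to write. The following Python sketch (hypothetical, not part of the library) parses the header and the nonzeros from a file in the format above:

```python
def read_extended_frostt(path):
    """Parse the extended FROSTT format sketched above (hypothetical reader).

    Returns the dimension sizes and a list of (index-tuple, value) pairs;
    the 1-based indices from the file are kept as-is."""
    with open(path) as f:
        # Drop blank lines and '#' comment lines.
        lines = [s for s in (line.strip() for line in f)
                 if s and not s.startswith("#")]
    rank, nnz = map(int, lines[0].split())
    sizes = [int(d) for d in lines[1].split()]
    assert len(sizes) == rank
    entries = []
    for line in lines[2:2 + nnz]:
        parts = line.split()
        idx = tuple(int(p) for p in parts[:rank])
        entries.append((idx, float(parts[rank])))
    return sizes, entries
```

Note that, unlike plain FROSTT, nothing has to be inferred: the rank, the number of nonzeros, and the exact dimension sizes are all stated up front.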

The changes to the API of the sparse runtime support library are minor: basically renaming Matrix to Tensor and passing arrays for metadata and indices instead. Sample code can be found in the change above.


The FROSTT team is very kindly considering this proposal. You can track the progress in this GitHub issue.
