Some more progress!
The sparse compiler now has two prototype strategies for generating CUDA:
- CUDA codegen: this converts sparsified code to CUDA threads
- CUDA libgen: this converts pre-sparsified code to cuSPARSE library calls
An example of the former was shown above. An example of the latter is illustrated below. (Note that I have extended the GPU dialect with cuSPARSE support; I will send that out for review shortly, since it may trigger some discussion on the proper way to represent these operations and on whether async tokens are required. The basic mechanism, however, is ready to be deployed!)
```mlir
func.func @matvec(%A: tensor<?x?xf64, #SortedCOO>,
                  %x: tensor<?xf64>,
                  %y_in: tensor<?xf64>) -> tensor<?xf64> {
  %y_out = linalg.matvec
      ins(%A, %x: tensor<?x?xf64, #SortedCOO>, tensor<?xf64>)
      outs(%y_in: tensor<?xf64>) -> tensor<?xf64>
  return %y_out : tensor<?xf64>
}
```
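For completeness, the example assumes a `#SortedCOO` encoding attribute on the sparse tensor type. A plausible definition is sketched below (the exact level-type spelling depends on the current `sparse_tensor` dialect syntax, so treat this as illustrative rather than definitive):

```mlir
// Sorted coordinate format: a compressed outer level whose coordinates
// are not unique, followed by a singleton inner level.
#SortedCOO = #sparse_tensor.encoding<{
  dimLevelType = [ "compressed-nu", "singleton" ]
}>
```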
lowers directly into cuSPARSE operations in the GPU dialect:
```mlir
%16 = gpu.create_sparse_env
%17 = gpu.create_coo %1, %2, %dim, %memref, %memref_2, %memref_5 : memref<?xindex>, memref<?xindex>, memref<?xf64>
%18 = gpu.create_dn_vec %memref_8, %2 : memref<?xf64>
%19 = gpu.create_dn_vec %memref_11, %1 : memref<?xf64>
%20 = gpu.spmv_buffer_size %16, %17, %18, %19
%21 = gpu.wait async
%memref_13, %asyncToken_14 = gpu.alloc async [%21] (%20) : memref<?xi8>
gpu.wait [%asyncToken_14]
gpu.spmv %16, %17, %18, %19, %memref_13 : memref<?xi8>
gpu.destroy_sp_mat %17
gpu.destroy_dn_vec %18
gpu.destroy_dn_vec %19
gpu.destroy_sparse_env %16
```