Sparse Compiler and GPU Code-Generation

Some more progress!

The sparse compiler now has two prototype strategies for generating CUDA:

  1. CUDA codegen: this converts sparsified code to CUDA threads
  2. CUDA libgen: this converts pre-sparsified code to cuSPARSE library calls

An example of the former was shown above. An example of the latter is illustrated below. (Note that I have extended the GPU dialect with cuSPARSE support; I will send that out for review shortly, since it may trigger some discussion on the proper way to represent these operations and on whether async tokens are required. The basic mechanism, however, is ready to be deployed!)

    func.func @matvec(%A: tensor<?x?xf64, #SortedCOO>,
                      %x: tensor<?xf64>,
                      %y_in: tensor<?xf64>) -> tensor<?xf64> {
      %y_out = linalg.matvec
        ins(%A, %x: tensor<?x?xf64, #SortedCOO>, tensor<?xf64>)
        outs(%y_in: tensor<?xf64>) -> tensor<?xf64>
      return %y_out : tensor<?xf64>
    }
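As a reminder of the semantics, linalg.matvec reads and accumulates into its `outs` operand, i.e. it computes y_out = A·x + y_in. Here is a small SciPy sketch with hypothetical data (just the math, not the compiler path):

```python
import numpy as np
from scipy.sparse import coo_matrix

# Hypothetical 3x3 sparse matrix given as COO (row, col, value) triples.
rows = np.array([0, 0, 1, 2])
cols = np.array([0, 2, 1, 2])
vals = np.array([1.0, 2.0, 3.0, 4.0])
A = coo_matrix((vals, (rows, cols)), shape=(3, 3))

x = np.array([1.0, 1.0, 1.0])
y_in = np.array([0.5, 0.5, 0.5])

# linalg.matvec semantics: the `outs` operand is accumulated into.
y_out = A @ x + y_in
print(y_out)  # [3.5 3.5 4.5]
```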

Under the libgen strategy, this matvec kernel lowers directly into cuSPARSE:

    %16 = gpu.create_sparse_env
    %17 = gpu.create_coo %1, %2, %dim, %memref, %memref_2, %memref_5 : memref<?xindex>, memref<?xindex>, memref<?xf64>
    %18 = gpu.create_dn_vec %memref_8, %2 : memref<?xf64>
    %19 = gpu.create_dn_vec %memref_11, %1 : memref<?xf64>
    %20 = gpu.spmv_buffer_size %16, %17, %18, %19
    %21 = gpu.wait async
    %memref_13, %asyncToken_14 = gpu.alloc async [%21] (%20) : memref<?xi8>
    gpu.wait [%asyncToken_14]
    gpu.spmv %16, %17, %18, %19, %memref_13 : memref<?xi8>
    gpu.destroy_sp_mat %17
    gpu.destroy_dn_vec %18
    gpu.destroy_dn_vec %19
    gpu.destroy_sparse_env %16
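For reference, here is a hedged C sketch of the host-side cuSPARSE generic-API calls that the gpu ops above are intended to map to. This is a fragment, not a complete program: the device buffers `d_rowIdx`, `d_colIdx`, `d_vals`, `d_x`, `d_y` and the sizes `num_rows`, `num_cols`, `nnz` are assumed to already exist, and `beta = 1.0` reflects the accumulation into the output vector.

```c
// Assumed mapping of the gpu ops onto the cuSPARSE generic API (fragment).
cusparseHandle_t env;                        // gpu.create_sparse_env
cusparseCreate(&env);

cusparseSpMatDescr_t matA;                   // gpu.create_coo
cusparseCreateCoo(&matA, num_rows, num_cols, nnz,
                  d_rowIdx, d_colIdx, d_vals,
                  CUSPARSE_INDEX_64I, CUSPARSE_INDEX_BASE_ZERO, CUDA_R_64F);

cusparseDnVecDescr_t vecX, vecY;             // gpu.create_dn_vec (x2)
cusparseCreateDnVec(&vecX, num_cols, d_x, CUDA_R_64F);
cusparseCreateDnVec(&vecY, num_rows, d_y, CUDA_R_64F);

double alpha = 1.0, beta = 1.0;              // beta = 1: accumulate into y
size_t bufferSize;                           // gpu.spmv_buffer_size
cusparseSpMV_bufferSize(env, CUSPARSE_OPERATION_NON_TRANSPOSE,
                        &alpha, matA, vecX, &beta, vecY,
                        CUDA_R_64F, CUSPARSE_SPMV_ALG_DEFAULT, &bufferSize);

void *buffer;                                // gpu.alloc of the workspace
cudaMalloc(&buffer, bufferSize);

cusparseSpMV(env, CUSPARSE_OPERATION_NON_TRANSPOSE,  // gpu.spmv
             &alpha, matA, vecX, &beta, vecY,
             CUDA_R_64F, CUSPARSE_SPMV_ALG_DEFAULT, buffer);

cusparseDestroySpMat(matA);                  // gpu.destroy_sp_mat
cusparseDestroyDnVec(vecX);                  // gpu.destroy_dn_vec (x2)
cusparseDestroyDnVec(vecY);
cusparseDestroy(env);                        // gpu.destroy_sparse_env
```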