Linalg.generic with scalar output

linalg.generic doesn’t appear to allow scalar output. Attempting a simple vector dot product gives the error: 'linalg.generic' op expected the number of results (1) to be equal to the number of output tensors (0).
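
For reference, my attempt looked roughly like this (a reconstruction, with a trait name of my own; the point is that a plain f32 in outs does not count as an output tensor, so the op has one result but zero output tensors):

// Reconstructed sketch of the failing attempt (#trait_dot is my own name).
#trait_dot = {
  indexing_maps = [
    affine_map<(i) -> (i)>,  // a
    affine_map<(i) -> (i)>,  // b
    affine_map<(i) -> ()>    // x (scalar out)
  ],
  iterator_types = ["reduction"]
}

%0 = linalg.generic #trait_dot
  ins(%arga, %argb : tensor<?xf32>, tensor<?xf32>)
  outs(%argx : f32) {          // scalar outs: 0 output tensors
    ^bb(%a: f32, %b: f32, %x: f32):
      %m = arith.mulf %a, %b : f32
      %s = arith.addf %x, %m : f32
      linalg.yield %s : f32
} -> tensor<f32>               // but 1 result: verifier rejects this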

I found this code comment in LinalgInterfaces.cpp, which states:

  // Expect at least one output operand.
  // This means an op that constructs a tensor out of indices cannot be a
  // LinalgOp at the moment. For now this will have to be a special op until we
  // have output shape operands that are not tensors.

I know linalg.dot exists, but it doesn’t cover my real use case, which is matrix multiplication over non-standard semirings (e.g. add the intersecting values, then reduce by taking the minimum: the min-plus semiring). Ultimately, I’m working towards implementing GraphBLAS using the sparse_tensor dialect’s lowering of linalg.generic.
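
For concreteness, a dense min-plus matrix multiply would look something like this as a linalg.generic (just a sketch; the trait name is my own, and %argx would have to be pre-filled with +inf as the reduction identity):

#trait_min_plus = {
  indexing_maps = [
    affine_map<(i,j,k) -> (i,k)>,  // A
    affine_map<(i,j,k) -> (k,j)>,  // B
    affine_map<(i,j,k) -> (i,j)>   // X (out)
  ],
  iterator_types = ["parallel", "parallel", "reduction"],
  doc = "X(i,j) = MIN_k A(i,k) + B(k,j)"
}

// min-plus matrix multiply; %argx must be pre-filled with +inf
%0 = linalg.generic #trait_min_plus
  ins(%arga, %argb : tensor<?x?xf64>, tensor<?x?xf64>)
  outs(%argx : tensor<?x?xf64>) {
    ^bb(%a: f64, %b: f64, %x: f64):
      %sum = arith.addf %a, %b : f64    // the semiring "multiply"
      %min = arith.minf %sum, %x : f64  // the semiring "add" (reduce)
      linalg.yield %min : f64
} -> tensor<?x?xf64>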

Are there technical challenges that make it difficult for linalg.generic to do a full reduction, or does it simply require someone willing to add the necessary code and tests? linalg.generic is truly amazing, especially with the sparse support added by @aartbik, so I’m hoping this restriction isn’t permanent.

Jim,
Perhaps not ideal, but you can express scalar output using a 0-dim tensor. For example, a sparse vector reduction would look as follows.

#SparseVector = #sparse_tensor.encoding<{ dimLevelType = [ "compressed" ] }>

#trait_sum_reduction = {
  indexing_maps = [
    affine_map<(i) -> (i)>,  // a
    affine_map<(i) -> ()>    // x (scalar out)
  ],
  iterator_types = ["reduction"],
  doc = "x += SUM_i a(i)"
}

  // sum reduction
  %0 = linalg.generic #trait_sum_reduction
    ins(%arga: tensor<?xf32, #SparseVector>)
    outs(%argx: tensor<f32>) {
      ^bb(%a: f32, %x: f32):
        %0 = arith.addf %x, %a : f32
        linalg.yield %0 : f32
  } -> tensor<f32>

The sparse compiler codegen even “scalarizes” the 0-dim tensor inside the resulting computation:

%5 = memref.load %0[%c0] : memref<?xindex>  // start of stored entries
%6 = memref.load %0[%c1] : memref<?xindex>  // end of stored entries
%7 = scf.for %arg2 = %5 to %6 step %c1 iter_args(%arg3 = %4) -> (f32) {
  %9 = memref.load %1[%arg2] : memref<?xf32>  // next stored value
  %10 = arith.addf %arg3, %9 : f32
  scf.yield %10 : f32  // the sum is carried as a plain scalar, not a tensor
}

@aartbik
This is perfect. I forgot that 0-dim tensors exist. I kept trying with outs(%argx: f32).

Having the sparse_tensor dialect “scalarize” the result is exactly the final output I want. The sparse_tensor lowering of linalg.generic continues to amaze me! I know it has been a huge effort getting to this point. Thank you for all your hard work.

And for the sake of completeness, here are the missing pieces to make the solution code work.

%cst = arith.constant 0.0 : f32
%argx0 = linalg.init_tensor [] : tensor<f32>
// init_tensor alone leaves the contents undefined; fill with the
// additive identity so the reduction starts from zero
%argx = linalg.fill ins(%cst : f32) outs(%argx0 : tensor<f32>) -> tensor<f32>
%0 = ... (see solution code)
%1 = tensor.extract %0[] : tensor<f32>
return %1 : f32
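
Putting everything together (reusing #SparseVector and #trait_sum_reduction from the solution above), the whole thing as one function looks roughly like this sketch; exact op spellings such as linalg.init_tensor vary across MLIR versions:

// Self-contained sketch: sum-reduce a sparse vector to a scalar.
func.func @sum(%arga: tensor<?xf32, #SparseVector>) -> f32 {
  %cst = arith.constant 0.0 : f32
  %init = linalg.init_tensor [] : tensor<f32>
  %argx = linalg.fill ins(%cst : f32) outs(%init : tensor<f32>) -> tensor<f32>
  %0 = linalg.generic #trait_sum_reduction
    ins(%arga : tensor<?xf32, #SparseVector>)
    outs(%argx : tensor<f32>) {
      ^bb(%a: f32, %x: f32):
        %s = arith.addf %x, %a : f32
        linalg.yield %s : f32
  } -> tensor<f32>
  %1 = tensor.extract %0[] : tensor<f32>
  return %1 : f32
}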

Thanks for making my day, Jim!