Toy to Linalg dialect

I am trying to convert the high-level toy::AddOp to an add op in Linalg. I have found two representations in Linalg.
One is [RFC] TOSA-to-Linalg lowering of element-wise ops

#map = affine_map<(d0, d1) -> (d0, d1)>

module {
  func.func @main(%arg0: tensor<3x5xf32>, %arg1: tensor<3x5xf32>) -> tensor<?x?xf32> {
    %0 = tensor.empty() : tensor<3x5xf32>
    %1 = linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = ["parallel", "parallel"]} ins(%arg0, %arg1 : tensor<3x5xf32>, tensor<3x5xf32>) outs(%0 : tensor<3x5xf32>) {
    ^bb0(%in: f32, %in_0: f32, %out: f32):
      %2 = arith.addf %in, %in_0 : f32
      linalg.yield %2 : f32
    } -> tensor<3x5xf32>
    %cast = tensor.cast %1 : tensor<3x5xf32> to tensor<?x?xf32>
    return %cast : tensor<?x?xf32>
  }
}

The other is the named op linalg::AddOp from the 'linalg' Dialect - MLIR documentation.

I could not find any discussion about the pros and cons of either approach. My question is: to lower the high-level toy language to LLVM IR via Linalg, which approach is more appropriate?

Linalg named ops are quite new and still under development, while TOSA has more established semantics. The main difference between using TOSA and Linalg named ops is that Linalg is evolving into a compiler IR, while TOSA is kept stable with an official specification.

If you use TOSA, you’ll have to make sure your toy language semantics is identical to their spec, while if you use Linalg, you can submit patches upstream to change or update the semantics without going through a committee and spec changes.

The main practical reason to use Linalg named ops now is that the lowering to generic is already implemented, so toy.add → linalg.add → linalg.generic { addf } is upstream and working.
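
For concreteness, a minimal sketch of what the named-op form could look like for the same 3x5 example (the tensor.empty init is an assumption about how the lowering materializes the output, not official Toy tutorial output):

// Destination-passing style: the named op writes into an empty init tensor.
%init = tensor.empty() : tensor<3x5xf32>
%sum = linalg.add ins(%arg0, %arg1 : tensor<3x5xf32>, tensor<3x5xf32>)
                  outs(%init : tensor<3x5xf32>) -> tensor<3x5xf32>

Running -linalg-generalize-named-ops on this should produce essentially the linalg.generic from the first snippet.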

Semantically speaking, however, TOSA and Linalg have very different design points. TOSA has implicit broadcast semantics (dimensions of size 1 are broadcast), while Linalg has no implicit broadcast semantics at all. You have to use an explicit linalg.broadcast on the operand in question. This is to avoid ambiguity.
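
As an illustration, a hedged sketch of that explicit form, assuming a hypothetical 1-D operand %row : tensor<5xf32> being added to a 3x5 tensor (the names and shapes are made up for the example):

// Materialize the broadcast explicitly: replicate %row along dimension 0.
%init0 = tensor.empty() : tensor<3x5xf32>
%bcast = linalg.broadcast ins(%row : tensor<5xf32>)
                          outs(%init0 : tensor<3x5xf32>)
                          dimensions = [0]
%init1 = tensor.empty() : tensor<3x5xf32>
%sum = linalg.add ins(%bcast, %arg0 : tensor<3x5xf32>, tensor<3x5xf32>)
                  outs(%init1 : tensor<3x5xf32>) -> tensor<3x5xf32>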

In practice, a sequence of linalg.broadcast + linalg.add can be lowered to a single linalg.generic with the correct affine map to read from the broadcast operand, like (d0, d1) -> (d0) or (d0, d1) -> (d1) depending on whether this is a row or column broadcast. That support is not upstream yet, but we’re working on it.
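
For the broadcast example above, the fused form could look roughly like the following single generic, reading the 1-D operand through a (d0, d1) -> (d1) map (a sketch of what such a fusion might produce, not actual upstream output):

%fused = linalg.generic
    {indexing_maps = [affine_map<(d0, d1) -> (d1)>,       // broadcast read of %row
                      affine_map<(d0, d1) -> (d0, d1)>,
                      affine_map<(d0, d1) -> (d0, d1)>],
     iterator_types = ["parallel", "parallel"]}
    ins(%row, %arg0 : tensor<5xf32>, tensor<3x5xf32>)
    outs(%init1 : tensor<3x5xf32>) {
^bb0(%in: f32, %in_0: f32, %out: f32):
  %0 = arith.addf %in, %in_0 : f32
  linalg.yield %0 : f32
} -> tensor<3x5xf32>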

In the end, if you use Linalg, you’ll get a simple but always-evolving lowering, while if you use TOSA, you’ll get more complete semantics, but WYSIWYG. Also, I’m not sure the lowering to Linalg generics RFC has been merged yet. If not, you’d also have to wait for that to go upstream.

As usual, if you have ideas on how to improve the Linalg named ops, please let us know. Ask questions, create RFCs, submit patches. It’s in a very raw state right now, so this is the right time to bring wild ideas. :slight_smile:

Thank you …

Quick note: this should work out of the box, conceptually as follows:

fuseElementwiseOps(generalize(linalg.broadcast), 
                   generalize(linalg.add)) ---> single linalg.generic

So I would try to turn this into just a bunch of DAG pattern matchers and get full graph-level, perfectly-nested loop fusion for free.

I’m sure you’re evaluating the tradeoff already but I’ll still write it: “please don’t be too creative on the impl. side for now until we identify concrete gaps that need evolutions / new solutions” :slight_smile:

I think I missed that; could you point me to it?

The generalize transform is available, but I am not clear which lowering you are referring to (e.g. the lowering of composites via the interface @qcolombet recently added for softmax, lowering for ops that need TableGen, or something else?).

Now that I read my answer again, it was ambiguous: I meant the TOSA lowering to Linalg.

This is the TOSA lowering RFC that the OP mentioned in the question.
