TOSA to LinAlg lowerings and legalizations

I am looking to contribute to the TOSA to LinAlg lowering path and wanted to get some clarifications before just beginning work. What is the status on the ARM side? Is there a good launching-off point written at this time? LinAlg has made some significant progress in the last few months, so lining up with their changes may take some work but could save significant time long-term.

Side note, I have worked on similar conversions previously and have some thoughts on how we could simplify legalization to other dialects (e.g. LinAlg).

  • Supporting decomposition of attributes when possible allows simpler compilation / mapping to higher level operations. Cases like convolution (stride, input dilation, kernel dilation, padding) can be decomposed into a normalized operation plus shape manipulations (transpose, reshape, pad); see the sketch after this list. For codegen cases this can drastically simplify the lowering process.
  • Lowering control flow, with inlining, as early in the compilation process as possible. Lowerings that cross control flow often run into issues early on (e.g. type propagation, canonicalizations).
  • Support for TOSA-specific shape propagation / validation - even if it cannot completely infer correct shapes, it can substantially improve iteration on complex lowerings. This could even be supported at the canonicalization level.
  • Supporting a ‘broadcast_to’ to avoid per-op broadcasting. Implicit broadcasting requires validating operations at lowering and often pushes errors much further down the stack. Having an explicit broadcast_to simplifies validation as an intermediate step when lowering from an implicitly broadcasting dialect to an explicitly broadcasting one.
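
For the convolution point above, here is a minimal sketch of what the padding decomposition could look like (op syntax is approximate and the shapes are made up for illustration; stride and dilation would similarly fold into reshapes/transposes around a normalized convolution):

```mlir
// Original form: padding carried as an attribute on the convolution.
%conv = "tosa.conv2d"(%input, %weight, %bias)
    {pad = [1, 1, 1, 1], stride = [1, 1], dilation = [1, 1]}
    : (tensor<1x32x32x8xf32>, tensor<16x3x3x8xf32>, tensor<16xf32>)
    -> tensor<1x32x32x16xf32>

// Decomposed form: an explicit pad followed by a "normalized" convolution
// with zero padding, which is much simpler to map onto a codegen target.
%paddings = "tosa.const"()
    {value = dense<[[0, 0], [1, 1], [1, 1], [0, 0]]> : tensor<4x2xi32>}
    : () -> tensor<4x2xi32>
%padded = "tosa.pad"(%input, %paddings)
    : (tensor<1x32x32x8xf32>, tensor<4x2xi32>) -> tensor<1x34x34x8xf32>
%norm_conv = "tosa.conv2d"(%padded, %weight, %bias)
    {pad = [0, 0, 0, 0], stride = [1, 1], dilation = [1, 1]}
    : (tensor<1x34x34x8xf32>, tensor<16x3x3x8xf32>, tensor<16xf32>)
    -> tensor<1x32x32x16xf32>
```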

Also, to my understanding there is an exhaustive test suite for TOSA. It would be quite beneficial to hook it up to the mlir-cpu-runner for correctness testing. Any update on its status at this time?

@sjarus - are you a good point of contact for this?

Rob

Thanks for starting this thread, @rsuderman ! As we described in the RFC, we did a simple POC of the lowering to LinAlg. @stellaraccident suggested we push that as a follow-on patch to the TOSA dialect, which has now landed in the LLVM tree. However, as you mentioned, LinAlg has evolved quite a bit since then, so this is no longer the same work. Back then, LinAlg on tensors wasn’t something we knew much about. I’ll take another look at this pass and the latest on LinAlg.

There have been quite a few advances in the underlying MLIR infrastructure related to LinAlg codegen, control flow lowering, and dynamic shape handling, and we’re interested in discussing how the dialect implementation could address these well.

We’re happy to discuss dialect-level mechanics that don’t impact the spec. For spec-impacting work, @TomCookseyArm offers some additional information in [RFC] TOSA Dialect in MLIR - #39 by TomCookseyArm .

The full unit test infrastructure actually targets the TOSA reference model. This infrastructure is described on mlplatform: Test Infrastructure for TOSA - TOSA - Discourse . The reference model is a C++ functional implementation of the TOSA op set and runs full TOSA networks serialized in flatbuffers form. The reference model has also been open sourced as the reference_model.git repository.
Doesn’t the mlir-cpu-runner require accurate legalizations from TOSA to LinAlg to be in place?

As you say, since a lot has changed, what you have in this area may be most useful as a branch in a fork somewhere that can be referenced/incorporated? Just being able to see the prior work would likely yield the collaboration point needed to get things moving.

Fwiw - I don’t see any of the points above as spec-impacting: they all relate to how one would implement the spec in MLIR, which includes a default lowering path.

I’ll take a look at that work and see if it can be quickly wrapped up in this manner.

The broadcast_to appears to refer to needing a new op in TOSA, but perhaps we’re simply misreading that due to recent context within similar discussions involving TOSA?

Not really: if lowering to LinAlg, you need to legalize out degenerate broadcasts. If TOSA doesn’t have the requirement (i.e. all of its ops are defined with implicit broadcast), it doesn’t need the op, but we do need something in MLIR that does it (i.e. on the way to LinAlg). Even then, you can technically get away without it, since it just maps to a generic op with specific indexing maps, but it is convenient. It might even just be a helper function for LinAlg lowerings.
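
For concreteness, a minimal sketch of what that looks like on linalg-on-tensors (names and shapes are illustrative, and the exact syntax depends on the MLIR version): a broadcast is just a linalg.generic whose input indexing map drops the broadcast dimension.

```mlir
#bcast_in  = affine_map<(d0, d1) -> (d1)>
#bcast_out = affine_map<(d0, d1) -> (d0, d1)>

// Broadcast a tensor<4xf32> across a new leading dimension of size 8.
func.func @broadcast_to(%src: tensor<4xf32>, %init: tensor<8x4xf32>) -> tensor<8x4xf32> {
  %0 = linalg.generic
      {indexing_maps = [#bcast_in, #bcast_out],
       iterator_types = ["parallel", "parallel"]}
      ins(%src : tensor<4xf32>)
      outs(%init : tensor<8x4xf32>) {
  ^bb0(%in: f32, %out: f32):
    linalg.yield %in : f32
  } -> tensor<8x4xf32>
  return %0 : tensor<8x4xf32>
}
```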

Overall it’s useful from a code validation standpoint as well. Compilation paths that support implicit broadcasting can fairly easily recombine the explicit form back into an implicit version. It also simplifies the validation of input/output types for the dialect. It certainly is not a hard requirement but does better constrain op behavior.

I am not 100% certain about mlir-cpu-runner’s specifics, but it may be possible to lower TOSA directly to an LLVM executable version that uses your sample kernels. This would validate that the kernels are correct and allow any integrating project to validate their own infrastructure. @stellaraccident is more knowledgeable here and better placed to reply.

Thanks. Broadcasting seems like a general enough concern, and there are potentially multiple broadcasting rules that could be applied (e.g. NumPy vs. XLA from the recent conversation - also described in the TosaMakeBroadcastable pass in the TOSA dialect), so it seems like something MLIR might want to implicitly extract from the dialect during the process of further code generation.

Yes, I think this is my main message: take the time to create the right ops in-repo for it. I’ve contributed to some of the debt over the years by just creating ad-hoc ops to expand broadcasts in downstream projects (I think we finally excised one from IREE not too long ago), and part of the value of having TOSA in-tree is that it is a forcing function to create some of these things that never quite had the critical mass to do right in an out of tree project.

Agreed. We’re in favor of not making ad hoc choices where a better option is available. Just as TOSA carries the numerical formulation as a fully defined solution for quantized types in-op, we’d like to make the right choices for interfacing concerns like broadcasting, dynamic shapes, and the other topics listed in this thread, and would be happy to discuss them further.

The reference model is designed to validate a subgraph of TOSA operators (e.g., legal operands/attributes/outputs/datatypes), then read input tensor data, evaluate the network, and produce output tensors. The model doesn’t expose kernels directly, but if mlir-cpu-runner can express its output in terms of TOSA operators and input tensor data, it could run through the reference model today.

We can certainly discuss these needs further.

I would love to contribute to this.

From my experience in npcomp, if we open the door to the “size 1 broadcasting” degenerate case for dynamic shapes, we’ll be in for a very hard time (see the sketch after this list) because:

  1. it cannot be lowered to linalg in general (it creates internal aliasing, which breaks transformations);
  2. the hacky way of handling this that is floating around (the “stride 0 trick”) isn’t even expressible on linalg-on-tensors, which is what we should be targeting with these lowerings.
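
To illustrate the first point, a hypothetical sketch (not actual npcomp code): with dynamic shapes, whether a dimension broadcasts is only known at runtime, so a lowering of an implicitly broadcasting op has to branch on the actual extents instead of committing to one set of indexing maps.

```mlir
// Given two dynamically shaped operands, we cannot statically tell whether
// %lhs broadcasts along its leading dimension (size 1) or matches %rhs.
func.func @needs_runtime_check(%lhs: tensor<?x4xf32>, %rhs: tensor<?x4xf32>) -> i1 {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %d0 = tensor.dim %lhs, %c0 : tensor<?x4xf32>
  // Only this runtime comparison tells us which lowering (broadcasting vs.
  // plain elementwise) is actually required.
  %broadcasts = arith.cmpi eq, %d0, %c1 : index
  return %broadcasts : i1
}
```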

Maybe not urgently, but I imagine if we had a public meeting about shape legalization with TOSA as a case study, we could fill an hour, and probably flush a lot of tribal knowledge out in the process.


We now have both the dialect in the MLIR repo and lowerings from TFLite/TF in the tensorflow repo. There are still a number of cleanups and discussions to be had, but this is approaching the point where we can talk about next steps. We should plan to sync on lowerings some time next week (after the US holidays).
