TOSA to LinAlg lowerings and legalizations

I am looking to contribute to the TOSA to LinAlg lowering path and wanted to get some clarifications before just beginning work. What is the status on the ARM side? Is there a good launching-off point written at this time? LinAlg has made some significant progress in the last few months, so lining up with their changes may take some work but could save significant time long-term.

Side note, I have worked on similar conversions previously and have some thoughts on how we could simplify legalization to other dialects (e.g. LinAlg).

  • Supporting decomposition of attributes when possible allows simpler compilation / mapping to higher level operations. Cases like convolution (stride, input dilation, kernel dilation, padding) can be decomposed into a normalized operation plus shape manipulations (transpose, reshape, pad); see the sketch after this list. For codegen cases this can drastically simplify the lowering process.
  • Lowering control flow, with inlining, as early in the compilation process as possible. Lowerings that cross control flow often run into issues early on (e.g. type propagation, canonicalizations).
  • Support for TOSA-specific shape propagation / validation - even if it cannot completely infer correct shapes, it can substantially improve iteration on complex lowerings. This could even be supported at the canonicalization level.
  • Supporting a ‘broadcast_to’ to avoid per-op broadcasting. Implicit broadcasting requires validating operations at lowering and often pushes errors much further down the stack. Having an explicit broadcast_to simplifies validation as an intermediate step when lowering from an implicitly broadcasting dialect to an explicitly broadcasting one.
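
For the convolution point above, here is a minimal sketch of what the padding decomposition could look like (op syntax is approximate and the shapes are made up for illustration; stride and dilation would similarly fold into reshapes/transposes around a normalized convolution):

```mlir
// Original form: padding carried as an attribute on the convolution.
%conv = "tosa.conv2d"(%input, %weight, %bias)
    {pad = [1, 1, 1, 1], stride = [1, 1], dilation = [1, 1]}
    : (tensor<1x32x32x8xf32>, tensor<16x3x3x8xf32>, tensor<16xf32>)
    -> tensor<1x32x32x16xf32>

// Decomposed form: an explicit pad followed by a "normalized" convolution
// with zero padding, which is much simpler to map onto a codegen target.
%paddings = "tosa.const"()
    {value = dense<[[0, 0], [1, 1], [1, 1], [0, 0]]> : tensor<4x2xi32>}
    : () -> tensor<4x2xi32>
%padded = "tosa.pad"(%input, %paddings)
    : (tensor<1x32x32x8xf32>, tensor<4x2xi32>) -> tensor<1x34x34x8xf32>
%norm_conv = "tosa.conv2d"(%padded, %weight, %bias)
    {pad = [0, 0, 0, 0], stride = [1, 1], dilation = [1, 1]}
    : (tensor<1x34x34x8xf32>, tensor<16x3x3x8xf32>, tensor<16xf32>)
    -> tensor<1x32x32x16xf32>
```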

Also, to my understanding there is an exhaustive test suite for TOSA. It would be quite beneficial to hook it up to the mlir-cpu-runner for correctness testing. Any update on its status at this time?

@sjarus - are you a good point of contact for this?

Rob

Thanks for starting this thread, @rsuderman ! As we described in the RFC, we did a simple POC of the lowering to LinAlg. @stellaraccident suggested we push that as a follow-on patch to the TOSA dialect, which has now landed in the LLVM tree. However, as you mentioned, LinAlg has evolved quite a bit since then, so this is no longer the same work. Back then, LinAlg on tensors wasn’t something we knew much about. I’ll take another look at this pass and the latest on LinAlg.

There have been quite a few advances in the underlying MLIR infrastructure related to LinAlg codegen, control flow lowering, and dynamic shape handling, and we’re interested in discussing how the dialect implementation could address these well.

We’re happy to discuss dialect-level mechanics that don’t impact the spec. For spec-impacting work, @TomCookseyArm offers some additional information in [RFC] TOSA Dialect in MLIR - #39 by TomCookseyArm .

The full unit test infrastructure actually targets the TOSA reference model. This infrastructure is described on mlplatform: Test Infrastructure for TOSA - TOSA - Discourse . The reference model is a C++ functional implementation of the TOSA op set and runs full TOSA networks serialized in flatbuffers form. The reference model has also been open sourced as the reference_model.git repository.
Doesn’t the mlir-cpu-runner require accurate legalizations from TOSA to LinAlg to be in place?

As you say, since a lot has changed, what you have in this area may be most useful as a branch in a fork somewhere that can be referenced/incorporated? Just being able to see the prior work would likely yield the collaboration point needed to get things moving.

Fwiw - I don’t see any of the points above as spec-impacting: they all relate to how one would implement the spec in MLIR, which includes a default lowering path.

I’ll take a look at that work and see if it can be quickly wrapped up in this manner.

The broadcast_to appears to refer to needing a new op in TOSA, but perhaps we’re simply misreading that due to recent context within similar discussions involving TOSA?

Not really: if lowering to LinAlg, you need to legalize out degenerate broadcasts. If TOSA doesn’t have the requirement (i.e. all of its ops are defined with implicit broadcast), it doesn’t need the op, but we do need something in MLIR that does it (i.e. on the way to LinAlg). Even then, you can technically get away without it, since it just maps to a generic op with specific indexing maps, but it is convenient. It might even just be a helper function for LinAlg lowerings.
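
For concreteness, a minimal sketch of what that looks like on linalg-on-tensors (names and shapes are illustrative, and the exact syntax depends on the MLIR version): a broadcast is just a linalg.generic whose input indexing map drops the broadcast dimension.

```mlir
#bcast_in  = affine_map<(d0, d1) -> (d1)>
#bcast_out = affine_map<(d0, d1) -> (d0, d1)>

// Broadcast a tensor<4xf32> across a new leading dimension of size 8.
func.func @broadcast_to(%src: tensor<4xf32>, %init: tensor<8x4xf32>) -> tensor<8x4xf32> {
  %0 = linalg.generic
      {indexing_maps = [#bcast_in, #bcast_out],
       iterator_types = ["parallel", "parallel"]}
      ins(%src : tensor<4xf32>)
      outs(%init : tensor<8x4xf32>) {
  ^bb0(%in: f32, %out: f32):
    linalg.yield %in : f32
  } -> tensor<8x4xf32>
  return %0 : tensor<8x4xf32>
}
```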

Overall it’s useful from a code validation standpoint as well. Compilation paths that support implicit broadcasting can fairly easily recombine the explicit form back into an implicit version. It also simplifies the validation of input/output types for the dialect. It certainly is not a hard requirement but does better constrain op behavior.

I am not 100% certain about mlir-cpu-runner’s specifics, but it may be possible to lower TOSA directly to an LLVM executable version that uses your sample kernels. This would validate that the kernels are correct and allow any integrating project to validate their own infrastructure. @stellaraccident is more knowledgeable here and better placed to reply.

Thanks. Broadcasting seems like a general enough concern, and there are potentially multiple broadcasting rules that could be applied (e.g. NumPy vs. XLA from the recent conversation - also described in the TosaMakeBroadcastable pass in the TOSA dialect), so it seems like something MLIR might want to implicitly extract from the dialect during the process of further code generation.

Yes, I think this is my main message: take the time to create the right ops in-repo for it. I’ve contributed to some of the debt over the years by just creating ad-hoc ops to expand broadcasts in downstream projects (I think we finally excised one from IREE not too long ago), and part of the value of having TOSA in-tree is that it is a forcing function to create some of these things that never quite had the critical mass to do right in an out of tree project.

Agreed. We’re in favor of not making ad hoc choices where a better option is available. Just as TOSA carries the numerical formulation as a fully defined solution for quantized types in-op, we’d like to make the right choices for interfacing concerns like broadcasting, dynamic shapes, and the other topics listed in this thread, and would be happy to discuss them further.

The reference model is designed to validate a subgraph of TOSA operators (e.g., legal operands/attributes/outputs/datatypes), then read input tensor data, evaluate the network, and produce output tensors. The model doesn’t expose kernels directly, but if mlir-cpu-runner can express its output in terms of TOSA operators and input tensor data, it could run through the reference model today.

We can certainly discuss these needs further.

I would love to contribute to this.

From my experience in npcomp, if we open the door to the “size 1 broadcasting” degenerate case for dynamic shapes, we’ll be in for a very hard time (see the sketch after this list) because:

  1. it cannot be lowered to linalg in general (it creates internal aliasing, which breaks transformations);
  2. the hacky way of handling this that is floating around (the “stride 0 trick”) isn’t even expressible on linalg-on-tensors, which is what we should be targeting with these lowerings.
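
To illustrate the first point, a hypothetical sketch (not actual npcomp code): with dynamic shapes, whether a dimension broadcasts is only known at runtime, so a lowering of an implicitly broadcasting op has to branch on the actual extents instead of committing to one set of indexing maps.

```mlir
// Given two dynamically shaped operands, we cannot statically tell whether
// %lhs broadcasts along its leading dimension (size 1) or matches %rhs.
func.func @needs_runtime_check(%lhs: tensor<?x4xf32>, %rhs: tensor<?x4xf32>) -> i1 {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %d0 = tensor.dim %lhs, %c0 : tensor<?x4xf32>
  // Only this runtime comparison tells us which lowering (broadcasting vs.
  // plain elementwise) is actually required.
  %broadcasts = arith.cmpi eq, %d0, %c1 : index
  return %broadcasts : i1
}
```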

Maybe not urgently, but I imagine if we had a public meeting about shape legalization with TOSA as a case study, we could fill an hour, and probably flush a lot of tribal knowledge out in the process.


We now have both the dialect in the MLIR repo and lowerings from TFLite/TF in the tensorflow repo. There are still a number of cleanups and discussions to be had, but this is approaching the point where we can talk about next steps. We should plan to sync on lowerings some time next week (after the US holidays).
