[RFC] TOSA Dialect Increment to v1.0

The following RFC lists incremental updates to the TOSA MLIR dialect to align with the latest specification version, v1.0. The v1.0 release is characterized by a set of unique design goals:

  • This first major version of TOSA defines a backwards compatibility baseline. All future minor versions will be backwards compatible with respect to this IR.
  • Backwards compatibility is intended to be implemented along the ingress to and egress from TOSA within MLIR.
  • Operator constructs have been updated to support training and shape dynamism related expressiveness requirements.

ArgMax

The tosa.argmax operator adds a new nan_mode attribute that defines whether NaN values are propagated, as defined in the specification.

%output = tosa.argmax %input {axis = 1 : i32, nan_mode = "PROPAGATE"} : (tensor<4x8xf32>) -> tensor<4xi32>

Functional changes: Adds support for NaN handling.

AvgPool

The tosa.avg_pool2d op changes the input_zp and output_zp to be Values rather than Attributes. The quantization_attr construct has been eliminated.

%output = tosa.avg_pool2d %input, %in_zp, %out_zp {acc_type = i32, kernel = array<i64: 2, 2>, pad = array<i64: 0, 1, 0, 1>, stride = array<i64: 1, 1>} : (tensor<4x32x32x8xi8>, tensor<1xi8>, tensor<1xi8>) -> tensor<4x32x32x8xi8>

Functional changes: None.

Conv2D, Conv3D, Depthwise_Conv2D

The tosa.conv2d, tosa.conv3d and tosa.depthwise_conv2d ops have two changes:
• The input_zp and weight_zp are now Values, mirroring the zero-point change in avg_pool2d. The quantization_attr construct has been eliminated.
• A new acc_type parameter is added.

%output = tosa.conv2d %input, %kernel, %bias, %input_zp, %weight_zp {acc_type = i32, pad = array<i64: 1, 1, 1, 1>, stride = array<i64: 1, 1>, dilation = array<i64: 1, 1>} : (tensor<4x32x32x8xi8>, tensor<16x3x3x8xi8>, tensor<16xf32>, tensor<1xi8>, tensor<1xi8>) -> tensor<4x32x32x16xi8>

%output = tosa.conv3d %input, %kernel, %bias, %input_zp, %weight_zp {acc_type = i32, pad = array<i64: 1, 1, 1, 1, 1, 1>, stride = array<i64: 1, 1, 1>, dilation = array<i64: 1, 1, 1>} : (tensor<2x2x8x8x2xi8>, tensor<4x3x3x3x2xi8>, tensor<4xf32>, tensor<1xi8>, tensor<1xi8>) -> tensor<2x2x8x8x4xi8>

%output = tosa.depthwise_conv2d %input, %kernel, %bias, %input_zp, %weight_zp {acc_type = i32, pad = array<i64: 0, 1, 0, 1>, stride = array<i64: 2, 2>, dilation = array<i64: 1, 1>} : (tensor<1x32x32x8xi8>, tensor<3x3x8x2xi8>, tensor<16xf32>, tensor<1xi8>, tensor<1xi8>) -> tensor<1x16x16x16xi8>

Functional changes: Adds acc_type accumulator size control for implementation dependent parameterization.

Transpose_Conv2D

The tosa.transpose_conv2d op has three changes. In addition to the two conv2d changes above, it removes the out_shape parameter, which is instead derived from the output tensor or from shape inference.

%output = tosa.transpose_conv2d %input, %kernel, %bias, %input_zp, %weight_zp {acc_type = i32, out_pad = array<i64: 0, 0, 0, 0>, stride = array<i64: 2, 2>} : (tensor<1x32x32x8xi8>, tensor<16x3x3x8xi8>, tensor<16xf32>, tensor<1xi8>, tensor<1xi8>) -> tensor<1x65x65x16xi8>

Functional changes: Adds acc_type accumulator size control for implementation dependent parameterization. Removes out_shape.

MaxPool

The tosa.max_pool2d operator adds a nan_mode parameter.

%output = tosa.max_pool2d %input {kernel = array<i64: 1, 1>, pad = array<i64: 0, 0, 0, 0>, stride = array<i64: 1, 1>, nan_mode = "PROPAGATE"} : (tensor<1x32x32x8xf32>) -> tensor<1x32x32x8xf32>

Functional changes: Adds support for NaN handling.

MatMul

The tosa.matmul operator now has the a_zp and b_zp parameters as Values rather than Attributes. The quantization_attr construct has been eliminated.

%output = tosa.matmul %a, %b, %a_zp, %b_zp : (tensor<1x8x16xi8>, tensor<1x16x32xi8>, tensor<1xi8>, tensor<1xi8>) -> tensor<1x8x32xi32>

Functional changes: None

FullyConnected

The tosa.fully_connected operator has been deprecated. Existing legalizations replace it with Conv2D or MatMul.
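
For reference, a minimal sketch of the matmul-based form. The shapes, SSA names (%in_shape, %w_shape, %out_shape, %b_shape, %a_zp, %b_zp) and zero-point constants here are hypothetical, not a fixed legalization: the !tosa.shape operands would be produced by tosa.const_shape (see Shape Operators below), and the float zero points would be constant zeros.

%in3 = tosa.reshape %input, %in_shape : (tensor<4x16xf32>, !tosa.shape<3>) -> tensor<1x4x16xf32>
%wT = tosa.transpose %weight {perms = array<i32: 1, 0>} : (tensor<8x16xf32>) -> tensor<16x8xf32>
%w3 = tosa.reshape %wT, %w_shape : (tensor<16x8xf32>, !tosa.shape<3>) -> tensor<1x16x8xf32>
%mm = tosa.matmul %in3, %w3, %a_zp, %b_zp : (tensor<1x4x16xf32>, tensor<1x16x8xf32>, tensor<1xf32>, tensor<1xf32>) -> tensor<1x4x8xf32>
%out2 = tosa.reshape %mm, %out_shape : (tensor<1x4x8xf32>, !tosa.shape<2>) -> tensor<4x8xf32>
%b2 = tosa.reshape %bias, %b_shape : (tensor<8xf32>, !tosa.shape<2>) -> tensor<1x8xf32>
%output = tosa.add %out2, %b2 : (tensor<4x8xf32>, tensor<1x8xf32>) -> tensor<4x8xf32>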

Clamp

The tosa.clamp operator has the following changes:
• The min_fp/min_int and max_fp/max_int pairs have been replaced by single typed attributes, min_val and max_val.
• A nan_mode parameter has been added.

%output = tosa.clamp %input {min_val = 0.0 : f32, max_val = 1.0 : f32, nan_mode = "PROPAGATE"} : (tensor<4x8xf32>) -> tensor<4x8xf32>

Functional changes: min/max specified by single values with type inference. Adds support for NaN handling.
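
To illustrate the type inference, a hypothetical integer variant where min_val and max_val carry values of the operand element type:

%output = tosa.clamp %input {min_val = -128 : i8, max_val = 127 : i8} : (tensor<4x8xi8>) -> tensor<4x8xi8>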

Maximum, Minimum

The tosa.maximum and tosa.minimum ops both add a new Attribute nan_mode that defines handling of NaN values.

%output = tosa.maximum %input1, %input2 {nan_mode = "PROPAGATE"} : (tensor<4x8xf32>, tensor<4x8xf32>) -> tensor<4x8xf32>
%output = tosa.minimum %input1, %input2 {nan_mode = "PROPAGATE"} : (tensor<*xf32>, tensor<4x8xf32>) -> tensor<4x8xf32>

Functional changes: Adds support for NaN handling.

Negate

The tosa.negate operator now defines the input_zp and output_zp as Values rather than Attributes. The quantization_attr construct has been eliminated.

%output = tosa.negate %input, %input_zp, %output_zp : (tensor<4x4xi8>, tensor<1xi8>, tensor<1xi8>) -> tensor<4x4xi8>

Functional changes: None

Pad

The tosa.pad operator eliminates the input_zp attribute, and the quantization_attr construct has been eliminated as a result. The zero point value is now expected to be passed via pad_const, which is a Value.

The padding parameter is now a TosaShape tensor Value. This is part of a blanket update where all references to shape are now expressed using a TosaShape tensor.

%output = tosa.pad %input, %padding, %pad_const : (tensor<4x4x32xf32>, !tosa.shape<6>, tensor<f32>) -> tensor<5x5x32xf32>

Functional changes: pad_const now folds the zero point application into the quantized implementation. The compiler is expected to implement this.

Reshape

The tosa.reshape operator modifies the shape parameter from an integer tuple to a TosaShape tensor Value. This is intended to enable support for dynamic shapes.

%output = tosa.reshape %input, %shape : (tensor<13x21x3xi1>, !tosa.shape<2>) -> tensor<1x819xi1>

Functional changes: None

Slice

The tosa.slice operator modifies both the start and size parameters. Instead of an integer tuple, they are both TosaShape tensor Values. This is intended to enable support for dynamic shapes.

%output = tosa.slice %input, %start, %size : (tensor<13x21x3xf32>, !tosa.shape<3>, !tosa.shape<3>) -> tensor<7x11x1xf32>

Functional changes: None

Tile

The tosa.tile operator modifies the multiples parameter from an integer tuple to a TosaShape tensor Value. This is intended to enable support for dynamic shapes.

%output = tosa.tile %input, %multiples : (tensor<13x21x3xi1>, !tosa.shape<3>) -> tensor<39x42x3xi1>

Functional changes: None

Transpose

The tosa.transpose operator modifies the perms parameter from a Value to an integer array Attribute.

%output = tosa.transpose %input {perms = array<i32: 2, 0, 1>} : (tensor<13x21x3xf32>) -> tensor<3x13x21xf32>

Functional changes: Non-const perms no longer supported.

Resize

The tosa.resize operator modifies the scale, offset and border parameters, all of which are now Values rather than Attributes.

%output = tosa.resize %input, %scale, %offset, %border { mode = "BILINEAR" } : (tensor<1x32x32x8xf32>, !tosa.shape<4>, !tosa.shape<2>, !tosa.shape<2>) -> tensor<1x64x64x8xf32>

Rescale

The tosa.rescale operator modifies the multiplier, shift, input_zp and output_zp parameters, which are all now Values rather than Attributes.

%output = tosa.rescale %input, %multiplier, %shift, %input_zp, %output_zp {double_round = false, per_channel = false, scale32 = true, input_unsigned = false, output_unsigned = false} : (tensor<13x21x3xui8>, tensor<1xi32>, tensor<1xi8>, tensor<1xi8>, tensor<1xi8>) -> tensor<13x21x3xi8>

Shape Operators

A new operator, tosa.const_shape, has been added. It defines shape information that enables the expression of data layout operators in TOSA while also supporting further work on dynamic shape propagation.

%shape = tosa.const_shape {value = dense<[4,224,224,3]> : tensor<4xindex>} : () -> !tosa.shape<4>
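
As a usage sketch, tying this back to the tosa.pad example above (padding values hypothetical):

%padding = tosa.const_shape {value = dense<[0, 1, 0, 1, 0, 0]> : tensor<6xindex>} : () -> !tosa.shape<6>
%output = tosa.pad %input, %padding, %pad_const : (tensor<4x4x32xf32>, !tosa.shape<6>, tensor<f32>) -> tensor<5x5x32xf32>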

We will release supporting material on the rationale for these changes, which are the result of prior feedback. Please feel welcome to suggest any additional feedback. Our current intention is to update the dialect and the ingress/egress pathways in multiple framework repositories early in the new year.

Since the next TOSA community meeting falls on Dec 26, we will be cancelling it; we will instead cover this RFC and ongoing work, for community involvement and feedback, during the community meeting slot in January 2025.

We have begun updating the dialect to match the spec. The first few patches just landed, one from Arm and one contributed externally, which was great to see and something we’d like to encourage!

Several more remain, including downstream updates in the TensorFlow and Torch-MLIR repositories that generate the correct forms of the updated signatures. Once complete, we will notify here.

[Merged] Another patch adds the TOSA shape type and operator to the dialect.

Another patch changes the PadOp padding to tosa.shape, building on the previous Tosa_Shape type patch.

The PadOp patch has been merged upstream in LLVM.

Hi,

Can someone please explain why acc_type for fp8 types is restricted to fp16?

The F8E5M2 maximum value can reach up to ~57k, and an fp16 accumulator can only hold values up to ~65k, so it may not be adequate.
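
To make the numbers concrete: the largest finite F8E5M2 value is 1.75 × 2^15 = 57344, while fp16 saturates at 65504, so even the sum of two near-maximal inputs (57344 + 57344 = 114688) already exceeds the fp16 range.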

@GeorgeARM

FYI, we are also renaming the TOSA operator int_div to intdiv to align with the v1.0 spec.
PR: [mlir][tosa] Rename int_div to intdiv by Tai78641 · Pull Request #135080 · llvm/llvm-project
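
For illustration, only the mnemonic changes; the operand and result forms are assumed unchanged from the existing op:

%out = tosa.intdiv %input1, %input2 : (tensor<13x21x3xi32>, tensor<13x21x3xi32>) -> tensor<13x21x3xi32>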

Any updates on this one?

@Jerry-Ge @udaya-ranga

Another issue I found with TOSA. Taking the ResNet implementation from PyTorch, one layer is:

layer2.0.downsample.0:

Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)

Input tensor size of this layer (from input 1x3x244x244):

torch.Size([1, 128, 28, 28])

inputSize=28, padBefore=0, padAfter=0, kernelSize=1, dilation=1, stride=2
So, following the check in the TOSA spec (TOSA 1.0.0 draft specification):

idivCheck(inputSize - 1 + padBefore + padAfter - (kernelSize - 1) * dilation,
          stride);

becomes:

idivCheck(28 - 1 + 0 + 0 - (1-1)*1, 2) -> idivCheck(27, 2) -> error_if(27 % 2 != 0) -> error

So it looks like a layer that is part of the PyTorch ResNet implementation would fail the TOSA spec (unless I did something wrong). I think this should be supported even when the dimension is not divisible by the stride?

@Jerry-Ge @udaya-ranga

Hello,
Here’s an answer on the TOSA Discourse regarding this constraint: Tosa Conv2D idiv_check constraints - TOSA - Discourse
It basically means the padding should be adjusted when lowering to the TOSA dialect.
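
Working the numbers for the layer above (my own sketch of one possible adjustment): PyTorch produces floor((28 - 1) / 2) + 1 = 14 outputs. Since stride 2 with a 1x1 kernel never reads the last row/column, the lowering can slice the input from 28 down to 27 (or equivalently fold the adjustment into the padding), giving idivCheck(27 - 1 + 0 + 0, 2) = idivCheck(26, 2), which passes, and an output size of 26/2 + 1 = 14, matching PyTorch.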
