[RFC] TOSA Dialect Increment to v1.0

The following RFC lists incremental updates to the TOSA MLIR dialect to align with the latest specification version, v1.0. The v1.0 release is characterized by a set of unique design goals:

  • This first major version of TOSA defines a backwards compatibility baseline. All future minor versions will be backwards compatible with respect to this IR.
  • Backwards compatibility is intended to be implemented along the ingress to and egress from TOSA within MLIR.
  • Operator constructs have been updated to support training and shape dynamism related expressiveness requirements.

ArgMax

The tosa.argmax operator adds a new nan_mode attribute that defines whether NaN values are propagated, as defined in the specification.

%output = tosa.argmax %input {axis = 1 : i32, nan_mode = "PROPAGATE"} : (tensor<4x8xf32>) -> tensor<4xi32>

Functional changes: Adds support for NaN handling.

AvgPool

The tosa.avg_pool2d op changes the input_zp and output_zp to be Values rather than Attributes. The quantization_attr construct has been eliminated.

%output = tosa.avg_pool2d %input, %in_zp, %out_zp {acc_type = i32, kernel = array<i64: 2, 2>, pad = array<i64: 0, 1, 0, 1>, stride = array<i64: 1, 1>} : (tensor<4x32x32x8xi8>, tensor<1xi8>, tensor<1xi8>) -> tensor<4x32x32x8xi8>

Functional changes: None.

Conv2D, Conv3D, Depthwise_Conv2D

The tosa.conv2d, tosa.conv3d and tosa.depthwise_conv2d ops have two changes:
• The input_zp and weight_zp are now Values, mirroring the zero-point change in avg_pool2d. The quantization_attr construct has been eliminated.
• A new acc_type parameter is added.

%output = tosa.conv2d %input, %kernel, %bias, %input_zp, %weight_zp {acc_type = i32, pad = array<i64: 1, 1, 1, 1>, stride = array<i64: 1, 1>, dilation = array<i64: 1, 1>} : (tensor<4x32x32x8xi8>, tensor<16x3x3x8xi8>, tensor<16xf32>, tensor<1xi8>, tensor<1xi8>) -> tensor<4x32x32x16xi8>

%output = tosa.conv3d %input, %kernel, %bias, %input_zp, %weight_zp {acc_type = i32, pad = array<i64: 1, 1, 1, 1, 1, 1>, stride = array<i64: 1, 1, 1>, dilation = array<i64: 1, 1, 1>} : (tensor<2x2x8x8x2xi8>, tensor<4x3x3x3x2xi8>, tensor<4xf32>, tensor<1xi8>, tensor<1xi8>) -> tensor<2x2x8x8x4xi8>

%output = tosa.depthwise_conv2d %input, %kernel, %bias, %input_zp, %weight_zp {acc_type = i32, pad = array<i64: 0, 1, 0, 1>, stride = array<i64: 2, 2>, dilation = array<i64: 1, 1>} : (tensor<1x32x32x8xi8>, tensor<3x3x8x2xi8>, tensor<16xf32>, tensor<1xi8>, tensor<1xi8>) -> tensor<1x16x16x16xi8>

Functional changes: Adds acc_type accumulator size control for implementation dependent parameterization.

Transpose_Conv2D

The tosa.transpose_conv2d op has three changes. In addition to the two conv2d changes above, it removes the out_shape parameter, which is instead derived from the output tensor or from shape inference.

%output = tosa.transpose_conv2d %input, %kernel, %bias, %input_zp, %weight_zp {acc_type = i32, out_pad = array<i64: 0, 0, 0, 0>, stride = array<i64: 2, 2>} : (tensor<1x32x32x8xi8>, tensor<16x3x3x8xi8>, tensor<16xf32>, tensor<1xi8>, tensor<1xi8>) -> tensor<1x65x65x16xi8>

Functional changes: Adds acc_type accumulator size control for implementation dependent parameterization. Removes out_shape.

MaxPool

The tosa.max_pool2d operator adds a nan_mode parameter.

%output = tosa.max_pool2d %input {kernel = array<i64: 1, 1>, pad = array<i64: 0, 0, 0, 0>, stride = array<i64: 1, 1>, nan_mode = "PROPAGATE"} : (tensor<1x32x32x8xf32>) -> tensor<1x32x32x8xf32>

Functional changes: Adds support for NaN handling.

MatMul

The tosa.matmul operator now has the a_zp and b_zp parameters as Values rather than Attributes. The quantization_attr construct has been eliminated.

%output = tosa.matmul %a, %b, %a_zp, %b_zp : (tensor<1x8x16xi8>, tensor<1x16x32xi8>, tensor<1xi8>, tensor<1xi8>) -> tensor<1x8x32xi32>

Functional changes: None

FullyConnected

The tosa.fully_connected operator has been deprecated. Existing legalizations replace it with Conv2D or MatMul.
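
For reference, a minimal sketch of the matmul-based form. The shapes, SSA names (%in_shape, %w_shape, %out_shape, %b_shape, %a_zp, %b_zp) and zero-point constants here are hypothetical, not a fixed legalization: the !tosa.shape operands would be produced by tosa.const_shape (see Shape Operators below), and the float zero points would be constant zeros.

%in3 = tosa.reshape %input, %in_shape : (tensor<4x16xf32>, !tosa.shape<3>) -> tensor<1x4x16xf32>
%wT = tosa.transpose %weight {perms = array<i32: 1, 0>} : (tensor<8x16xf32>) -> tensor<16x8xf32>
%w3 = tosa.reshape %wT, %w_shape : (tensor<16x8xf32>, !tosa.shape<3>) -> tensor<1x16x8xf32>
%mm = tosa.matmul %in3, %w3, %a_zp, %b_zp : (tensor<1x4x16xf32>, tensor<1x16x8xf32>, tensor<1xf32>, tensor<1xf32>) -> tensor<1x4x8xf32>
%out2 = tosa.reshape %mm, %out_shape : (tensor<1x4x8xf32>, !tosa.shape<2>) -> tensor<4x8xf32>
%b2 = tosa.reshape %bias, %b_shape : (tensor<8xf32>, !tosa.shape<2>) -> tensor<1x8xf32>
%output = tosa.add %out2, %b2 : (tensor<4x8xf32>, tensor<1x8xf32>) -> tensor<4x8xf32>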

Clamp

The tosa.clamp operator has the following changes:
• The min_fp/min_int and max_fp/max_int pairs have been replaced by single typed attributes, min_val and max_val.
• A nan_mode parameter has been added.

%output = tosa.clamp %input {min_val = 0.0 : f32, max_val = 1.0 : f32, nan_mode = "PROPAGATE"} : (tensor<4x8xf32>) -> tensor<4x8xf32>

Functional changes: min/max specified by single values with type inference. Adds support for NaN handling.
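
To illustrate the type inference, a hypothetical integer variant where min_val and max_val carry values of the operand element type:

%output = tosa.clamp %input {min_val = -128 : i8, max_val = 127 : i8} : (tensor<4x8xi8>) -> tensor<4x8xi8>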

Maximum, Minimum

The tosa.maximum and tosa.minimum ops both add a new Attribute nan_mode that defines handling of NaN values.

%output = tosa.maximum %input1, %input2 {nan_mode = "PROPAGATE"} : (tensor<4x8xf32>, tensor<4x8xf32>) -> tensor<4x8xf32>
%output = tosa.minimum %input1, %input2 {nan_mode = "PROPAGATE"} : (tensor<*xf32>, tensor<4x8xf32>) -> tensor<4x8xf32>

Functional changes: Adds support for NaN handling.

Negate

The tosa.negate operator now defines the input_zp and output_zp as Values rather than Attributes. The quantization_attr construct has been eliminated.

%output = tosa.negate %input, %input_zp, %output_zp : (tensor<4x4xi8>, tensor<1xi8>, tensor<1xi8>) -> tensor<4x4xi8>

Functional changes: None

Pad

The tosa.pad operator eliminates the input_zp attribute, and the quantization_attr construct has been eliminated as a result. The zero point value is now expected to be passed via pad_const, which is a Value.

The padding parameter is now a TosaShape tensor Value. This is part of a blanket update where all references to shape are now expressed using a TosaShape tensor.

%output = tosa.pad %input, %padding, %pad_const : (tensor<4x4x32xf32>, !tosa.shape<6>, tensor<f32>) -> tensor<5x5x32xf32>

Functional changes: pad_const now folds the zero point application into the quantized implementation. The compiler is expected to implement this.

Reshape

The tosa.reshape operator modifies the shape parameter from an integer tuple to a TosaShape tensor Value. This is intended to enable support for dynamic shapes.

%output = tosa.reshape %input, %shape : (tensor<13x21x3xi1>, !tosa.shape<2>) -> tensor<1x819xi1>

Functional changes: None

Slice

The tosa.slice operator modifies both the start and size parameters. Instead of an integer tuple, they are both TosaShape tensor Values. This is intended to enable support for dynamic shapes.

%output = tosa.slice %input, %start, %size : (tensor<13x21x3xf32>, !tosa.shape<3>, !tosa.shape<3>) -> tensor<7x11x1xf32>

Functional changes: None

Tile

The tosa.tile operator modifies the multiples parameter from an integer tuple to a TosaShape tensor Value. This is intended to enable support for dynamic shapes.

%output = tosa.tile %input, %multiples : (tensor<13x21x3xi1>, !tosa.shape<3>) -> tensor<39x42x3xi1>

Functional changes: None

Transpose

The tosa.transpose operator modifies the perms parameter from a Value to an integer array Attribute.

%output = tosa.transpose %input {perms = array<i32: 2, 0, 1>} : (tensor<13x21x3xf32>) -> tensor<3x13x21xf32>

Functional changes: Non-const perms no longer supported.

Resize

The tosa.resize operator modifies the scale, offset and border parameters, all of which are now Values rather than Attributes.

%output = tosa.resize %input, %scale, %offset, %border { mode = "BILINEAR" } : (tensor<1x32x32x8xf32>, !tosa.shape<4>, !tosa.shape<2>, !tosa.shape<2>) -> tensor<1x64x64x8xf32>

Rescale

The tosa.rescale operator modifies the multiplier, shift, input_zp and output_zp parameters, which are all now Values rather than Attributes.

%output = tosa.rescale %input, %multiplier, %shift, %input_zp, %output_zp {double_round = false, per_channel = false, scale32 = true, input_unsigned = false, output_unsigned = false} : (tensor<13x21x3xui8>, tensor<1xi32>, tensor<1xi8>, tensor<1xi8>, tensor<1xi8>) -> tensor<13x21x3xi8>

Shape Operators

A new operator, tosa.const_shape, has been added. It defines shape information that enables the expression of data layout operators in TOSA while also supporting further work on dynamic shape propagation.

%shape = tosa.const_shape {value = dense<[4,224,224,3]> : tensor<4xindex>} : () -> !tosa.shape<4>
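
As a usage sketch, tying this back to the tosa.pad example above (padding values hypothetical):

%padding = tosa.const_shape {value = dense<[0, 1, 0, 1, 0, 0]> : tensor<6xindex>} : () -> !tosa.shape<6>
%output = tosa.pad %input, %padding, %pad_const : (tensor<4x4x32xf32>, !tosa.shape<6>, tensor<f32>) -> tensor<5x5x32xf32>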

We will release supporting material on the rationale for these changes, which are the result of prior feedback. Please feel welcome to suggest any additional feedback. Our current intention is to update the dialect and the ingress/egress pathways in multiple framework repositories early in the new year.

Since the next TOSA community meeting falls on Dec 26, we will be cancelling it; we will instead cover this RFC and ongoing work, for community involvement and feedback, during the community meeting slot in January 2025.

We have begun updating the dialect to match the spec. The first few patches just landed, one from Arm and one contributed externally, which was great to see and something we’d like to encourage!

Several more remain, including downstream updates in the TensorFlow and Torch-MLIR repositories that generate the correct forms of the updated signatures. Once complete, we will notify here.

[Merged] Another patch adds the TOSA shape type and operator to the dialect.

Another patch changes the PadOp padding to tosa.shape, building on the previous Tosa_Shape type patch.

The PadOp patch has been merged upstream in LLVM.

Hi,

Can someone please explain why acc_type for fp8 types is restricted to fp16?

The F8E5M2 maximum value can reach up to ~57k, and an fp16 accumulator can only hold values up to ~65k, so it may not be adequate.
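
To make the numbers concrete: the largest finite F8E5M2 value is 1.75 × 2^15 = 57344, while fp16 saturates at 65504, so even the sum of two near-maximal inputs (57344 + 57344 = 114688) already exceeds the fp16 range.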

@GeorgeARM

FYI, we are also renaming the TOSA operator int_div to intdiv to align with the v1.0 spec.
PR: [mlir][tosa] Rename int_div to intdiv by Tai78641 · Pull Request #135080 · llvm/llvm-project
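
For illustration, only the mnemonic changes; the operand and result forms are assumed unchanged from the existing op:

%out = tosa.intdiv %input1, %input2 : (tensor<13x21x3xi32>, tensor<13x21x3xi32>) -> tensor<13x21x3xi32>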

Any updates on this one?

@Jerry-Ge @udaya-ranga

Another issue I found with TOSA. Taking the ResNet implementation from PyTorch, one layer is:

layer2.0.downsample.0:

Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)

Input tensor size of this layer (from input 1x3x244x244):

torch.Size([1, 128, 28, 28])

inputSize=28, padBefore=0, padAfter=0, kernelSize=1, dilation=1, stride=2
So, following the check in the TOSA spec (TOSA 1.0.0 draft specification):

idivCheck(inputSize - 1 + padBefore + padAfter - (kernelSize - 1) * dilation,
          stride);

becomes:

idivCheck(28 - 1 + 0 + 0 - (1-1)*1, 2) -> idivCheck(27, 2) -> error_if(27 % 2 != 0) -> error

So it looks like a layer that is part of the PyTorch ResNet implementation would fail the TOSA spec (unless I did something wrong). I think this should be supported even when the dimension is not divisible by the stride?

@Jerry-Ge @udaya-ranga

Hello,
Here’s an answer on the TOSA Discourse regarding this constraint: Tosa Conv2D idiv_check constraints - TOSA - Discourse
It basically means the padding should be adjusted when lowering to the TOSA dialect.
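
Working the numbers for the layer above (my own sketch of one possible adjustment): PyTorch produces floor((28 - 1) / 2) + 1 = 14 outputs. Since stride 2 with a 1x1 kernel never reads the last row/column, the lowering can slice the input from 28 down to 27 (or equivalently fold the adjustment into the padding), giving idivCheck(27 - 1 + 0 + 0, 2) = idivCheck(26, 2), which passes, and an output size of 26/2 + 1 = 14, matching PyTorch.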
