[RFC] 'Target' dependent optimizations in TOSA

‘Target’ refers to a specific configuration or environment that implementations aim to support or comply with. In the context of TOSA, it encompasses three main aspects of the specification: Profiles, Extensions and Levels. In the near future, the specification version can also be considered.

Motivation

Currently, generated TOSA IR is checked for conformance to a target via the validation pass, e.g. --tosa-validate="profile=pro_int extension=bf16 level=8k".

However, transformations currently have no knowledge of the user's intended target. This can lead to:

  • Previously conformant IR being transformed into non-conformant IR.
  • Non-conformant IR not being transformed into conformant IR.

The first problem is the primary motivator for this proposal. Some motivating examples include:

Prior art

spirv MLIR dialect

It has a spirv.target_env module attribute that can specify the capabilities, extensions, resource limits, and version required for the target device, e.g.:

module attributes {
  spirv.target_env = #spirv.target_env<
    #spirv.vce<v1.3, [Int8, ...], [...]>, #spirv.resource_limits<>>
} {
  ...
}

which can be queried by transformations using “lookupTargetEnvOrDefault”, e.g.:

auto targetEnvSupportsKernelCapability = [](gpu::GPUModuleOp moduleOp) {
  Operation *gpuModule = moduleOp.getOperation();
  auto targetAttr = spirv::lookupTargetEnvOrDefault(gpuModule);
  spirv::TargetEnv targetEnv(targetAttr);
  return targetEnv.allows(spirv::Capability::Kernel);
};

Target information is attached to a GPUModule using --spirv-attach-target e.g.:

$ mlir-opt --spirv-attach-target="module=spirv.* ver=v1.0 caps=Kernel" test.mlir
gpu.module @spirv_module_1 [#spirv.target<#spirv.vce<v1.0, [Kernel], []>, #spirv.resource_limits<>>] {...}

DLTI dialect

DLTI is a dialect in upstream MLIR that allows for the creation of device target descriptions. However, it seems to be a work in progress (happy to be corrected): Next steps on target descriptor · Issue #934 · libxsmm/tpp-mlir · GitHub.

Target information in TOSA

Proposal 1

Pass target parameters to each pass that depends on target information (the current method for the --tosa-validate pass).

Example

Based on the motivating example above: “Canonicalizing pad+conv2d into conv2d operation required the intended “level” being taken into account”.

The transformation can be pulled out of “canonicalizations” and moved into a separate, optional transformation pass. The pass can expose a “level” parameter, which it will take into account while applying the transformation.
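As a rough illustration of Proposal 1, the decision inside such a pass could look like the sketch below. It is written in plain C++ rather than against the MLIR API, and everything in it is an assumption made for illustration: the names `LevelLimits`, `parseLevelOption` and `canFoldPadIntoConv` are hypothetical, and the limit of 8192 for the "8k" level is assumed rather than taken from this RFC.

```cpp
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical model of a TOSA level: only the limit relevant to padding.
struct LevelLimits {
  int64_t maxKernel;
};

// Maps the pass's "level" option to limits. "none" means no limit applies.
// The value 8192 for "8k" is an assumption made for this sketch.
inline std::optional<LevelLimits> parseLevelOption(const std::string &level) {
  if (level == "none")
    return std::nullopt;
  if (level == "8k")
    return LevelLimits{8192};
  throw std::invalid_argument("unknown level: " + level);
}

// Returns true if adding the tosa.pad amount to the conv2d padding keeps the
// combined padding within the limit of the level given to the pass.
inline bool canFoldPadIntoConv(int64_t padOpAmount, int64_t convPad,
                               const std::string &levelOption) {
  const std::optional<LevelLimits> limits = parseLevelOption(levelOption);
  if (!limits)
    return true; // level=none: folding is never blocked by a level limit
  return padOpAmount + convPad <= limits->maxKernel;
}
```

With this shape, the level is an explicit input of the one pass that needs it, which is the fine-grained control listed under the pros below.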

Pros

  • Fine granularity control for exposing target information
    • Exposing information globally may lead to over-dependence on the functionality / incorrect usage.
  • Separation of concerns
    • Supplying target information to transformation passes keeps the IR clean of target-specific metadata.
    • Each pass is therefore self-contained, without relying on some global state.

Cons

  • Target dependent optimizations might need to become a separate pass.
  • Duplicated parameters for each target dependent transformation pass.
    • Can create a burden on the user repeating target information each time.
    • Risk of inconsistencies.
  • Maintenance overhead when target information is changed/extended.

Proposal 2

Attach target information to the module scope. Target information is attached once, globally, and is therefore accessible to all optimization passes. This takes inspiration from the spirv.target_env attribute.

A pass, --tosa-attach-target, can be provided to attach target env to the module via command-line arguments (similar to what exists for --tosa-validate today).

Example

Based on the motivating example above: “Canonicalizing pad+conv2d into conv2d operation required the intended “level” being taken into account”.

The pad+conv2d → conv2d transformation can remain in “canonicalizations”. The transformation will be updated such that the current “level” is retrieved and the decision about whether or not to fold will be based on the value of “level”. e.g.:

module attributes {tosa.target_env = #tosa.target_env<level = none, profiles = [pro_int, pro_fp], extensions = [int4, int16]>} {
   ...
}

// Get target information from module
tosa::TargetEnv targetEnv = tosa::lookupTargetEnvOrDefault(op);

// Query capabilities
tosa::Level level = targetEnv.getLevel();
auto maxKernelSize = level.getMaxKernelSize();

// Decide whether to fold based on maxKernelSize value
...
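To make the lookup concrete, here is a minimal sketch of how a "lookup or default" helper might behave, modelled in plain C++ instead of the MLIR API. The `Op` and `TargetEnv` types are hypothetical stand-ins, and the choice of default (level none, no profiles, no extensions) is an assumption of this sketch, not something the proposal specifies.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the proposed #tosa.target_env attribute.
struct TargetEnv {
  std::string level = "none";
  std::vector<std::string> profiles;
  std::vector<std::string> extensions;
};

// Hypothetical stand-in for an operation that may carry a target attribute.
struct Op {
  const Op *parent = nullptr;
  std::optional<TargetEnv> targetEnv;
};

// Walks from `op` towards the root and returns the first attached target
// environment. Falls back to a default-constructed environment (an assumed
// default) when no enclosing operation carries one.
inline TargetEnv lookupTargetEnvOrDefault(const Op *op) {
  for (const Op *cur = op; cur != nullptr; cur = cur->parent)
    if (cur->targetEnv)
      return *cur->targetEnv;
  return TargetEnv{};
}
```

Because the walk stops at the nearest enclosing attribute, the same helper would keep working if target attributes were later allowed at function scope.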

Pros

  • All transformations immediately have the capability to query target information.
  • Single point of reference for target information.

Cons

  • Target information exposed globally which may lead to over-dependence on the functionality / incorrect usage.
  • Target information cannot change within a module.
    • If required in the future, we could consider adding target attributes at the function scope.

Any thoughts / comments on these proposals, or another proposal that wasn’t considered, would be much appreciated, thanks!

One other approach I’ve seen used (which I found interesting) is a non-invalidating Analysis, along with a “populate analysis” pass that takes the target as an argument. Reproducers and testing pipelines are indeed important to consider here (I’d try out a few of these, both writing out pipelines and seeing what dumped reproducers look like, when deciding). Basically, each pass can query the analysis, but the target isn’t in the IR and doesn’t need to be passed in (both a pro and a con). If used with nested pipelines, access can be restricted that way: any pipeline that doesn’t start with the populate pass can’t query it, so your outermost pipeline would not have it, but nested ones could, and so it would be scoped. The string could then also just index into a richer in-memory C++ struct.
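The scoping idea can be sketched roughly as follows. This is a toy model in plain C++, not the MLIR pass manager or its analysis API: `PipelineState`, `populateTarget` and `runNestedPipeline` are hypothetical names, and the only thing the sketch tries to show is that a pipeline which does not start with the populate step sees no target.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Toy model: state owned by one (nested) pipeline, not stored in the IR.
struct PipelineState {
  std::optional<std::string> target;
};

using Pass = std::function<void(PipelineState &)>;

// Hypothetical "populate analysis" pass: records the target once so that
// later passes in the same pipeline can query it.
inline Pass populateTarget(std::string target) {
  return [t = std::move(target)](PipelineState &state) { state.target = t; };
}

// Runs the passes with a fresh state, so target information is only visible
// inside a pipeline that populated it.
inline void runNestedPipeline(const std::vector<Pass> &passes) {
  PipelineState state;
  for (const Pass &pass : passes)
    pass(state);
}
```

A pass run via `runNestedPipeline({populateTarget("level=8k"), somePass})` can read the target, while the same pass in a pipeline without the populate step finds `state.target` empty.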

Well another option, come to think of it, is dynamic pipelines, so the target dependent parts are dynamic pipelines where target is given once and then pipeline populated based on that.
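A rough sketch of that idea, again in plain C++ and not using MLIR's dynamic pipeline machinery: the target is described once and a builder derives the per-pass options from it. The `Target` struct, `buildTosaPipeline` and the pass name `hypothetical-fold-pad` are placeholders invented for this sketch; only tosa-validate and its profile/extension/level options appear earlier in the thread.

```cpp
#include <string>
#include <vector>

// Hypothetical target description, supplied once to the pipeline builder.
struct Target {
  std::string profile;
  std::string extension;
  std::string level;
};

// Builds the textual list of passes for the given target. Each
// target-dependent pass receives its options from the single Target value,
// so the user does not repeat them per pass.
inline std::vector<std::string> buildTosaPipeline(const Target &target) {
  std::vector<std::string> passes;
  // Placeholder name for a level-aware pad folding pass.
  passes.push_back("hypothetical-fold-pad{level=" + target.level + "}");
  passes.push_back("tosa-validate{profile=" + target.profile +
                   " extension=" + target.extension +
                   " level=" + target.level + "}");
  return passes;
}
```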

Is the main concern that you’d want it to be explicit which passes use the target? Avoid accidentally making things target dependent.

Re code duplication: One can define multiple passes in the same file, it can reuse patterns, functions etc. So there are plain C++ means to address code duplication side. But those would still have target duplication indeed.

(I should add, “typed not tested” :) I’ve not seen folks use this in nested fashion)

Thanks @jpienaar!

Is the main concern that you’d want it to be explicit which passes use the target? Avoid accidentally making things target dependent.

With regards to proposal 2, the concern was more about allowing bad design patterns to creep in. It becomes easy to slip in target-specific logic where it shouldn’t belong. Maybe I’m at risk of over-thinking, since code review should pick this up as well.

However, allowing access to target information globally will allow us to query target information in places it might be difficult to propagate that information otherwise e.g. canonicalizations. Though perhaps this isn’t desirable?

Re code duplication: One can define multiple passes in the same file, it can reuse patterns, functions etc. So there are plain C++ means to address code duplication side. But those would still have target duplication indeed.

Yes, I was mostly thinking about the need to pass the same target parameters to each target dependent pass, e.g. --tosa-transform-1="profiles/extensions" --tosa-transform-2="level" --tosa-transform-3="level/profiles/extensions" … --tosa-validate="level/profiles/extensions".

Another thought (which I forgot to mention in the RFC above) is that centralising target information can help check for incompatibilities between target features, for example, between the specification version and the specified extensions. This is likely more difficult to enforce in proposal 1.
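For instance, a single check over the whole target description might look like the sketch below. It is plain C++ with hypothetical types (`TargetEnv`, `ExtensionRule`), and the rule table is passed in by the caller because the actual mapping from extensions to specification versions is not given in this thread.

```cpp
#include <string>
#include <utility>
#include <vector>

// Specification version as (major, minor); std::pair compares lexicographically.
using Version = std::pair<int, int>;

// Hypothetical centralised target description.
struct TargetEnv {
  Version specVersion;
  std::vector<std::string> extensions;
};

// Hypothetical rule: an extension is only available from some version onwards.
struct ExtensionRule {
  std::string extension;
  Version minVersion;
};

// Returns one message per requested extension that the requested
// specification version is too old to support.
inline std::vector<std::string>
findIncompatibilities(const TargetEnv &env,
                      const std::vector<ExtensionRule> &rules) {
  std::vector<std::string> errors;
  for (const std::string &ext : env.extensions)
    for (const ExtensionRule &rule : rules)
      if (rule.extension == ext && env.specVersion < rule.minVersion)
        errors.push_back("extension '" + ext +
                         "' is not available in the requested "
                         "specification version");
  return errors;
}
```

With target information held in one place, a check like this would run once when the target is attached, rather than being repeated or skipped in each pass that accepts its own parameters.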

Thanks for the other proposals:

One other approach I’ve seen used (which I found interesting), using a non-invalidating Analysis …

I hadn’t thought of this and would certainly like to give it a try! One immediate question I have is about the trade-off of storing target information directly in the IR vs independently. My initial thinking is that having target information in the IR would aid debugging (the behaviour of a transformation depends solely on the given IR itself) and reproducibility (the MLIR file is self-contained).

Well another option, come to think of it, is dynamic pipelines, so the target dependent parts are dynamic pipelines where target is given once and then pipeline populated based on that.

Is the suggestion here to make target dependent decisions at a pass/pipeline level, rather than within a pass itself? I wonder if this will lead to the existence of a number of passes performing similar tasks, where a target dependent choice needs to be made e.g. --tosa-fold-pad-op-to-tensor-op-level-none and --tosa-fold-pad-op-to-tensor-op-level-8k. Though perhaps I’ve misunderstood.

Created a draft implementation for proposal 2: [mlir][tosa] Add the concept of a TOSA target environment by lhutton1 · Pull Request #153771 · llvm/llvm-project · GitHub