[RFC] Interfaces and dialects for precise IR transformation control

Many transformations on higher-level ops have multiple parameters and, according to recent research, can benefit from precise per-op or per-op-class configuration of these parameters. Traditionally, however, these transformations are controlled by pass heuristics with few control knobs.

We propose a mechanism for more precise control over IR transformation by reifying the sequence of IR transformation instructions as more IR.

Transform commands as IR

Each transformation corresponds to an operation in a separate piece of IR, referred to as Transform IR. Attributes of Transform IR operations specify the parameters of the transformation. These operations can define and use values that serve as handles to operations in the IR being transformed (or Payload IR). Such handle values allow for precisely targeting a transformation on an operation or a set of operations.

At a high level, Transform IR operations are expected to implement an interface along the following lines:

class TransformOpInterface {
  virtual LogicalResult apply(TransformState &state) = 0;
};

/// Contains the mapping between handle values used in Transform IR and lists of
/// corresponding Payload IR ops.
struct TransformState {
  /// Must be called to indicate that the handle values defined by the Transform IR op
  /// correspond to the list of Payload IR ops.
  void setPayload(OpResult handle, ArrayRef<Operation *> payloadIROps);

  /// Returns the list of Payload IR ops associated with the handle value 
  /// used in Transform IR.
  ArrayRef<Operation *> getPayload(Value operand) const;
};

The simplest way to use such Transform IR operations is to create a “schedule interpreter” that calls “apply” following some predefined rules, e.g., sequentially. This is similar to the approach adopted by Halide, TVM, etc.
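For concreteness, here is a minimal sketch of such an interpreter, assuming TransformOpInterface is realized as a regular MLIR op interface and that the Transform IR ops live in a single region; the traversal order and error handling are illustrative only:

LogicalResult runScheduleInterpreter(Operation *transformRoot,
                                     TransformState &state) {
  // Apply each Transform IR op in order; stop at the first failure.
  for (Operation &transformOp : transformRoot->getRegion(0).getOps()) {
    auto iface = dyn_cast<TransformOpInterface>(&transformOp);
    if (!iface)
      return transformOp.emitError("expected a Transform IR op");
    if (failed(iface.apply(state)))
      return failure();
  }
  return success();
}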

In addition to the interface, a trait can provide the apply implementation for the common case of a transformation accepting one handle and producing another handle.

template <typename ConcreteOp>
class SingleOpTransformOpTrait
    : public OpTrait::TraitBase<ConcreteOp, SingleOpTransformOpTrait> {
public:
  LogicalResult apply(TransformState &state) {
    // Look up the Payload IR ops associated with the single operand.
    ArrayRef<Operation *> payload =
        state.getPayload(this->getOperation()->getOperand(0));
    // Apply the transformation to each Payload IR op individually.
    SmallVector<Operation *> results = llvm::to_vector(llvm::map_range(
        payload, [this](Operation *payloadOp) {
          return static_cast<ConcreteOp *>(this)->applyToOne(payloadOp);
        }));
    // Associate the produced ops with the single result handle.
    state.setPayload(this->getOperation()->getResult(0), results);
    // Fail if any of the individual applications failed.
    return success(!llvm::is_contained(results, nullptr));
  }
};
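For illustration, a hypothetical concrete op would then only implement applyToOne; the op name and tiling semantics here are assumptions, not part of the proposal:

// Hypothetical transform op: the trait supplies `apply`, the op only
// implements the per-payload-op logic.
class TileOp : public Op<TileOp, SingleOpTransformOpTrait> {
public:
  using Op::Op;
  static StringRef getOperationName() { return "xform.tile"; }

  /// Tiles one Payload IR op; returns the tiled op, or nullptr to signal
  /// failure for this particular op.
  Operation *applyToOne(Operation *payloadOp);
};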

Similar traits can be provided for other common cases on a per-need basis.

Extension mechanism

TransformState can be extended to store more information than the handle mapping, or to get notified when the mapping changes, by attaching instances of a state extension class.

struct TransformState {
  // (Handle mapping API shown above elided.)

  class Extension {
  public:
    /// Derived classes may have non-trivially-destructible members.
    virtual ~Extension();

    /// Derived classes can override this to get notified when the payload
    /// mapping for a handle changes.
    virtual void notifySetPayload(Value handle,
                                  ArrayRef<Operation *> payloadIROps);
  };

  /// Adds a new extension of the given type to the state. Practically, this
  /// constructs the extension in place from the given arguments.
  template <typename ExtTy, typename... Args>
  ExtTy &addExtension(Args &&...args);

  /// Returns the extension of the given type if present in the state,
  /// or null otherwise.
  template <typename ExtTy>
  ExtTy *getExtensionOrNull();
};

Extensions are identified by TypeID and only one extension of a given type is allowed in the state. Transform IR operations can interact with extensions at will and are allowed to fail if the required extensions are missing.
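As a sketch of how this could be used (the extension name and its members are hypothetical), a transform op might keep cross-transform bookkeeping in an extension:

class TiledLoopsExtension : public TransformState::Extension {
public:
  // Hypothetical bookkeeping: remember every loop produced by tiling.
  void recordTiled(Operation *loop) { tiledLoops.insert(loop); }
  bool wasTiled(Operation *loop) const { return tiledLoops.contains(loop); }

private:
  DenseSet<Operation *> tiledLoops;
};

// Inside some hypothetical transform op's `apply`: create the extension on
// first use; later transform ops find the same instance by TypeID.
LogicalResult HypotheticalOp::apply(TransformState &state) {
  auto *tiled = state.getExtensionOrNull<TiledLoopsExtension>();
  if (!tiled)
    tiled = &state.addExtension<TiledLoopsExtension>();
  // ... consult tiled->wasTiled(op) or call tiled->recordTiled(op) ...
  return success();
}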

Relation to the infrastructure

PDL is expected to be used to match the Payload IR ops to be transformed. Furthermore, !pdl.operation can be used, at least initially, as the handle type. More specifically, we expect to introduce a transform op that takes as an attribute a symbol reference to a PDL pattern and defines a handle to the matched operations:

pdl.pattern @pdl_pattern {
  %0 = operation "scf.for"
  // …
  rewrite %0 with @xform
}
%0 = xform.pdl_match @pdl_pattern
%1 = xform.tile %0 {tile_sizes = [32, 32]}

The recently proposed dialect extension mechanism can be used to reduce coupling between the dialect that defines Transform IR ops and the implementations of those transforms by leveraging external models for the transform op interface.

This mechanism is complementary to passes: a pass may run a “schedule interpreter”, and individual transformations can create and run (nested) pass managers. A simple sequential “interpreter” test pass will be defined for testing purposes.

Alternatives considered

Finer-grained passes with multiple options

One could consider writing finer-grained passes that perform a specific transformation on a highly specific shape of the IR. It appears extremely difficult to achieve the same level of both expressivity and verification with pass options as with PDL + custom verifiers on ops. Furthermore, it would be challenging to chain transformations, i.e., have a transformation apply to the ops produced by the previous one, or have its properties depend on the success of previous transformations, as there is no mechanism for passes to communicate with each other.

Attributes

One could consider attaching discardable attributes to operations as a means of communicating between transformations. As their name indicates, such attributes can be discarded at any moment and are not reliable for anything but hints. Some enabling transformations, such as canonicalization or CSE, do drop attributes.

Driving everything from PDL

Another possibility is to drive the transformation using custom rewriters from PDL, where the custom rewriter would play the role of the apply interface method. This has chaining and verification issues similar to those of passes.

Layering

The current plan is to create a new dialect, xform or transform, that would contain the interface and the basic operations for bootstrapping the Transform IR such as pdl_match, container ops, utilities for executing the transforms, and potentially Transform IR ops for some general transformations such as canonicalization and folding. This is similar to how the bufferization dialect is organized.

Dialect-specific transformations can be implemented by their respective dialects, e.g., with ops in a transform namespace such as scf.transform.tile, using the external/promised interface mechanism to decouple the transformation implementation from the op definition. Cross-dialect yet non-generic transforms can follow the same approach as cross-dialect canonicalization patterns (while this may be challenging, solving the layering issue for cross-dialect canonicalization patterns is not in scope of this RFC, and the controllable transform mechanism should align with any change proposed for cross-dialect canonicalization).

Other proposals for layering and naming are welcome!

This is interesting, but without examples I’ll have a hard time really forming a mental model of it.
Seems like it would be worth a public presentation, should we schedule an ODM on this?

A vaguely related CIRCT concept is LoweringOptions.h, which is an IR-embedded interface that affects lowering. It provides a standardized (in CIRCT) way to control target-specific lowerings by declaratively capturing the target properties. The infra then sets up the pass manager, and the passes configure themselves (sometimes turning into no-ops) depending on its configuration.

It might be worth checking out. One nice thing about it is that it is very simple and not invasive on the infra.

Another related concept is IREE’s LoweringConfig.h, which functions in a similar way to the mechanism in CIRCT (except that it can also apply at different levels of the IR).

I am familiar with the work presented in this RFC largely in the context of it being considered as a way to make this mechanism more flexible, and it pushes into areas that the simpler mechanism has trouble expressing. @ftynse I know there was significant discussion and design work that went into arriving at the formulation you are proposing. Would it be useful to educate and walk people through the thought process and motivations that led you here? +1 to an ODM, as Mehdi suggests.

I am curious whether the referenced solutions have a good way of dealing with SSA values as the IR gets transformed?

One analogy I can think of to help frame the proposal relates to “current MLIR types and attributes” vs “dependent-types and attributes that can take an SSA value”.

The ability to track operations as (PDL) SSA values as the transformations get applied has a dependent-type flavor.

This is also akin to injecting static information in the IR as we apply transformations.
assume-like operations could technically encode part of the information but seem far from ideal.

I can do a short ODM.

It looks like these are more or less a reification of pass options into IR. Arguably, both could have been implemented using the data layout and target info infrastructure, but this is not the point here. So the answer to your question is no, there doesn’t seem to be any way to deal with SSA values or even apply transformations at a finer grain than a blanket pass. As an answer to a potential question: both of these things seem trivially expressible as transform ops.

Since Mehdi asked for an example, here’s something I would like to be able to express:

// Payload IR
scf.for %i = 0 to 256 step 1 {
  //...
}
scf.for %j = 0 to 32 step 1 {
  // ...
}
// Other loops may follow.

// Match only the loops known to be large enough, here we match only %i.
%0 = xform.match @scf_for_loops_with_iter_count_at_least_100
// Tile the loop and get handles to outer and inner loops.
%tile, %point = scf.transform.tile %0 {sizes = [32]}
// Vectorize only the inner loops produced by tiling.
%2 = scf.transform.vectorize %point

// Unroll the other (non-vectorized) loops.
%3 = xform.match @any_scf_for_with_non_vector_ir_inside
scf.transform.unroll %3

There is infinite flexibility in what is being transformed since xform.match refers to a PDL pattern that can match anything.

+1. This is a great evolution!

Are other users expected to write their own interpreter?

Thanks Alex! I’m looking forward to the ODM. I’ll save most comments until after that, so that I can have a better understanding of what the full goal is here.

This part feels a bit weird. Have you considered just having a registry of “named” transforms that gets populated by the extensions? What is the tradeoff of having each dialect define its own transform operations, vs. having something like:

pdl.pattern @pdl_pattern {
  %0 = operation "scf.for"
  // …
  rewrite %0 with @xform
}
%0 = xform.pdl_match @pdl_pattern
%1 = xform.execute "scf.tile"(%0) {tile_sizes = [32, 32]}

– River

I expect an approach similar to conversions: there will be an applyDeclarativeTransforms utility function that may be called from a pass one or more times. The test pass is intended to test the utility function.
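For illustration, such a test pass could look roughly like this; the pass name and the exact signature of applyDeclarativeTransforms are hypothetical:

struct TestTransformInterpreterPass
    : public PassWrapper<TestTransformInterpreterPass,
                         OperationPass<ModuleOp>> {
  void runOnOperation() override {
    // Hypothetical utility: finds the Transform IR nested under the module
    // and applies its operations sequentially to the Payload IR.
    if (failed(applyDeclarativeTransforms(getOperation())))
      signalPassFailure();
  }
};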

Not in this form, but I did think about higher-level “transform recipes” that can be provided in this way along with PDL patterns that rewrite them into primitive operations. Something along the lines of:

%0 = xform.pdl_match @pdl_pattern
%1 = xform.recipe "double_tiling"(%0) { sizes1 = [32, 32], sizes2 = [8, 4] }

// This pattern describes how to unpack the "recipe" above.
pdl.pattern @double_tiling {
  %name = attribute : "double_tiling" 
  %sizes1 = attribute
  %sizes2 = attribute
  %target = operand
  %op = operation(%target : !pdl.value) { "sizes1" = %sizes1, "sizes2" = %sizes2 }
  rewrite %op {
    %tile1 = operation "scf.transform.tile"(%target) { "tile_sizes" = %sizes1 }
    %tile2 = operation "scf.transform.tile"(%tile1) { "tile_sizes" = %sizes2 }
    replace %op with (%tile2 : !pdl.value)
  }
}

My main concern with this approach is that it essentially duplicates the op definition mechanism. With proper named transform ops, we can rely on the infra to verify the validity of at least a single transform, e.g., that "tile" is a valid name, that it accepts one handle (in a more advanced version, we could have typed handles and require the handle to point to a loop) and returns two, that it accepts an ArrayAttr containing integers, etc. With a single execute op, we would have to inject all this logic into its verifier, and potentially its printer/parser, through some custom “subop” registration mechanism without benefiting from ODS.

Scaling this up a bit, we may want to reason about transformations as ops, e.g., canonicalize away known no-op transformations (tile by 0, unroll by 1) or combinations thereof (distribute then fuse back, tile then coalesce), etc. Having proper first-class ops is one of the key points of the proposal: I want to express transforms as proper IR, not JSON :slight_smile:

My secondary concern is making sure the “xform” dialect doesn’t need to depend on everything.
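To illustrate the point about reasoning over transforms as ops, a hypothetical folder could erase a no-op tiling; the TileOp name and its getTileSizes accessor are assumptions, and a single-result variant of tile is assumed for simplicity:

// Tiling with all-zero sizes changes nothing, so the result handle can be
// folded to the operand handle, allowing the transform op to be erased.
OpFoldResult TileOp::fold(ArrayRef<Attribute> operands) {
  if (llvm::all_of(getTileSizes(), [](int64_t size) { return size == 0; }))
    return getOperand();
  return {};
}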

I do understand the concern with transform ops being present in a dialect while not really related to the scope of that dialect itself. It would be nice to find a good home for them. The options seem to be:

  • the “main” dialect being transformed; this may have a layering and scope problem;
  • the “xform” dialect; this makes the “xform” dialect depend on the world and doesn’t seem to scale to out-of-tree projects;
  • a separate “xform_dialectname” dialect that mirrors the payload dialect; we may be creating too many dialects with this.

Would it be possible to make the transform dialect pluggable? It would define a core set of operations, but transform operations could be defined outside (while still belonging to the transform dialect) and injected into it.

Also, if an injected transform dialect op manipulates, for example, SCF ops, I don’t think that makes the dialect dependent on SCF, but rather the pass that executes the transformation, since the transform IR itself only depends on PDL for the types.

I have a hard time seeing this work with out-of-tree projects. Beyond requiring extra core IR work, which none of the other options do, this may create a dangerous pattern. Would we allow injecting ops into the LLVM dialect? Builtin?

I guess a major thing I’m missing here is how this is intended to be used. What are the expected integration points? How are users intended to interact with this?

What types of transformations are intended to be captured? The examples above have dialect-specific transformations, but what about “generic” transformations? Ideally we would want various transformations to be driven by interfaces, not specific ops. Would the suggestion be that each dialect has the same transformation op defined somewhere (scf.transform.unroll/affine.transform.unroll/etc.)?

Are there any limitations on what the transforms can do? Can these transforms run nested passes? Does this infra need some form of “listener” like rewrite patterns do?

I’m not sure this would actually require much additional core IR work, if any (at least as far as I can tell). I would also separate out dialects that are intended to be extensible from those that aren’t. We already have some dialects that allow unregistered attributes/operations/types, and downstream projects can allow attributes/operations/types to be selectively registered. It’s up to the specific dialect to decide how it approaches extensibility.

Dynamically extending a “transform” dialect could actually be an interesting design space here, given that the extensions for the various clients would simply register the additional operations. That would remove a lot of the problems with defining and registering N transform dialects, would easily handle inter-dialect transformations, etc. I’m not entirely sure this is the path that we would go down, but I would at least give it consideration.

– River

The precedent for extending a dialect’s opset comes from the TensorFlow dialect, which provides an additional op registration hook. I believe this solution is “minimally intrusive” as opposed to registering transformations as strings, which would be more “non-intrusive”. In terms of layering, the extension ops could potentially live under MyDialect/Transform. Generic transformations can live under lib/Transform, for example.

+1 to the idea of treating sequence failures like pass failures.

What is the pdl_match performed on? I think this operation should accept a !pdl.operation to match, e.g. a module or function. Then the main piece of IR to transform is fed into the sequence as an argument.

Do you have a pointer to the code for TF dialect? (If not, I’ll find it).

TensorFlow has a specific registry and hook, but that’s out of band from MLIR’s point of view as far as I know: ops are still registered in the constructor and not injected after the fact right now.

Nothing really happens to the dialect after initialization. I don’t see why addOperations couldn’t be called after the fact (other than being careful about threading). If there is something that prevents or otherwise complicates dynamic registration of operations, I’m sure it can be sorted out.

Thanks @ftynse for the sleek design for a very desirable feature.

I’m not the one to argue about how to solve the layering/dependence issues, but I hope a satisfactory solution can be found, possibly through a few rounds of experimentation. Now, notice that xform also helps simplify dependence issues by decoupling the heavy lifting of synthesizing “transformation schedules” from the application of these schedules. Thanks to the expressiveness of PDL, it is possible to generalize Halide/TVM schedules into custom passes with heuristic batteries included. I hope to see generically applicable optimization or lowering recipes expressed using xform and reused across domains.

I’m curious about discardable attributes also. A systematic way of naming these would help when assembling xform rules.

This does not have to be included in the infrastructure, but it sounds like you can just reuse the Symbol/SymbolTable logic to guarantee unique names plus symbol nesting for regions. This can be done by a particular client of the transform interfaces that assigns attribute names and then matches them using a trivial PDL snippet.

I actually tried making addOperations public, and it seems to work for injecting more operations into an existing dialect. Extra support will be necessary in registration, since addOperations needs a Dialect * and we don’t necessarily want to force-load the transform dialect. A stricter API, e.g., introducing an ExternalOperation and making sure it can only be injected into a specific dialect that moreover declares itself as suitable for injection, is likely desirable.
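A rough sketch of what the registration side could look like, building on the dialect extension mechanism mentioned earlier; the TransformDialect name and the injected op list are hypothetical, and this assumes addOperations is made accessible:

// Register additional transform ops without force-loading the dialect:
// the callback only runs once the dialect is actually loaded.
void registerSCFTransformOps(DialectRegistry &registry) {
  registry.addExtension(+[](MLIRContext *ctx, TransformDialect *dialect) {
    // Assumes Dialect::addOperations is reachable from here, e.g., via a
    // public injection API or a friend declaration.
    dialect->addOperations<TileOp, UnrollOp, VectorizeOp>();
  });
}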

However, this hits a recurrent design issue: depending on what is registered at a specific point in time, the IR may or may not parse. Arguably, this is no different from not registering the dialect at all and having the parser complain; we just need to make the error message sufficiently clear. This is also similar to extensible dialects that define new operations at runtime.

Any thoughts on this direction?