RFC: MLIR Action, Tracing and Debugging MLIR-based Compilers

Status: landed; the documentation is here: Action: Tracing and Debugging MLIR-based Compilers - MLIR

MLIR Actions: Tracing and Debugging MLIR-based Compilers

This work has been cooking for almost a year now, with quite a few people involved along the way, so special shout-out to: @ChiaHungDuan, @jpienaar, @inkryp, @Mogball, @tlongeri (I hope I didn’t forget anyone).

See the slides and the recording from the MLIR Open Meeting where I demoed this.

Introduction

In MLIR, passes and patterns are the main abstractions to encapsulate general IR transformations. The primary way of observing transformations along the way is to enable “debug printing” of the IR (e.g. -mlir-print-ir-after-all to print after each pass execution). On top of this, finer-grained tracing may be available with -debug, which enables more detailed logs from the transformations themselves. However, this method has some scaling issues: it is limited to a single stream of text that can be gigantic and requires tedious crawling through the log a posteriori. Iterating over multiple runs to collect such logs and analyze them can be very time-consuming and is often impractical beyond small input programs.

This RFC proposes to build new tracing and debugging capabilities into MLIR on top of what we started with the DebugAction framework. The basic concept of the framework is to allow MLIR developers to define “Actions” to structure their transformations. As defined in the original proposal: A debug action is essentially a marker for a type of action that may be performed within the compiler. There are no constraints on the granularity of an “action”, it could be as simple as “perform this fold” and as complex as “run this pass pipeline”.

The novelty of the proposal here is that we take the concept further to offer a pluggable mechanism to register “Observers” or one “Controller” that can intercept these actions and either inspect, time, and log what is happening (the observers) or make a decision about what to do with the Action (the controller): apply it or skip it.

Some of this tracing already exists in MLIR, but it has been built around the Pass infrastructure, so a “Pass” is the only real atom in the system. One way to see this new Action infrastructure is as a generalization of the tracing capabilities we have with passes, applied at a finer grain, which also starts to make the infrastructure more orthogonal to the concept of passes.

We implemented some prototype tools to showcase Actions:

  • Logging: a simple text log of all the actions executed in order, possibly filtering on the IR location that an action affects.
  • Tracing: an observer that can produce a Chrome trace of the execution.
  • Interactive debugging: this enhances an LLDB/GDB environment so that users can pause the program before/after the execution of an action, print the status, inspect the IR, etc.

Below we explain 1) how we evolved the DebugAction API to support new capabilities, 2) how we implemented two new Actions in key places in MLIR: pass execution and pattern application, and 3) how the three example use-cases work.

Implementation Notes

DebugActions API Update

The DebugAction is a way for the compiler to interact with a pluggable handler at runtime. A handler can be registered for a custom debug action and use it to instruct the caller to skip the execution of this action. This mechanism is updated to offer more control to the handlers. In particular, the initial design of debug actions only exposes a shouldExecute() API, which leaves control with the user, outside of the framework's visibility. Instead, we are changing this API to invert the control and put it under the handler's responsibility. Right now the code looks like this:

DebugActionManager &manager = context->getDebugActionManager();
// Query the action manager to see if currentPattern should be applied to
// currentOp.
if (manager.shouldExecute<DebugAction>(currentOp)) {
  // apply a transformation
  // …
}

In this sequence, the manager isn’t involved in the actual execution of the action and can’t provide rich instrumentation. Instead, the new API hands control to the handler itself:

// Execute the action under the control of the manager
manager.dispatch<Action>(currentOp, [&]() {
  // apply the transformation in this callback
  ...
});

This inversion of control (by injecting a callback) allows handlers to implement potentially interesting new features: for example, snapshotting the IR before and after the action, or recording an action's execution time. More importantly, it allows capturing the nested execution of actions: the handler has access not only to information about the current action but also about the enclosing ones, similarly to how gdb can walk a backtrace.
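To make the benefit of the inverted control concrete, here is a minimal, self-contained C++ sketch. ToyActionManager is a hypothetical stand-in, not MLIR's DebugActionManager: because dispatch() runs the transformation itself, it can keep a stack of in-flight actions and snapshot a “backtrace” every time a nested action starts. A real handler could record timings or IR snapshots at the same two points.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical toy manager (not MLIR's DebugActionManager) illustrating why
// dispatch() is more powerful than shouldExecute(): because the manager runs
// the callback itself, it knows which actions are currently executing.
class ToyActionManager {
public:
  void dispatch(const std::string &tag,
                const std::function<void()> &transform) {
    stack.push_back(tag);
    backtraces.push_back(stack); // snapshot of the enclosing actions
    transform();                 // nested dispatch() calls happen in here
    stack.pop_back();
  }

  // One entry per dispatched action: the action stack at entry, outermost
  // action first, similar to a debugger backtrace.
  std::vector<std::vector<std::string>> backtraces;

private:
  std::vector<std::string> stack;
};
```

For example, dispatching a pattern action from inside a pass action's callback records a two-entry backtrace for the inner action, with the pass first.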

An Operation* is used here to define the IR that is considered by the transformation, but recognizing that an Action can affect more than an operation, we generalize this into a context for the action: an array of IRUnit, defined as:

using IRUnit = llvm::PointerUnion<Operation *, Block *, Region *>;

Dispatching an Action can now be done through the MLIRContext, for example:

  context->dispatch<MyAction>(
      [&]() {
        // ...
      },
      {op, block});

ExecutionContext

The MLIRContext allows registering a single handler for dispatched actions. If present, this handler takes control of the action; in the absence of a handler, the Action's callback is invoked directly.
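A rough, standalone C++ sketch of this single-handler contract. Operation, Block, Region, ToyAction and ToyContext are hypothetical stand-ins, and std::variant plays the role of llvm::PointerUnion for IRUnit. The handler receives both the callback and the action (with its IR units) and decides whether to run the callback. Without a handler, dispatch() simply calls the callback.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-ins for MLIR's IR classes; the real IRUnit is an
// llvm::PointerUnion<Operation *, Block *, Region *>.
struct Operation { std::string name; };
struct Block {};
struct Region {};
using IRUnit = std::variant<Operation *, Block *, Region *>;

// An action carries its tag and the IR units it operates on.
struct ToyAction {
  std::string tag;
  std::vector<IRUnit> irUnits;
};

// A handler receives the transformation and decides whether to run it.
using ActionHandler =
    std::function<void(const std::function<void()> &, const ToyAction &)>;

class ToyContext {
public:
  // At most one handler can be registered; a new one replaces the old one.
  void registerActionHandler(ActionHandler h) { handler = std::move(h); }

  void dispatch(const ToyAction &action,
                const std::function<void()> &transform) {
    if (!handler) {
      transform(); // No handler: invoke the callback directly.
      return;
    }
    handler(transform, action); // The handler takes control.
  }

private:
  ActionHandler handler;
};
```

In this sketch a handler that returns without calling the callback effectively skips the action; that is the hook the debugging features build on.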

We implemented a handler intended to help debugging MLIR: the ExecutionContext. This component is really a pluggable orchestrator to handle Actions. The responsibilities of this component are the following:

  • Accept registration of Observers and forward them the action before and after its execution.
  • Accept registration of BreakpointManagers: when an Action is dispatched, the managers are queried to “match” the Action against an existing Breakpoint. If one is found, the Action is passed to the “controller”; otherwise the Action is executed.
  • When the controller returns, the ExecutionContext acts on the return value: the Action is applied, skipped, stepped in or over, or the execution continues until the end of the parent action (think GDB “finish” command).
  • Actions are chained into a stack to offer a “backtrace” mechanism.
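The responsibilities listed above can be sketched as a single handler object. This is a simplified, hypothetical model: ToyExecutionContext, Observer and Control are illustrative names, not MLIR's exact declarations. It only models apply/skip, not the stepping commands. Breakpoints are plain predicates, the controller is consulted only when one matches, and observers are notified around execution.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified model of an ExecutionContext-style handler.
enum class Control { Apply, Skip };

struct ToyAction {
  std::string tag;
};

// Observers are notified before and after each action.
struct Observer {
  virtual ~Observer() = default;
  virtual void beforeExecute(const ToyAction &, bool /*willExecute*/) {}
  virtual void afterExecute(const ToyAction &) {}
};

// Example observer that simply counts notifications.
struct CountingObserver : Observer {
  int before = 0, after = 0;
  void beforeExecute(const ToyAction &, bool) override { ++before; }
  void afterExecute(const ToyAction &) override { ++after; }
};

class ToyExecutionContext {
public:
  using Breakpoint = std::function<bool(const ToyAction &)>;
  // The controller sees the full stack of enclosing actions (the backtrace).
  using Controller =
      std::function<Control(const std::vector<const ToyAction *> &)>;

  void registerObserver(Observer *o) { observers.push_back(o); }
  void addBreakpoint(Breakpoint b) { breakpoints.push_back(std::move(b)); }
  void setController(Controller c) { controller = std::move(c); }

  // Acts as the single action handler: decide, notify, then maybe execute.
  void operator()(const std::function<void()> &transform,
                  const ToyAction &action) {
    stack.push_back(&action);
    bool hit = false;
    for (const Breakpoint &b : breakpoints)
      hit = hit || b(action);
    // No matching breakpoint (or no controller): the action just runs.
    Control decision = (hit && controller) ? controller(stack) : Control::Apply;
    bool execute = decision == Control::Apply;
    for (Observer *o : observers)
      o->beforeExecute(action, execute);
    if (execute) {
      transform();
      // In this sketch, afterExecute is only sent for executed actions.
      for (Observer *o : observers)
        o->afterExecute(action);
    }
    stack.pop_back();
  }

private:
  std::vector<Observer *> observers;
  std::vector<Breakpoint> breakpoints;
  Controller controller;
  std::vector<const ToyAction *> stack;
};
```

Keeping the decision (controller) separate from observation (observers) means a logger or tracer can be attached without changing which actions run.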

Demo application

Action for encapsulating Pass Execution and Pattern Application

First we need to actually dispatch Actions somewhere. Pass execution and pattern application are the two main encapsulation mechanisms for transformations in MLIR. They are a natural starting point for hooking into the Action mechanism, and they immediately unlock a significant number of users without the need to modify any existing code.

Pattern Application

PatternApplicator.cpp is updated to dispatch pattern application through an Action under the control of the manager:

/// This is the type of Action that is dispatched when a pattern is applied.
/// It captures the pattern to apply on top of the usual context.
class ApplyPatternAction : public tracing::ActionImpl<ApplyPatternAction> {
public:
  using Base = tracing::ActionImpl<ApplyPatternAction>;
  ApplyPatternAction(ArrayRef<IRUnit> irUnits, const Pattern &pattern)
      : Base(irUnits), pattern(pattern) {}
  static constexpr StringLiteral tag = "apply-pattern-action";
  static constexpr StringLiteral desc =
      "Encapsulate the application of rewrite patterns";

  void print(raw_ostream &os) const override {
    os << "`" << tag << " pattern: " << pattern.getDebugName();
  }
private:
  const Pattern &pattern;
};

...

// Try to match and rewrite this pattern. The patterns are sorted by
// benefit, so if we match we can immediately rewrite. For PDL patterns, the
// match has already been performed, we just need to rewrite.
bool matched = false;
op->getContext()->dispatch<ApplyPatternAction>(
    [&]() {
      rewriter.setInsertionPoint(op);

      ...

      const auto *pattern = static_cast<const RewritePattern *>(bestPattern);
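      result = pattern->matchAndRewrite(op, rewriter);
      // Record a successful match so the loop can stop after this pattern.
      if (succeeded(result))
        matched = true;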
    },
    {op}, *bestPattern);
if (matched)
  break;
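Both ApplyPatternAction and PassExecutionAction rely on a CRTP base that exposes each action's static tag to generic code. The following standalone sketch shows that idea with hypothetical names (ToyAction, ToyActionImpl, ToyApplyPatternAction). The real tracing::ActionImpl may carry more plumbing (for example type identification) than shown here, and this sketch compares tags for simplicity.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical sketch of the CRTP pattern behind an ActionImpl-like base:
// each concrete action declares a static `tag`, and the base forwards it
// through a virtual accessor so generic handlers can identify the action.
class ToyAction {
public:
  virtual ~ToyAction() = default;
  virtual std::string_view getTag() const = 0;
};

template <typename Derived>
class ToyActionImpl : public ToyAction {
public:
  std::string_view getTag() const final { return Derived::tag; }
  // isa<>-style check; this sketch compares tags for simplicity.
  static bool classof(const ToyAction &action) {
    return action.getTag() == Derived::tag;
  }
};

class ToyApplyPatternAction : public ToyActionImpl<ToyApplyPatternAction> {
public:
  static constexpr std::string_view tag = "apply-pattern-action";
  explicit ToyApplyPatternAction(std::string patternName)
      : patternName(std::move(patternName)) {}
  const std::string &getPatternName() const { return patternName; }

private:
  std::string patternName;
};
```

A handler that only sees a `const ToyAction &` can still recover the tag, and downcast when the check succeeds to read action-specific data such as the pattern name.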

Pass Execution

In Pass.cpp, the PassManager executes individual passes by dispatching a debug action to the manager:

/// Encapsulate the "action" of executing a single pass, used for the MLIR
/// tracing infrastructure.
struct PassExecutionAction
    : public tracing::ActionImpl<PassExecutionAction> {
  using Base = tracing::ActionImpl<PassExecutionAction>;
  const Pass &pass;
  PassExecutionAction(ArrayRef<IRUnit> irUnits, const Pass &pass)
      : Base(irUnits), pass(pass) {}
  static constexpr StringLiteral tag = "pass-execution-action";
  void print(raw_ostream &os) const override {
    os << "`" << tag << "` "
       << " running \"" << pass.getName() << "\" on Operation \"";
    ArrayRef<IRUnit> irUnits = getContextIRUnits();
    if (irUnits.empty()) {
      os << "<missing?>";
    } else {
      os << irUnits.front().dyn_cast<Operation *>()->getName() << "\"";
    }
  }
};

...

// Initialized so the value is defined even if a handler skips the action.
bool passFailed = false;
op->getContext()->dispatch<PassExecutionAction>(
    [&]() {
      // Invoke the virtual runOnOperation method.
      if (auto *adaptor = dyn_cast<OpToOpPassAdaptor>(pass))
        adaptor->runOnOperation(verifyPasses);
      else
        pass->runOnOperation();
      passFailed = pass->passState->irAndPassFailed.getInt();
    },
    {op}, *pass);

The same mechanism generalizes and can be used in any kind of transformation. For example, a developer writing an “inliner” transformation could, following the model above, implement an InlinerAction and wrap each individual inlining transformation in a dispatch<InlinerAction>(...) call.
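Following that model, a hypothetical InlinerAction could wrap each inlined call site in its own dispatch. This standalone sketch uses a toy dispatcher (not MLIRContext) whose handler can veto individual inlinings. That per-call-site granularity is what would let a debugging session skip one specific inlining while keeping the rest. InlinerAction, ToyDispatcher and inlineAll are all illustrative names.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical action: one instance per inlined call site.
struct InlinerAction {
  static constexpr const char *tag = "inliner-action";
  std::string callee;
};

// Toy stand-in for the context's dispatch(): a handler decides per action.
struct ToyDispatcher {
  std::function<bool(const InlinerAction &)> allow =
      [](const InlinerAction &) { return true; };

  void dispatch(const InlinerAction &action,
                const std::function<void()> &transform) {
    if (allow(action))
      transform();
  }
};

// Toy "inliner": each call site is wrapped in its own action, so each
// inlining can be observed, timed or skipped independently.
std::vector<std::string> inlineAll(ToyDispatcher &d,
                                   const std::vector<std::string> &callSites) {
  std::vector<std::string> inlined;
  for (const std::string &callee : callSites)
    d.dispatch(InlinerAction{callee}, [&] { inlined.push_back(callee); });
  return inlined;
}
```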

Logging

We added a --log-actions-to=<path> option to produce a log of the sequence of actions applied by the compiler. Below is an example of this option in action with the flang compiler:

[thread 0] begins (no breakpoint) Action `pass-execution-action`  running "CSE" on Operation "builtin.module" (module attributes {dlti.dl_spec = #dlti.dl_spec<#dlti.dl_entry<"dlti.endianness", "little">, #dlti.dl_entry<i64, dense<64> : vector<2xi32>>, #dlti.dl_entry<i128, dense<128> : vector<2xi32>>, #dlti.dl_entry<i1, dense<8> : vector<2xi32>>, #dlti.dl_entry<i8, dense<8> : vector<2xi32>>, #dlti.dl_entry<i16, dense<16> : vector<2xi32>>, #dlti.dl_entry<i32, dense<32> : vector<2xi32>>, #dlti.dl_entry<f16, dense<16> : vector<2xi32>>, #dlti.dl_entry<f64, dense<64> : vector<2xi32>>, #dlti.dl_entry<f128, dense<128> : vector<2xi32>>>, fir.defaultkind = "a1c4d8i4l4r4", fir.kindmap = "", llvm.data_layout = "e-m:o-i64:64-i128:128-n32:64-S128", llvm.target_triple = "arm64-apple-macosx13.0.0"} {/*skip region4*/})
[thread 0] completed `pass-execution-action`
[thread 0] begins (no breakpoint) Action `pass-execution-action`  running "mlir::detail::OpToOpPassAdaptor" on Operation "builtin.module" (module attributes {dlti.dl_spec = #dlti.dl_spec<#dlti.dl_entry<"dlti.endianness", "little">, #dlti.dl_entry<i64, dense<64> : vector<2xi32>>, #dlti.dl_entry<i128, dense<128> : vector<2xi32>>, #dlti.dl_entry<i1, dense<8> : vector<2xi32>>, #dlti.dl_entry<i8, dense<8> : vector<2xi32>>, #dlti.dl_entry<i16, dense<16> : vector<2xi32>>, #dlti.dl_entry<i32, dense<32> : vector<2xi32>>, #dlti.dl_entry<f16, dense<16> : vector<2xi32>>, #dlti.dl_entry<f64, dense<64> : vector<2xi32>>, #dlti.dl_entry<f128, dense<128> : vector<2xi32>>>, fir.defaultkind = "a1c4d8i4l4r4", fir.kindmap = "", llvm.data_layout = "e-m:o-i64:64-i128:128-n32:64-S128", llvm.target_triple = "arm64-apple-macosx13.0.0"} {/*skip region4*/})
[thread 0] begins (no breakpoint) Action `pass-execution-action`  running "ArrayValueCopy" on Operation "func.func" (func.func @_QPtest1a(%arg0: !fir.ref<!fir.array<10xi32>> {fir.bindc_name = "a"}, %arg1: !fir.ref<!fir.array<10xi32>> {fir.bindc_name = "b"}, %arg2: !fir.ref<!fir.array<20xi32>> {fir.bindc_name = "c"}) {/*skip region4*/})
[thread 0] begins (no breakpoint) Action `apply-pattern-action pattern: (anonymous namespace)::ArrayFetchConversion (%18 = fir.array_fetch %12, %arg3 : (!fir.array<20xi32>, index) -> i32)
[thread 0] completed `apply-pattern-action`
[thread 0] begins (no breakpoint) Action `apply-pattern-action pattern: (anonymous namespace)::ArrayFetchConversion (%24 = fir.array_fetch %15, %23 : (!fir.array<10xi32>, index) -> i32)
[thread 0] completed `apply-pattern-action`
[thread 0] begins (no breakpoint) Action `apply-pattern-action pattern: (anonymous namespace)::ArrayUpdateConversion (%28 = fir.array_update %arg4, %27, %arg3 : (!fir.array<10xi32>, i32, index) -> !fir.array<10xi32>)
[thread 0] completed `apply-pattern-action`
[thread 0] begins (no breakpoint) Action `apply-pattern-action pattern: (anonymous namespace)::ArrayLoadConversion (%1 = fir.array_load %arg0(%0) : (!fir.ref<!fir.array<10xi32>>, !fir.shape<1>) -> !fir.array<10xi32>)
[thread 0] completed `apply-pattern-action`
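The begins/completed pairs in this log map naturally onto observer callbacks. Here is a hypothetical observer that prints lines of the same shape. ToyActionLogger is an illustrative name, and the real logger's exact text and fields may differ, in particular how it reports an action that did hit a breakpoint (the "(breakpoint hit)" wording below is made up).

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

struct ToyAction {
  std::string tag;
  std::string details; // e.g. pass name and the operation it runs on
};

// Hypothetical observer mirroring the begins/completed pairing of the log.
class ToyActionLogger {
public:
  explicit ToyActionLogger(std::ostream &os) : os(os) {}

  void beforeExecute(const ToyAction &a, bool breakpointHit, int threadId) {
    os << "[thread " << threadId << "] begins "
       // "(breakpoint hit)" is hypothetical wording for the other case.
       << (breakpointHit ? "(breakpoint hit)" : "(no breakpoint)")
       << " Action `" << a.tag << "` " << a.details << "\n";
  }

  void afterExecute(const ToyAction &a, int threadId) {
    os << "[thread " << threadId << "] completed `" << a.tag << "`\n";
  }

private:
  std::ostream &os;
};
```

Because the logger is just another observer, pointing it at a file instead of a string stream is all that an option like --log-actions-to needs to configure.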

Of course this can generate a lot of data, so a convenient --log-actions-filter option can be used to filter based on the location of the IR touched by the action. For example, filtering on line 184 of this Fortran input file:

subroutine test2b(a,b,c,d)
  integer :: a(10), b(10), c(10), d(10)
  b(c(d)) = a ! Line 184
end subroutine test2b

We can get only the actions actually touching this line of source code:

[thread 0] begins (no breakpoint) Action `apply-pattern-action pattern: (anonymous namespace)::ArrayFetchConversion (%11 = fir.array_fetch %8, %arg4 : (!fir.array<10xi32>, index) -> i32)
[thread 0] completed `apply-pattern-action`
[thread 0] begins (no breakpoint) Action `apply-pattern-action pattern: (anonymous namespace)::ArrayFetchConversion (%15 = fir.array_fetch %1, %arg4 : (!fir.array<10xi32>, index) -> i32)
[thread 0] completed `apply-pattern-action`
[thread 0] begins (no breakpoint) Action `apply-pattern-action pattern: (anonymous namespace)::ArrayFetchConversion (%21 = fir.array_fetch %4, %20 : (!fir.array<10xi32>, index) -> i32)
[thread 0] completed `apply-pattern-action`
[thread 0] begins (no breakpoint) Action `apply-pattern-action pattern: (anonymous namespace)::ArrayUpdateConversion (%27 = fir.array_update %arg5, %14, %26 : (!fir.array<10xi32>, i32, index) -> !fir.array<10xi32>)
[thread 0] completed `apply-pattern-action`
[thread 0] begins (no breakpoint) Action `apply-pattern-action pattern: (anonymous namespace)::ArrayLoadConversion (%1 = fir.array_load %arg3(%0) : (!fir.ref<!fir.array<10xi32>>, !fir.shape<1>) -> !fir.array<10xi32>)
[thread 0] completed `apply-pattern-action`
[thread 0] begins (no breakpoint) Action `apply-pattern-action pattern: (anonymous namespace)::ArrayLoadConversion (%5 = fir.array_load %arg2(%0) : (!fir.ref<!fir.array<10xi32>>, !fir.shape<1>) -> !fir.array<10xi32>)
[thread 0] completed `apply-pattern-action`
[thread 0] begins (no breakpoint) Action `apply-pattern-action pattern: (anonymous namespace)::ArrayLoadConversion (%9 = fir.array_load %arg1(%0) : (!fir.ref<!fir.array<10xi32>>, !fir.shape<1>) -> !fir.array<10xi32>)
[thread 0] completed `apply-pattern-action`
[thread 0] begins (no breakpoint) Action `apply-pattern-action pattern: (anonymous namespace)::ArrayLoadConversion (%11 = fir.array_load %arg0(%0) : (!fir.ref<!fir.array<10xi32>>, !fir.shape<1>) -> !fir.array<10xi32>)
[thread 0] completed `apply-pattern-action`
[thread 0] begins (no breakpoint) Action `apply-pattern-action pattern: (anonymous namespace)::ArrayMergeStoreConversion (fir.array_merge_store %10, %14 to %arg1 : !fir.array<10xi32>, !fir.array<10xi32>, !fir.ref<!fir.array<10xi32>>)
[thread 0] completed `apply-pattern-action`
[thread 0] begins (no breakpoint) Action `apply-pattern-action pattern: (anonymous namespace)::ConvertConvertOptPattern (%22 = fir.convert %21 : (i32) -> index)
[thread 0] completed `apply-pattern-action`
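Conceptually, the filter keeps an action when any IR unit in its context carries a location on the requested file and line. A hypothetical sketch follows. FileLineCol stands in for MLIR's file/line/column locations, and the file name test.f90 is made up. Real MLIR locations (fused, call-site, name locations) would need to be unwrapped first, which is not shown.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a file:line:col source location.
struct FileLineCol {
  std::string file;
  unsigned line;
  unsigned col;
};

struct ToyAction {
  std::string tag;
  std::vector<FileLineCol> locations; // locations of the IR units in context
};

// Keeps an action if any IR unit it touches is on the requested file/line.
struct LocationFilter {
  std::string file;
  unsigned line;

  bool matches(const ToyAction &action) const {
    for (const FileLineCol &loc : action.locations)
      if (loc.file == file && loc.line == line)
        return true;
    return false;
  }
};
```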

Tracing

Unsurprisingly, an Action can be traced and this kind of profile generated:

Note how you can follow the timeline, pattern after pattern, to find where the time is spent. More importantly, clicking on a given pattern shows interesting details at the bottom: the operation the pattern is applied to, including its source location. From there we can easily answer questions like “which loop in the source code blows up the fusion/tiling algorithm?”

Ideally we should make it so that profilers like Tracy could be used out-of-the-box!
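As an illustration of how an observer could produce such a profile, here is a hypothetical tracer emitting the Chrome Trace Event Format (loadable in chrome://tracing or Perfetto). Each action becomes a “complete” event ("ph": "X") with a start timestamp and a duration in microseconds. ToyChromeTracer is an illustrative name, and this is not necessarily how the prototype tracer is implemented.

```cpp
#include <cassert>
#include <chrono>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical tracing observer writing Chrome Trace Event Format JSON.
// Nesting is recovered by the viewer from overlapping time ranges.
class ToyChromeTracer {
public:
  void beforeExecute(const std::string &tag) { starts.push_back({tag, now()}); }

  void afterExecute() {
    Open open = starts.back(); // actions complete in LIFO order
    starts.pop_back();
    long long end = now();
    if (!first)
      events << ",";
    first = false;
    // Note: tags are not JSON-escaped in this sketch.
    events << "{\"name\":\"" << open.tag << "\",\"ph\":\"X\",\"ts\":"
           << open.ts << ",\"dur\":" << (end - open.ts)
           << ",\"pid\":0,\"tid\":0}";
  }

  std::string json() const {
    return "{\"traceEvents\":[" + events.str() + "]}";
  }

private:
  struct Open {
    std::string tag;
    long long ts; // microseconds
  };

  static long long now() {
    using namespace std::chrono;
    return duration_cast<microseconds>(
               steady_clock::now().time_since_epoch())
        .count();
  }

  std::vector<Open> starts;
  std::ostringstream events;
  bool first = true;
};
```

Emitting "X" events on completion keeps the observer simple; an alternative is separate "B"/"E" events at begin and end, which also survive a crash mid-action.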

Interactive Client

The last application is an interactive client à la gdb. This client supports setting breakpoints, stepping through the execution of Actions and selectively skipping them, while inspecting the IR before and after each transformation. Breakpoints are set by matching on the action tag or other properties (for example source location or specific patterns based on their debug name). This section documents the features exposed through the command line interface.

Controlling execution:

  1. “skip”: continue the execution to the next breakpoint without applying the current transformation.
  2. “apply”: continue the execution to the next breakpoint after applying the current transformation.
  3. “step”: continue the execution and stop again for the next event, including if we hit a nested action, or go to the next action if the transformation is already executed.
  4. “next”: continue the execution and stop again after the current action (not for nested action), or go to the next action if the transformation is already executed.
  5. “finish”: continue the execution and stop again when the parent action completes.
  6. “break-on-tag <tag>”: add a breakpoint matching actions with the provided tag.
  7. “break-on-file <file:line:loc>”: add a breakpoint matching actions on the provided source location.
  8. “list”: list the active breakpoints and their identifiers.
  9. “disable [#id]”: disable a breakpoint by its identifier, or use “all” to disable them all.
  10. “enable [#id]”: enable a breakpoint by its identifier, or use “all” to enable them all.
  11. “backtrace”: show the current stack of actions.
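One way to think about the execution-control commands is as a rule deciding whether to stop at the next action event, given the last command issued and the depth of the action stack. The sketch below is a hypothetical model (Command and SteppingState are made-up names), not the client's actual implementation, and it assumes breakpoints always stop execution.

```cpp
#include <cassert>

// Hypothetical model of stepping commands. Depth is the number of actions
// on the action stack when an event (action start or completion) fires.
enum class Command { Apply, Skip, Step, Next, Finish };

struct SteppingState {
  Command lastCommand = Command::Apply;
  int depthWhenIssued = 0;

  bool shouldStop(int eventDepth, bool breakpointHit) const {
    if (breakpointHit)
      return true; // Assumption: breakpoints always stop execution.
    switch (lastCommand) {
    case Command::Apply:
    case Command::Skip:
      return false; // Run until the next breakpoint.
    case Command::Step:
      return true; // Stop at the very next event, nested or not.
    case Command::Next:
      return eventDepth <= depthWhenIssued; // Don't stop in nested actions.
    case Command::Finish:
      return eventDepth < depthWhenIssued; // Stop once back in the parent.
    }
    return false;
  }
};
```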

Inspecting the IR:

An action comes with a list of IRUnits (Operation, Block, or Region) as context. A “cursor” in the IR is available and can be controlled by the user using the following set of commands.

  1. “context” - list the available IRUnits
  2. “cursor-select-from-context #id” - activate the IRUnit with the given ID from the context
  3. “cursor-print” - print the current activated IRUnit
  4. “cursor-parent” - activate the parent IRUnit.
  5. “cursor-child #id” - activate a child by id.
  6. “cursor-previous #id” - activate the previous IRUnit in its current list (for example previous Operation in the current block).
  7. “cursor-next #id” - activate the next IRUnit in its current list (for example next Block in the current region).
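The cursor commands amount to navigating a tree of IR units. This hypothetical sketch collapses Operation, Region and Block into one Node type to keep the navigation logic short. In real MLIR IR the levels alternate: an Operation holds Regions, a Region holds Blocks, and a Block holds Operations. Node and Cursor are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical single node type standing in for Operation/Region/Block.
struct Node {
  std::string label;
  Node *parent = nullptr;
  std::vector<Node *> children;
};

class Cursor {
public:
  explicit Cursor(Node *start) : current(start) {}
  Node *get() const { return current; }

  bool parent() { return moveTo(current->parent); }          // cursor-parent
  bool child(std::size_t id) {                               // cursor-child
    return id < current->children.size() && moveTo(current->children[id]);
  }
  bool next() { return sibling(+1); }                        // cursor-next
  bool previous() { return sibling(-1); }                    // cursor-previous

private:
  bool moveTo(Node *n) {
    if (!n)
      return false;
    current = n;
    return true;
  }

  // Moves within the current node's list (e.g. operations of a block).
  bool sibling(int delta) {
    if (!current->parent)
      return false;
    const std::vector<Node *> &list = current->parent->children;
    for (std::size_t i = 0; i < list.size(); ++i) {
      if (list[i] != current)
        continue;
      long j = static_cast<long>(i) + delta;
      if (j < 0 || j >= static_cast<long>(list.size()))
        return false;
      return moveTo(list[static_cast<std::size_t>(j)]);
    }
    return false;
  }

  Node *current;
};
```

Each command returns false and leaves the cursor in place when the move is impossible (no parent, no such child, or end of the list).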

In Consideration

We may want to extend the API so that the callback can return information about whether a transformation succeeded or not, and the IRUnit that was affected: it can be the original pointer if it is still valid, a new one if changed, or nullptr if deleted.

struct ActionResult {
  IRUnit op; // handle to the updated IRUnit (can be a new one)
  bool changed; // whether the IR was changed.
  LogicalResult status; // whether the transforms succeeded or not.
};
// Execute the action under the control of the manager
LogicalResult status = manager.execute<Action>(currentOp, [&]() {
  // apply the transformation in this callback
  // …
  return ActionResult{currentOp, /*changed=*/true, success()};
});

This ActionResult can be used by the framework to improve logging after an action is completed.
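A hypothetical sketch of how the proposed ActionResult could feed a richer completion log. To keep the example standalone, Operation stands in for IRUnit and a bool for LogicalResult; the field roles follow the struct above, but the design itself is still open, and dispatchAndLog is an illustrative name.

```cpp
#include <cassert>
#include <functional>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-ins: Operation* plays IRUnit, bool plays LogicalResult.
struct Operation {
  std::string name;
};

struct ActionResult {
  Operation *op;  // updated handle: original, replacement, or nullptr if erased
  bool changed;   // whether the IR was changed
  bool succeeded; // whether the transformation succeeded
};

// Runs the transformation and uses its ActionResult to log a richer
// completion line than just "completed `tag`".
bool dispatchAndLog(std::ostream &log, const std::string &tag,
                    const std::function<ActionResult()> &transform) {
  ActionResult r = transform();
  log << "completed `" << tag << "`"
      << (r.succeeded ? " (success" : " (failure")
      << (r.changed ? ", IR changed)" : ", IR unchanged)");
  if (r.op)
    log << " -> " << r.op->name;
  else if (r.changed)
    log << " -> erased";
  log << "\n";
  return r.succeeded;
}
```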

Another possibility would be for an Action to also carry an array of StringRef provided by the client to model optional “instance tags”. This array of tags could be filled, for example, with the pass name or the pattern debug name and debug labels, and used by the client for more generic filtering. The final API for the dispatch method may then look like:

  template <typename ActionType, typename... Args>
  LogicalResult dispatch(ArrayRef<IRUnit> units, ArrayRef<StringRef> instanceTags,
                        llvm::function_ref<ActionResult()> transform,
                        Args &&... args);
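A hypothetical sketch of generic filtering on such instance tags: an action matches when it has the requested static tag (or any tag, if none is given) and carries every requested instance tag. The names are illustrative, and std::vector<std::string> stands in for ArrayRef<StringRef>.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct ToyAction {
  std::string tag;                       // static action tag
  std::vector<std::string> instanceTags; // e.g. pass or pattern debug name
};

// Matches if the static tag agrees (empty = any) and every required
// instance tag is present on the action.
bool matchesFilter(const ToyAction &action, const std::string &tag,
                   const std::vector<std::string> &requiredInstanceTags) {
  if (!tag.empty() && action.tag != tag)
    return false;
  return std::all_of(requiredInstanceTags.begin(), requiredInstanceTags.end(),
                     [&](const std::string &t) {
                       return std::find(action.instanceTags.begin(),
                                        action.instanceTags.end(),
                                        t) != action.instanceTags.end();
                     });
}
```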

Nice proposal! I’m always eager to see more profiling/tracing/debugging support and tools :smiley:

I’m particularly interested in using actions to get visibility into pattern application and more core operations than just the passes that PassInstrumentation gives. In IREE we have a PassInstrumentation that hooks up Tracy instrumentation (source here, and slides from a recent presentation here), but we have to use profiler sampling to see any deeper. The demo showing apply-pattern-action and GreedyPatternRewriteIteration looks great.

Just instrumented passes:


Sampling with full callstacks - lots of noise to see what is actually running (applyPatternsAndFoldGreedily here):

One aspect of this that I’m wondering about is how much utility you can get out of actions without needing to modify the source of the upstream and any downstream project code. For code that downstream projects control, developers can have opinionated views of what is worth instrumenting, but it isn’t practical to directly instrument shared/upstream code (e.g. Linalg and TOSA dialect passes, CSE/Canonicalize/DCE, pattern rewrites).


Here are the revisions:

https://reviews.llvm.org/D144808
https://reviews.llvm.org/D144809
https://reviews.llvm.org/D144810
https://reviews.llvm.org/D144811
https://reviews.llvm.org/D144812
https://reviews.llvm.org/D144813
https://reviews.llvm.org/D144814
https://reviews.llvm.org/D144815
https://reviews.llvm.org/D144816
https://reviews.llvm.org/D144817
https://reviews.llvm.org/D144818

Not much, I think?
But my goal is to add action dispatching “everywhere” and encourage downstream projects to upstream more of these as the need arises.


Concretely, I’m wondering if there will be a way to inject/register an action (e.g. ApplyPatternAction) like how PassInstrumentation is enabled via code like

passManager.addInstrumentation(std::make_unique<PassTracing>());

(IREE source here)

That currently lets downstream code run code before/after each pass, without needing to modify the source code of those passes or of the pass manager.

Really like this proposal. It incorporates a lot of what we wanted the debug action stuff to grow into, so it seems like a fitting evolution. I’m interested in the runtime cost of it being always on, though: if this gets placed in some heavy hot loops, do we pay for the abstractions? Does everything get nicely inlined in practice (given that this is a callback-based system)? That would be a barrier to using it in interesting places, but I would hope that the cost is negligible.

– River


You will be able to write instrumentation that operates on any action: just as pass instrumentation allows you to run your own code before/after each pass, you will be able to write instrumentation that runs before and after each pattern application.


Hey @mehdi_amini ,

Great to see flang-new feature in your presentation :slight_smile: !

I was trying to find the “action_debugging.py” script that you use in your presentation, but no luck. Are you planning to upstream it?

Thanks for all this great work!

-Andrzej


@mehdi_amini Thank you for adding this functionality.

The items that are In Consideration seem useful. Has there been any further discussion/decision on adding ActionResult?

This is still on the TODO list. While a lot of the action framework has landed, some of the final patches (and the documentation…) are still pending. I am busy with a bunch of other things just now, unfortunately, including finishing the implementation of the “properties” RFC.

The framework is now all landed, documentation is here: Action: Tracing and Debugging MLIR-based Compilers - MLIR

The ActionResult is still open right now.
