Status: landed. The documentation is here: Action: Tracing and Debugging MLIR-based Compilers - MLIR
MLIR Actions: Tracing and Debugging MLIR-based Compilers
This work has been cooking for almost a year now, with quite a few people involved along the way, so a special shout-out to: @ChiaHungDuan, @jpienaar, @inkryp, @Mogball, @tlongeri (I hope I am not forgetting anyone).
See the slides and the recording from the MLIR Open Meeting where I demoed this.
Introduction
In MLIR, passes and patterns are the main abstractions to encapsulate general IR transformations. The primary way of observing transformations along the way is to enable “debug printing” of the IR (e.g. -mlir-print-ir-after-all to print after each pass execution). On top of this, finer-grained tracing may be available with -debug, which enables more detailed logs from the transformations themselves. However, this method has some scaling issues: it is limited to a single stream of text that can be gigantic and requires tedious crawling through the log a posteriori. Iterating through multiple runs of collecting such logs and analyzing them can be very time-consuming and is often impractical beyond small input programs.
This RFC proposes to add new tracing and debugging capabilities to MLIR, built on top of what we started with the DebugAction framework. The basic concept of the framework is to allow MLIR developers to define “Actions” to structure their transformations. As defined in the original proposal: A debug action is essentially a marker for a type of action that may be performed within the compiler. There are no constraints on the granularity of an “action”, it could be as simple as “perform this fold” and as complex as “run this pass pipeline”.
The novelty of the proposal here is that we take the concept further to offer a pluggable mechanism to register “Observers” or one “Controller” that can intercept these actions and either inspect, time, and log what is happening (the observers) or make a decision about what to do with the Action (the controller): apply it or skip it.
Some of this tracing already exists in MLIR, but it has been built around the Pass infrastructure, so a “Pass” is the only real atom in the system. One way to see the new Action infrastructure is as a generalization of the tracing capabilities we have with passes, applied at a finer grain, which starts to make the infrastructure a bit more orthogonal to the concept of passes.
We implemented some prototype tools to showcase Actions:
- Logging: a simple text log of all the actions executed in order, possibly filtering on the IR location that an action affects.
- Tracing: an observer that can produce a Chrome trace of the execution.
- Interactive debugging: this enhances an LLDB/GDB environment to enable users to pause the program before/after the execution of an action, print the status, inspect the IR, etc.
Below we explain 1) how we evolved the DebugAction API to support new capabilities, 2) how we implemented two new Actions in key places in MLIR: pass execution and pattern application, and 3) how the three example use-cases work.
Implementation Notes
DebugActions API Update
The DebugAction is a way for the compiler to interact with a pluggable handler at runtime. A handler can be registered for a customized debug action and use it to instruct the caller to skip the execution of this action. This mechanism is updated to offer more control to the handlers. In particular, the initial design of debug actions only exposes a shouldExecute() API, which pushes the control to the user, outside of the framework's visibility. Instead, we are changing this API to invert the control and make it the handler's responsibility. Right now the code looks like this:
DebugActionManager &manager = context->getDebugActionManager();
// Query the action manager to see if currentPattern should be applied to
// currentOp.
if (manager.shouldExecute<DebugAction>(currentOp)) {
  // apply a transformation
  …
}
In this sequence, the manager isn’t involved in the actual execution of the action and cannot provide rich instrumentation. Instead, the new API hands control to the handler itself:
// Execute the action under the control of the manager
manager.dispatch<Action>(currentOp, [&]() {
// apply the transformation in this callback
...
});
This inversion of control (by injecting a callback) allows handlers to implement potentially interesting new features: for example, snapshotting the IR before and after the action, or recording the execution time of an action. More importantly, it makes it possible to capture the nesting of action executions: the handler has access not only to information about the current action but also to the enclosing ones, similarly to how gdb can walk a backtrace.
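To make the inversion of control concrete, here is a minimal standalone sketch. It does not use the MLIR API: ActionManager, TimingLog, and makeTimingHandler are hypothetical names invented for illustration. It only shows how a handler that receives the callback can both time an action and see the stack of enclosing actions.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the manager described above; not the MLIR API.
class ActionManager {
public:
  // A handler receives the action tag, the stack of active actions, and the
  // callback. It decides whether (and when) to run the callback.
  using Handler = std::function<void(const std::string &tag,
                                     const std::vector<std::string> &stack,
                                     const std::function<void()> &run)>;

  void setHandler(Handler h) { handler = std::move(h); }

  void dispatch(const std::string &tag,
                const std::function<void()> &transform) {
    stack.push_back(tag);
    if (handler)
      handler(tag, stack, transform);
    else
      transform(); // No handler: run the transformation directly.
    stack.pop_back();
  }

private:
  Handler handler;
  std::vector<std::string> stack; // Active actions, outermost first.
};

// Example handler state: the nesting depth and duration of every action.
struct TimingLog {
  struct Entry {
    std::string tag;
    std::size_t depth;
    double micros;
  };
  std::vector<Entry> entries;
};

inline ActionManager::Handler makeTimingHandler(TimingLog &log) {
  return [&log](const std::string &tag, const std::vector<std::string> &stack,
                const std::function<void()> &run) {
    auto start = std::chrono::steady_clock::now();
    run(); // The handler, not the caller, executes the transformation.
    auto end = std::chrono::steady_clock::now();
    double us = std::chrono::duration<double, std::micro>(end - start).count();
    log.entries.push_back({tag, stack.size(), us});
  };
}
```

With this shape, a nested dispatch (a pattern applied inside a pass, say) is recorded with depth 2 and completes before its enclosing action, which is the information a backtrace-style view needs.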
The examples above use an Operation* to define the IR that is considered by the transformation. Recognizing that an Action can affect more than one operation, we generalize this into a context for the action, which is an array of IRUnit, defined as:
using IRUnit = llvm::PointerUnion<Operation *, Block *, Region *>;
Dispatching an Action can now be done through the MLIRContext, for example:
context->dispatch<MyAction>(
[&]() {
// ...
},
{op, block});
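The idea of an action context made of heterogeneous IR units can be sketched in standalone C++ as follows. This is an illustration under assumptions, not MLIR code: std::variant stands in for llvm::PointerUnion, the Operation, Block, and Region structs are empty placeholders, and Context and describe are hypothetical names.

```cpp
#include <functional>
#include <string>
#include <variant>
#include <vector>

// Placeholder IR types, for illustration only.
struct Operation {
  std::string name;
};
struct Block {};
struct Region {};

// std::variant stands in for llvm::PointerUnion in this sketch.
using IRUnit = std::variant<Operation *, Block *, Region *>;

// Describe one unit of the action context.
inline std::string describe(const IRUnit &unit) {
  if (auto *op = std::get_if<Operation *>(&unit))
    return "operation " + (*op)->name;
  if (std::holds_alternative<Block *>(unit))
    return "block";
  return "region";
}

// Hypothetical context: records which units the action declares as its
// context, then runs the callback.
struct Context {
  std::vector<std::string> log;

  void dispatch(const std::function<void()> &transform,
                const std::vector<IRUnit> &units) {
    for (const IRUnit &unit : units)
      log.push_back(describe(unit));
    transform();
  }
};
```

A call such as ctx.dispatch([&] { /* ... */ }, {&op, &block}) then mirrors the shape of the snippet above, with each pointer converted to an IRUnit implicitly.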
ExecutionContext
The context allows registering a single handler for dispatched actions. This handler, if present, takes control of the action. In the absence of a handler, the callback for the Action is invoked directly.
We implemented a handler intended to help debugging MLIR: the ExecutionContext. This component is really a pluggable orchestrator to handle Actions. The responsibilities of this component are the following:
- Accept registration of Observers and forward them the action before and after its execution.
- Accept registration of BreakpointManager: when an Action is dispatched, the managers are queried to “match” the Action against an existing Breakpoint. If one is found, the Action is then passed to the “controller”, otherwise the Action is executed.
- When the controller returns, the ExecutionContext acts on the return value: the Action is applied, skipped, stepped in or over, or the execution continues until the end of the parent action (think GDB “finish” command).
- Actions are chained into a stack to offer a “backtrace” mechanism.
Demo application
Action for encapsulating Pass Execution and Pattern Application
First we need to actually dispatch Actions somewhere. Pass execution and pattern application are the two main encapsulation mechanisms for transformations in MLIR. They are a natural starting point to hook into the Action mechanism and unlock a significant number of users immediately without the need to modify any existing code.
Pattern Application
The PatternApplicator.cpp is updated to dispatch pattern application through a debug action under the control of the manager:
/// This is the type of Action that is dispatched when a pattern is applied.
/// It captures the pattern to apply on top of the usual context.
class ApplyPatternAction : public tracing::ActionImpl<ApplyPatternAction> {
public:
  using Base = tracing::ActionImpl<ApplyPatternAction>;
  ApplyPatternAction(ArrayRef<IRUnit> irUnits, const Pattern &pattern)
      : Base(irUnits), pattern(pattern) {}
  static constexpr StringLiteral tag = "apply-pattern-action";
  static constexpr StringLiteral desc =
      "Encapsulate the application of rewrite patterns";
  void print(raw_ostream &os) const override {
    os << "`" << tag << " pattern: " << pattern.getDebugName();
  }
private:
  const Pattern &pattern;
};
...
// Try to match and rewrite this pattern. The patterns are sorted by
// benefit, so if we match we can immediately rewrite. For PDL patterns, the
// match has already been performed, we just need to rewrite.
bool matched = false;
op->getContext()->dispatch<ApplyPatternAction>(
    [&]() {
      rewriter.setInsertionPoint(op);
      ...
      const auto *pattern = static_cast<const RewritePattern *>(bestPattern);
      result = pattern->matchAndRewrite(op, rewriter);
    },
    {op}, *bestPattern);
if (matched)
  break;
Pass Execution
In Pass.cpp, the PassManager executes individual passes through dispatching a debug action to the manager:
/// Encapsulate the "action" of executing a single pass, used for the MLIR
/// tracing infrastructure.
struct PassExecutionAction
    : public tracing::ActionImpl<PassExecutionAction> {
  using Base = tracing::ActionImpl<PassExecutionAction>;
  const Pass &pass;
  PassExecutionAction(ArrayRef<IRUnit> irUnits, const Pass &pass)
      : Base(irUnits), pass(pass) {}
  static constexpr StringLiteral tag = "pass-execution-action";
  void print(raw_ostream &os) const override {
    os << "`" << tag << "` "
       << " running \"" << pass.getName() << "\" on Operation \"";
    ArrayRef<IRUnit> irUnits = getContextIRUnits();
    if (irUnits.empty()) {
      os << "<missing?>";
    } else {
      os << irUnits.front().dyn_cast<Operation *>()->getName() << "\"";
    }
  }
};
...
bool passFailed;
op->getContext()->dispatch<PassExecutionAction>(
    [&]() {
      // Invoke the virtual runOnOperation method.
      if (auto *adaptor = dyn_cast<OpToOpPassAdaptor>(pass))
        adaptor->runOnOperation(verifyPasses);
      else
        pass->runOnOperation();
      passFailed = pass->passState->irAndPassFailed.getInt();
    },
    {op}, *pass);
The same mechanism generalizes and can be used in any kind of transformation. For example, a developer writing an “inliner” transformation could implement an InlinerAction on the model above and wrap each individual inlining transformation in the same kind of dispatch<InlinerAction>(...) call.
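As a rough illustration of what such a custom action could look like, here is a standalone sketch. It does not compile against MLIR: Action, ActionImpl, Operation, and the free function dispatch below are simplified mocks written for this example, and the InlinerAction fields (call and callee) are assumptions about what an inliner might want to capture.

```cpp
#include <sstream>
#include <string>
#include <utility>

// Minimal mocks, for illustration only; these are not the MLIR classes.
struct Operation {
  std::string name;
};

class Action {
public:
  virtual ~Action() = default;
  virtual std::string getTag() const = 0;
  virtual void print(std::ostream &os) const = 0;
};

// CRTP helper: derives the tag accessor from a static member of Derived.
template <typename Derived>
class ActionImpl : public Action {
public:
  std::string getTag() const override { return Derived::tag; }
};

// Hypothetical action wrapping one inlining step.
class InlinerAction : public ActionImpl<InlinerAction> {
public:
  static constexpr const char *tag = "inliner-action";

  InlinerAction(const Operation &call, const Operation &callee)
      : call(call), callee(callee) {}

  void print(std::ostream &os) const override {
    os << tag << " inlining " << callee.name << " into call " << call.name;
  }

private:
  const Operation &call;
  const Operation &callee;
};

// Mock dispatch: builds the action, prints it, then runs the callback.
// It returns the printed description so that a caller can log it.
template <typename ActionT, typename Fn, typename... Args>
std::string dispatch(Fn &&transform, Args &&...args) {
  ActionT action(std::forward<Args>(args)...);
  std::ostringstream os;
  action.print(os);
  transform();
  return os.str();
}
```

The point of the pattern is that the action type carries the extra data (here the call and callee) while the transformation itself stays in the callback, so the same handlers that log or break on pass and pattern actions can see inlining steps too.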
Logging
We added a --log-actions-to=<path> option to produce a log of the sequence of actions applied by the compiler. Below is an example of this option in action with the flang compiler:
[thread 0] begins (no breakpoint) Action `pass-execution-action` running "CSE" on Operation "builtin.module" (module attributes {dlti.dl_spec = #dlti.dl_spec<#dlti.dl_entry<"dlti.endianness", "little">, #dlti.dl_entry<i64, dense<64> : vector<2xi32>>, #dlti.dl_entry<i128, dense<128> : vector<2xi32>>, #dlti.dl_entry<i1, dense<8> : vector<2xi32>>, #dlti.dl_entry<i8, dense<8> : vector<2xi32>>, #dlti.dl_entry<i16, dense<16> : vector<2xi32>>, #dlti.dl_entry<i32, dense<32> : vector<2xi32>>, #dlti.dl_entry<f16, dense<16> : vector<2xi32>>, #dlti.dl_entry<f64, dense<64> : vector<2xi32>>, #dlti.dl_entry<f128, dense<128> : vector<2xi32>>>, fir.defaultkind = "a1c4d8i4l4r4", fir.kindmap = "", llvm.data_layout = "e-m:o-i64:64-i128:128-n32:64-S128", llvm.target_triple = "arm64-apple-macosx13.0.0"} {/*skip region4*/})
[thread 0] completed `pass-execution-action`
[thread 0] begins (no breakpoint) Action `pass-execution-action` running "mlir::detail::OpToOpPassAdaptor" on Operation "builtin.module" (module attributes {dlti.dl_spec = #dlti.dl_spec<#dlti.dl_entry<"dlti.endianness", "little">, #dlti.dl_entry<i64, dense<64> : vector<2xi32>>, #dlti.dl_entry<i128, dense<128> : vector<2xi32>>, #dlti.dl_entry<i1, dense<8> : vector<2xi32>>, #dlti.dl_entry<i8, dense<8> : vector<2xi32>>, #dlti.dl_entry<i16, dense<16> : vector<2xi32>>, #dlti.dl_entry<i32, dense<32> : vector<2xi32>>, #dlti.dl_entry<f16, dense<16> : vector<2xi32>>, #dlti.dl_entry<f64, dense<64> : vector<2xi32>>, #dlti.dl_entry<f128, dense<128> : vector<2xi32>>>, fir.defaultkind = "a1c4d8i4l4r4", fir.kindmap = "", llvm.data_layout = "e-m:o-i64:64-i128:128-n32:64-S128", llvm.target_triple = "arm64-apple-macosx13.0.0"} {/*skip region4*/})
[thread 0] begins (no breakpoint) Action `pass-execution-action` running "ArrayValueCopy" on Operation "func.func" (func.func @_QPtest1a(%arg0: !fir.ref<!fir.array<10xi32>> {fir.bindc_name = "a"}, %arg1: !fir.ref<!fir.array<10xi32>> {fir.bindc_name = "b"}, %arg2: !fir.ref<!fir.array<20xi32>> {fir.bindc_name = "c"}) {/*skip region4*/})
[thread 0] begins (no breakpoint) Action `apply-pattern-action pattern: (anonymous namespace)::ArrayFetchConversion (%18 = fir.array_fetch %12, %arg3 : (!fir.array<20xi32>, index) -> i32)
[thread 0] completed `apply-pattern-action`
[thread 0] begins (no breakpoint) Action `apply-pattern-action pattern: (anonymous namespace)::ArrayFetchConversion (%24 = fir.array_fetch %15, %23 : (!fir.array<10xi32>, index) -> i32)
[thread 0] completed `apply-pattern-action`
[thread 0] begins (no breakpoint) Action `apply-pattern-action pattern: (anonymous namespace)::ArrayUpdateConversion (%28 = fir.array_update %arg4, %27, %arg3 : (!fir.array<10xi32>, i32, index) -> !fir.array<10xi32>)
[thread 0] completed `apply-pattern-action`
[thread 0] begins (no breakpoint) Action `apply-pattern-action pattern: (anonymous namespace)::ArrayLoadConversion (%1 = fir.array_load %arg0(%0) : (!fir.ref<!fir.array<10xi32>>, !fir.shape<1>) -> !fir.array<10xi32>)
[thread 0] completed `apply-pattern-action`
Of course this can generate a lot of data, so a convenient --log-actions-filter option can be used to filter based on the location of the IR touched by the action. For example, take line 184 of the Fortran input file I have:
subroutine test2b(a,b,c,d)
  integer :: a(10), b(10), c(10), d(10)
  b(c(d)) = a ! Line 184
end subroutine test2b
Filtering on this location, we get only the actions that actually touch this line of source code:
[thread 0] begins (no breakpoint) Action `apply-pattern-action pattern: (anonymous namespace)::ArrayFetchConversion (%11 = fir.array_fetch %8, %arg4 : (!fir.array<10xi32>, index) -> i32)
[thread 0] completed `apply-pattern-action`
[thread 0] begins (no breakpoint) Action `apply-pattern-action pattern: (anonymous namespace)::ArrayFetchConversion (%15 = fir.array_fetch %1, %arg4 : (!fir.array<10xi32>, index) -> i32)
[thread 0] completed `apply-pattern-action`
[thread 0] begins (no breakpoint) Action `apply-pattern-action pattern: (anonymous namespace)::ArrayFetchConversion (%21 = fir.array_fetch %4, %20 : (!fir.array<10xi32>, index) -> i32)
[thread 0] completed `apply-pattern-action`
[thread 0] begins (no breakpoint) Action `apply-pattern-action pattern: (anonymous namespace)::ArrayUpdateConversion (%27 = fir.array_update %arg5, %14, %26 : (!fir.array<10xi32>, i32, index) -> !fir.array<10xi32>)
[thread 0] completed `apply-pattern-action`
[thread 0] begins (no breakpoint) Action `apply-pattern-action pattern: (anonymous namespace)::ArrayLoadConversion (%1 = fir.array_load %arg3(%0) : (!fir.ref<!fir.array<10xi32>>, !fir.shape<1>) -> !fir.array<10xi32>)
[thread 0] completed `apply-pattern-action`
[thread 0] begins (no breakpoint) Action `apply-pattern-action pattern: (anonymous namespace)::ArrayLoadConversion (%5 = fir.array_load %arg2(%0) : (!fir.ref<!fir.array<10xi32>>, !fir.shape<1>) -> !fir.array<10xi32>)
[thread 0] completed `apply-pattern-action`
[thread 0] begins (no breakpoint) Action `apply-pattern-action pattern: (anonymous namespace)::ArrayLoadConversion (%9 = fir.array_load %arg1(%0) : (!fir.ref<!fir.array<10xi32>>, !fir.shape<1>) -> !fir.array<10xi32>)
[thread 0] completed `apply-pattern-action`
[thread 0] begins (no breakpoint) Action `apply-pattern-action pattern: (anonymous namespace)::ArrayLoadConversion (%11 = fir.array_load %arg0(%0) : (!fir.ref<!fir.array<10xi32>>, !fir.shape<1>) -> !fir.array<10xi32>)
[thread 0] completed `apply-pattern-action`
[thread 0] begins (no breakpoint) Action `apply-pattern-action pattern: (anonymous namespace)::ArrayMergeStoreConversion (fir.array_merge_store %10, %14 to %arg1 : !fir.array<10xi32>, !fir.array<10xi32>, !fir.ref<!fir.array<10xi32>>)
[thread 0] completed `apply-pattern-action`
[thread 0] begins (no breakpoint) Action `apply-pattern-action pattern: (anonymous namespace)::ConvertConvertOptPattern (%22 = fir.convert %21 : (i32) -> index)
[thread 0] completed `apply-pattern-action`
Tracing
Unsurprisingly, an Action can be traced and this kind of profile generated:
Note how the timeline lets you follow the execution pattern after pattern and find where the time is spent. More importantly, clicking on a given pattern shows interesting details at the bottom: the operation the pattern is applied to, including its source location. From there we can easily learn things like “which loop in the source code blows up the fusion/tiling algorithm”.
Ideally we should make it so that profilers like Tracy could be used out-of-the-box!
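For readers unfamiliar with Chrome traces, the sketch below shows the general idea of an observer that records a begin ("B") and end ("E") event per action and serializes them as a JSON array of trace events. It is not the implementation used for the profile above: ChromeTraceSketch is a hypothetical class, timestamps are passed in by the caller, the pid and tid values are fixed placeholders, and names are not JSON-escaped.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Illustrative only: collects begin/end events and serializes them as a
// JSON array of Chrome trace events.
class ChromeTraceSketch {
public:
  // Record the start of an action at the given timestamp (microseconds).
  void begin(const std::string &name, long long tsMicros) {
    events.push_back({name, 'B', tsMicros});
  }

  // Record the end of an action at the given timestamp (microseconds).
  void end(const std::string &name, long long tsMicros) {
    events.push_back({name, 'E', tsMicros});
  }

  // Serialize all events. Names are assumed not to need JSON escaping.
  std::string toJson() const {
    std::ostringstream os;
    os << "[";
    for (std::size_t i = 0; i < events.size(); ++i) {
      const Event &e = events[i];
      if (i != 0)
        os << ",";
      os << "{\"name\":\"" << e.name << "\",\"ph\":\"" << e.phase
         << "\",\"ts\":" << e.ts << ",\"pid\":1,\"tid\":0}";
    }
    os << "]";
    return os.str();
  }

private:
  struct Event {
    std::string name;
    char phase; // 'B' for begin, 'E' for end.
    long long ts;
  };
  std::vector<Event> events;
};
```

Because begin and end events nest naturally, calling begin before an action and end after it from an observer is enough to reproduce the action nesting in the timeline.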
Interactive Client
The last application is an interactive client à la gdb. This client supports setting up breakpoints, stepping through the execution of Actions, and skipping them selectively, while inspecting the IR before and after each transformation. Breakpoints are set by matching on the action tag or on other properties (for example the source location, or specific patterns based on their debug name). This section documents the features exposed through the command line interface.
Controlling execution:
- “skip”: continue the execution to the next breakpoint without applying the current transformation.
- “apply”: continue the execution to the next breakpoint after applying the current transformation.
- “step”: continue the execution and stop again for the next event, including if we hit a nested action, or go to the next action if the transformation is already executed.
- “next”: continue the execution and stop again after the current action (not for nested action), or go to the next action if the transformation is already executed.
- “finish”: continue the execution and stop again when the parent action completes.
- “break-on-tag <tag>”: add a breakpoint matching the provided action by tag.
- “break-on-file <file:line:loc>”: add a breakpoint matching the provided action.
- “list”: list the active breakpoints and their identifiers.
- “disable [#id]”: disable a breakpoint by its identifier, or use “all” to disable them all.
- “enable [#id]”: enable a breakpoint by its identifier, or use “all” to enable them all.
- “backtrace”: show the current stack of actions.
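The breakpoint commands above map onto a small amount of state. The following standalone sketch models tag breakpoints with identifiers that can be listed, enabled, and disabled. TagBreakpoints and its methods are hypothetical names chosen for this example, and file-location breakpoints are left out.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Illustrative registry for tag breakpoints; not the MLIR implementation.
class TagBreakpoints {
public:
  struct Entry {
    std::string tag;
    bool enabled;
  };

  // "break-on-tag <tag>": returns the identifier of the new breakpoint.
  std::size_t add(const std::string &tag) {
    entries.push_back({tag, true});
    return entries.size() - 1;
  }

  // "enable #id" / "disable #id": unknown identifiers are ignored.
  void setEnabled(std::size_t id, bool enabled) {
    if (id < entries.size())
      entries[id].enabled = enabled;
  }

  // "enable all" / "disable all".
  void setAllEnabled(bool enabled) {
    for (Entry &e : entries)
      e.enabled = enabled;
  }

  // "list": the identifier of a breakpoint is its index.
  const std::vector<Entry> &list() const { return entries; }

  // Queried when an action is dispatched.
  bool matches(const std::string &tag) const {
    for (const Entry &e : entries)
      if (e.enabled && e.tag == tag)
        return true;
    return false;
  }

private:
  std::vector<Entry> entries;
};
```

Disabling keeps the entry in place instead of erasing it, so identifiers stay stable across “list” calls.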
Inspecting the IR:
An action comes with a list of IRUnits (Operation, Block, or Region) as context. A “cursor” in the IR is available and can be controlled by the user using the following set of commands.
- “context” - list the available IRUnits.
- “cursor-select-from-context #id” - activate the IRUnit with the given ID.
- “cursor-print” - print the currently activated IRUnit.
- “cursor-parent” - activate the parent IRUnit.
- “cursor-child #id” - activate a child by id.
- “cursor-previous #id” - activate the previous IRUnit in its current list (for example previous Operation in the current block).
- “cursor-next #id” - activate the next IRUnit in its current list (for example next Block in the current region).
In Consideration
We may want to extend the API so that the callback can return information about whether a transformation succeeded or not, and the IRUnit that was affected: it can be the original pointer if it is still valid, a new one if changed, or nullptr if deleted.
struct ActionResult {
  IRUnit op;            // handle to the updated IRUnit (can be a new one)
  bool changed;         // whether the IR was changed.
  LogicalResult status; // whether the transform succeeded or not.
};
// Execute the action under the control of the manager
LogicalResult status = manager.execute<Action>(currentOp, [&]() -> ActionResult {
  // apply the transformation in this callback
  …
  return { currentOp, /*changed=*/true, success() };
});
This ActionResult can be used by the framework to improve logging after an action is completed.
Another possible extension would be for an Action to also carry an array of StringRef provided by the client to model optional “instance tags”. This array of tags can be filled, for example, with the pass name or the pattern debug name and debug labels, and used by the client for more generic filtering. The final API for the dispatch method may look like:
template <typename ActionType, typename... Args>
LogicalResult dispatch(ArrayRef<IRUnit> units, ArrayRef<StringRef> instanceTags,
llvm::function_ref<ActionResult()> transform,
Args &&... args);
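To show how a framework could use such a result for logging, here is a standalone sketch of the proposed shape. It is only an approximation of the idea: Operation is a placeholder struct, a plain bool replaces LogicalResult, a raw Operation pointer replaces IRUnit, and DispatchLog and the free function dispatch are hypothetical.

```cpp
#include <functional>
#include <string>
#include <vector>

// Placeholder types, for illustration only.
struct Operation {
  std::string name;
};

struct ActionResult {
  Operation *op;  // Updated unit, or nullptr if it was erased.
  bool changed;   // Whether the IR was changed.
  bool succeeded; // Stand-in for LogicalResult.
};

struct DispatchLog {
  std::vector<std::string> lines;
};

// Hypothetical dispatch: runs the callback, then logs what it reported
// together with the instance tags supplied by the client.
inline bool dispatch(DispatchLog &log,
                     const std::vector<std::string> &instanceTags,
                     const std::function<ActionResult()> &transform) {
  ActionResult result = transform();
  std::string line = result.succeeded ? "succeeded" : "failed";
  line += result.changed ? ", IR changed" : ", IR unchanged";
  line += result.op ? ", unit " + result.op->name
                    : std::string(", unit erased");
  for (const std::string &tag : instanceTags)
    line += " [" + tag + "]";
  log.lines.push_back(line);
  return result.succeeded;
}
```

The instance tags here are only appended to the log line. A real client could equally use them as filter keys, as suggested above.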