Authors: @ftynse, @nicolasvasilache
Reading time: ~20 minutes
Ever since the initial introduction, the use of EDSC (Embedded Domain-Specific Constructs) as IR creation API has been contentious in core MLIR. This essay/proposal attempts to reconsider the reasons for creating EDSC and the issues that arise from their existence in order to find a way to reconcile the infrastructural efforts in a common path forward. Note that EDSC precede several major improvements in the MLIR APIs and some of the reasons might have become less important since the time of writing.
Our goal here is to have a constructive discussion on the API design for building the IR and find small incremental improvements. This post is half-analysis half-set-of-proposal, and we would like to land some incremental improvements progressively rather than wait for fully-vetted solution fully specified in text with no associated code.
Arguments Against EDSC
Naming and Conflation
“There are only two hard things in Computer Science: cache invalidation and naming things.”
Arguably, EDSC is a suboptimal name for a piece of the infrastructure that covers two relatively separable concepts: (1) alternative IR construction APIs and (2) elements of embedding IR construction into C++ without necessarily making it domain-specific. The latter can be achieved without tight connection to the former.
Complexity Cost of Extra Abstraction
EDSC introduces a set of additional abstractions (e.g., ValueHandle) beyond core IR concepts that one needs to grasp in order to understand the code using EDSC. While the EDSC API is intended to simplify IR construction, the cost of additional entities that need to be kept in mind may nivelate the simplification.
Having more than one way of doing things also creates the need to make decisions (e.g. whether the benefits of using EDSC abstractions outweigh the costs in the particular scenario), which itself introduces additional cost.
Misleading Names
Same quote about naming things.
EDSC introduces Intrinsics (one of the authors feels quite negatively about this word in MLIR context), functional objects that create IR when called, that use the naming scheme intentionally diverging from the coding convention. When used independently, these objects nicely reflect the structure of the IR to be created. For example
// Produce
// %0 = addf %arg0, %arg1
// %1 = mulf %0, %arg2
// yield %1
auto v0 = addf(arg0, arg1);
auto v1 = mulf(v0, arg2);
yield(v1);
However, when composed with arbitrary C functions, it becomes unclear whether a particular function-like object creates IR or not.
auto v0 = addf(arg0, arg1);
auto v1 = divide(v0, arg2); // Does this create std.divf or does something else?
func(v1); // Does this create std.func, or is it just a poorly-named debug helper?
yield(v1);
Solving this with different coding styles, e.g. snake_case
instead of camelBack
, is not always sufficient and may be even considered distracting.
Global State
EDSC uses global state (ScopedContext) to carry builder and location information across function calls without making them appear in the function arguments. While the obvious problem of parallel access is handled through thread_local
, the less obvious “action-at-a-distance” problem persists: some functions that use EDSC can only be used if some call above in the call stack created a ScopedContext
. It is not immediately visible from the function body whether it is the case, and how far in the call stack, or indeed all possible call stacks leading to this function, the ScopedContext
is created.
Scalability and Boilerplate
EDSC requires Intrinsics to be defined for every operation, for example here and here, which does not scale automatically to new ops unlike the templated Builder API. Arguably, one could use ValueBuilder<OpTy>
directly, but current uses of EDSC discourage this. Similarly, region-building abstractions are implemented specific to certain Ops, e.g. those in the Loop dialect, and need to be replicated for other operations.
Despite its original goal of reducing Builder API boilerplate, namely explicit locations and repeated accesses to methods on a Builder
instance, EDSC also introduces another kind of boilerplate. In the following example extracted from vector to loops conversion, one has to forward-declare an array of edsc::ValueHandle
and and an array of pointers thereto, which are passed to the region-building abstraction
auto ivs = ValueHandle::makeIndexHandles(/*num loops*/);
SmallVector<ValueHandle *, 8> pivs =
makeHandlePointers(MutableArrayRef<ValueHandle>(ivs));
LoopNestBuilder(pivs, /*lower bounds*/, /*upper bounds*/, /*steps*/)([&] {
// loop body
});
Encouragement of Unspecified Behavior
EDSC makes it easy to write code that exercises a behavior unspecified until C++17 - the order of evaluation of function arguments. For example,
ValueHandle v1, v2, v3, v4;
/* initialize v1, v2, v3, v4 somehow */
add(mul(v1, v2), mul(v3, v4));
may produce one of the two following IR fragments, depending on the order of evaluation:
%0 = muli %v1, %v2
%1 = muli %v3, %v4
%2 = addi %0, %1
// --- or ---
%0 = muli %v3, %v4
%1 = muli %v1, %v2
%2 = addi %1, %0
Note that the final value %2
is identical in both cases, and does not rely on the associativity of addi
. However, such variations in the IR are hard to test with the existing FileCheck-based infrastructure. This could become actually problematic if the “nested” instructions, e.g., muli
in this case, had side effects, since the order of those side effects could change.
Another remark is that EDSC only makes this problem easy to appear, an equivalent example can be constructed using Builder
APIs as follows.
Value v1, v2, v3, v4;
/* initialize v1, v2, v3, v4 somehow */
builder.create<AddIOp>(
loc, builder.create<MulIOp>(loc, v1, v2), builder.create<MulIOp>(loc, v3, v4));
However, the verbosity of Builder
APIs tends to encourage one to introduce named variables for each operation, thus “fixing” the order of evaluation.
Arguments For EDSC
Building Regions using Structured Programming
MLIR is inherently structured due to regions. Builder API was mostly carried over from LLVM, which has an essentially linear IR connected through branches between basic blocks. Constructing LLVM IR using IRBuilder
APIs with insertion points looks more “natural” since setting the insertion point is akin to following a branch. This approach becomes significantly more clunky in presence of MLIR regions (the problem is more evident in dialects that use regions extensively: linalg, lhlo, loops.
), requiring one to create structured IR using an equivalent of goto
as the only operation. The arguments for using structured programming instead of goto
were summarized by Dijkstra in 1979.
EDSC provides an API where the structure of the IR is reflected in the structure of the code constructing it by means of callbacks. Consider creating a simple if/else block.
auto ifOp = builder.create<loop::IfOp>(loc, condition);
{
OpBuilder::InsertionGuard raii(builder); // make sure to goto back
Block *thenBlock = &ifOp.thenRegion().front();
builder.setInsertionPointToStart(thenBlock); // goto @then;
builder.create<ThenBodyOp>(/*...*/);
// Note that, unlike "thenRegion", "elseRegion" is empty by default, and needs the
// "yield" terminator to be constructed explicitly.
Block *elseBlock = builder.createBlock(ifOp.elseRegion());
builder.setInsertionPointToStart(elseBlock); // goto @else;
builder.create<ElseBodyOp>(/*...*/);
builder.create<loop::YieldOp>(loc, llvm::None);
}
Or with EDSC-style APIs:
loop_if(condition,
[&]() {
then_body_op(/*...*/);
loop_yield(); // We always fill the entire region from scratch.
},
[&]() {
else_body_op(/*...*/);
loop_yield();
});
Builder Boilerplate
Builder APIs are unnecessarily verbose in some common cases, in particular in rewrite patterns where the majority of the in-tree uses of Builder APIs are located.
PatternMatchResult matchAndRewrite(Operation *op, PatternRewriter &rewriter) {
// All new ops will use the location of the original op, yet one has to repeat
// the location every time.
Location loc = op->getLocation();
// Constructing an Op with a simple array attribute containing an integer requires
// to type "rewriter" four times.
rewriter.create<OpName>(loc, /*arg*/,
rewriter.getArrayAttribute(
rewriter.getIntegerAttr(rewriter.getIntegerType(32), 42))));
}
While location information in MLIR is intentionally visible, there are arguably better ways of ensuring it remains consistent than having to pass it to every single Op constructor; in the end, the API does not prevent one from passing instances of UnknownLoc
everywhere. EDSC removes the need to repeat locations, but hides the location information instead. An approach with explicit scope-local location pinning based on RAII may be preferable.
Higher-Level APIs
Complex IR abstractions, such as operations with regions or structured attributes and types, may require additional “helper” abstractions to manipulate them in the C++ code. Such abstractions are already present in the code base, proposed independently for similar readability-improvement reasons, for example MemRefDescriptor that abstracts away the operations emitted to manipulate the lowered-to-LLVM equivalent of a memref
using “semantic” names or GPUAllReduceRewriter that avoids repeated mentions of rewriter
and location
in rewrite patterns. EDSC can provide a common way of defining such helpers that would avoid the proliferation of similar-but-different hand-rolled abstractions.
Higher-level APIs are also aligned with MLIR’s preference towards declarative-style programming, which is manifest, e.g., in TableGen-based declarative rewrite rules. EDSC enables one to write similarly declarative code for rewrite patterns in C++, including for region-carrying ops that are poorly supported in DRR.
More Modern C++ (partly outdated due to core infra improvements)
When introduced originally, EDSC was also insisting on using a “more modern” subset of C++, introducing value-type abstractions (ValueHandle
instead of Value *
and Type *
) and extensively using lambdas and template functions with automatic type deduction. This allowed it to simplify the APIs, e.g., those of functions returning multiple Value
s through arguments by passing a ValueHandle &
instead of uncommon and arguable hard-to-understand Value *&
, or those of construction functions by avoiding explicit template arguments.
Core MLIR adopted a similar style since the initial version of EDSC, in particular making Value
, Type
and Attribute
value-types.
Elements of the Solution : Disentangle EDSC Concepts
EDSC was an ambitious forward-looking proposal touching on multiple aspects of MLIR, from core APIs to intended uses as a DSL embedded into C++. At this point, we should treat EDSC as a proper name and do not let its initial acronymic definition impose how it evolves. Separating this proposal into several pieces, each of which can be discussed, improved and maintained separately will go a long way towards a better system. In particular, we can consider separately:
- APIs for constructing region-carrying operations;
- Aspects of eager/delayed construction and typing, i.e. ValueHandle representing a
Value
or aType
; - API improvements to reduce repeated builder and location specification;
- Helper classes to manipulate structured types, e.g. StructuredIndexed, MemRefDescriptor;
- Facilities for constructing larger pieces of IR, e.g. LoopNestBuilder;
- Considerations of DSL embedding, e.g. operator overloading to construct IR or leaner function names.
In the following sections, we make several detachable proposals that build on the lessons learned from the EDSC experience and attempt to converge the IR-building APIs.
- Use function references with existing
Builder
APIs to simplify the construction of region-carrying operations. - With the change above,
ValueHandle
can be removed. - Store
OpBuilder
andLocation
objects locally to reduce boilerplate where appropriate. - Helper classes for structured types should be rebased on the new unified builder APIs, but their APIs can remain mostly the same.
- Given the improvements above, helpers for large IR pieces can be simplified significantly.
- DSL embedding should not be a primary concern for core APIs, which should remain usable following a more imperative style. However, since localized uses of DSL-style APIs tend to appear in the code base due to their simplicity, providing a homogeneous ODS-driven way defining such APIs should be prioritized.
All of the code above has been prototyped: lambdas for regions, builder mixins, nested mixins and lambdas, DSL-style APIs.
Constructing Region-Carrying Operations
Even today, it is possible, albeit cumbersome, to leverage Builder
APIs to use lambda functions constructing the body of the region.
OpBuilder builder(/*...*/);
builder.create<loop::ForOp>(loc, lb, ub, step, ValueRange(),
// Lambda populating the body of the loop. Takes the induction variable
// and the iteration arguments as function arguments. The first argument
// is a builder with the insertion point set up to the loop body.
[loc,ub,step](OpBuilder *nested, Value i, ValueRange) {
// Such constructs are easily nestable.
nested->create<loop::ForOp>(loc, i, ub, step, ValueRange(),
[loc,i](OpBuilder *moreNested, Value j, ValueRange) {
moreNested->create<AddIOp>(loc, i, j);
moreNested->create<loop::YieldOp>(loc);
});
nested->create<loop::YieldOp>(loc);
});
This approach reflects the nested nature of regions in the constructing code, but (1) requires OpBuilder
manipulation in loop::ForOp::build
with potentially unsafe casting and (2) creates potentially confusing relations between multiple builders involved. Alternatively to creating new builders, one can set up the insertion point of the “main” builder before and after the lambda and require the caller to capture it, but this may be also confusing due to “action at a distance”.
Note that similar lambda-based helper functions are already used in the code base.
Helpers to build composite IR fragments that are not reflected as IR operations, for example loop nests, can be implemented easily using such APIs.
Getting Rid of ValueHandle
The style of the APIs above ensures, at a cost of some additional code in individual Op::build
functions, that Value
s of block arguments are available before the block itself is constructed. This essentially removes the only remaining reason for edsc::ValueHandle
to exist—provide a placeholder for the type of the value before the value is created. Adopting this style of the APIs will likely allow existing EDSC functionality to replace ValueHandle
with Value
and thus remove an additional abstraction layer and related complexity.
Builder Mixins
Instead of using global state to cache Builder
and Location
that are reused in a sequence of IR-constructing calls, one could store them in the class performing the construction using a mixin-like mechanism. This prevents one from defining free functions with assumptions on global state. While having class methods with similar assumptions on class state may look similar, it is arguably more common for classes methods to use class fields to communicate information between them and arguably less surprising for the programmer. Furthermore, given that a large amount of IR manipulation happens in pattern classes, these classes can have the corresponding mixins by default.
One can construct, e.g., the following API that relies on RAII to locally “pin” the builder and the location information for all operations.
struct IRBuildingClass : public PinnableOpBuilderMixin {
IRBuildingClass(MLIRContext *context) : PinnableOpBuilderMixin(context) {}
void build(OpBuilder &builder, Location loc, Location otherLoc);
};
void IRBuildingClass::build(OpBuilder &builder, Location loc, Location otherLoc) {
// While this object is live and not overridden by another similar object,
// calls to create<> will use the given builder and the given location.
// The values are stored in private variables of (Pinnable)OpBuilderMixin and
// cannot be overwritten directly.
auto raii = pinLocationWithBuilder(builder, loc);
Value zero = create<ConstantIndexOp>(0);
Value fortytwo = create<ConstantIndexOp>(42);
{
// It is possible to use multiple locations with local RAII objects.
auto raii = pinLocation(otherLoc);
Value one = create<ConstantIndexOp>(1);
}
create<loop::ForOp>(zero, fortytwo, one, ValueRange(),
[this](OpBuilder *nested, Value iter, ValueRange) {
auto raii = withBuilder(*nested);
create<AddIOp>(iter, iter);
create<loop::YieldOp>();
});
// It remains possible to use the conventional Builder APIs.
builder->create<ReturnOp>(loc);
}
Small modifications to the core builder and operation APIs allow us to elide RAII objects inside lambdas, making sure the builder insertion point is updated immediately before and after calling the function. This enables the following usage.
void OtherIRBuildingClass::build(OpBuilder &builder, Location loc) {
auto raii = pinLocationWithBuilder(builder, loc);
Value zero = create<ConstantIndexOp>(0);
Value fortytwo = create<ConstantIndexOp>(42);
Value one = create<ConstantIndexOp>(1);
create<loop::ForOp>(zero, fortytwo, one, ValueRange(),
region([this](ValueRange args) {
create<AddIOp>(args[0], args[0]);
create<loop::YieldOp>();
}));
create<ReturnOp>();
}
DSL-style Helper Classes
Instead of providing EDSC Intrinsics as free IR-building functions that access global state, one can implement an equivalent functionality with dialect-scoped helper classes that may optionally store the builder locally. Note that this is orthogonal to the Builder Mixins idea, but can be combined with it.
void build(OpBuilder &builder, Location loc) {
StdDialectBuilder std(builder, loc);
LoopDialectBuilder loop(builder, loc);
Value zero = std.ConstantIndex(0);
Value one = std.ConstantIndex(1);
Value fourtytwo = std.ConstantIndex(42);
loop::ForOp f = loop.For(zero, fourtytwo, one, ValueRange());
{
OpBuilder::InsertionGuard raii(builder);
builder.setInsertionPointToStart(f.body());
std.AddI(f.getInductionVar(), f.getInduciton.Var());
loop.Yield();
}
}
While the function names do not contain “create”, their intention may be inferred from the containing class, which is clearly named as builder. Furthermore, a naming convention may require to call *_create
the variables of *DialectBuilder
, which would lead to more visible (yet less DSL-like) create_std.ConstantIndex
.
Such helper classes can be generated automatically from ODS. As an additional bonus, it makes IR-Building APIs to the existing C++ tooling, which is often struggling to look through the variadic templates of OpBuilder::create
and show the expected argument lists.
Combined with Builder Mixins, one can reach the same style of DSL-style APIs as current EDSC, or any intermediate form between current Builder
APIs and current EDSC, all without resorting to global state.
auto zero = std_constant_index(0);
auto fourtytwo = std_constant_index(42);
auto one = std_constant_index(1);
loop_for(zero, fourtytwo, one, {}, region([this](ValueRange args) {
std_addi(args[0], args[0]);
loop_yield();
}));
std_return();
Appendix: The Case For Modern MetaProgramming in MLIR
Modern metaprogramming support has not been a historical priority in MLIR.
MLIR design itself touches on multiple things, DSL philosophy, IR design and LLVM era “metaprogramming tools”. For instance:
- Tablegen can be viewed as a powerful DSL based on type-unsafe string stitching.
- C++ template metaprogramming is typed but has general usability, extensibility and boilerplate issues.
- LLVM Builders are a proven but somewhat dated API and incite users to introduce helper functions that end up overlapping in an unprincipled fashion.
- C macros is not a real contender.
Another way to metaprogram is to add a few key wrapping abstractions. One could define metaprogramming as programming at a distance (We famously know that “All problems in computer science can be solved by another level of indirection”). EDSC is a particular implementation of “programming at a distance”. It introduces layers of indirection between the user and the raw builders API. This layer of indirection can be used to both build IR in more a composable way, as well as to pattern match it.
Another approach is to directly start with a language that has builtin support for pattern matching and/or partial evaluation (e.g. OCaml, Haskell, possibly Swift).
The literature in the metaprogramming space is rich, in particular Tiark Rompf’s DeLite and Zak Devito’s Terra are good sources of inspiration.