Evolving Builder APIs based on lessons learned from EDSC

Authors: @ftynse, @nicolasvasilache
Reading time: ~20 minutes

Ever since its initial introduction, the use of EDSC (Embedded Domain-Specific Constructs) as an IR creation API has been contentious in core MLIR. This essay/proposal attempts to reconsider the reasons for creating EDSC and the issues that arise from their existence, in order to find a way to reconcile the infrastructural efforts in a common path forward. Note that EDSC predate several major improvements in the MLIR APIs, and some of these reasons may have become less important since the time of writing.

Our goal here is to have a constructive discussion on the API design for building IR and to find small incremental improvements. This post is half analysis, half set of proposals, and we would like to land incremental improvements progressively rather than wait for a fully-vetted solution completely specified in text with no associated code.

Arguments Against EDSC

Naming and Conflation

“There are only two hard things in Computer Science: cache invalidation and naming things.”

Arguably, EDSC is a suboptimal name for a piece of the infrastructure that covers two relatively separable concepts: (1) alternative IR construction APIs and (2) elements of embedding IR construction into C++ without necessarily making it domain-specific. The latter can be achieved without tight connection to the former.

Complexity Cost of Extra Abstraction

EDSC introduces a set of additional abstractions (e.g., ValueHandle) beyond core IR concepts that one needs to grasp in order to understand code that uses EDSC. While the EDSC API is intended to simplify IR construction, the cost of the additional entities one has to keep in mind may cancel out that simplification.

Having more than one way of doing things also creates the need to make decisions (e.g. whether the benefits of using EDSC abstractions outweigh the costs in the particular scenario), which itself introduces additional cost.

Misleading Names

Same quote about naming things.

EDSC introduces Intrinsics (one of the authors feels quite negatively about this word in the MLIR context): function-like objects that create IR when called and whose naming scheme intentionally diverges from the coding convention. When used independently, these objects nicely reflect the structure of the IR to be created. For example:

// Produce
// %0 = addf %arg0, %arg1
// %1 = mulf %0, %arg2
// yield %1
auto v0 = addf(arg0, arg1);
auto v1 = mulf(v0, arg2);
yield(v1);

However, when composed with arbitrary C functions, it becomes unclear whether a particular function-like object creates IR or not.

auto v0 = addf(arg0, arg1);
auto v1 = divide(v0, arg2); // Does this create std.divf or does something else?
func(v1);  // Does this create std.func, or is it just a poorly-named debug helper?
yield(v1);

Solving this with different coding styles, e.g. snake_case instead of camelBack, is not always sufficient and may even be considered distracting.

Global State

EDSC uses global state (ScopedContext) to carry builder and location information across function calls without making them appear in the function arguments. While the obvious problem of parallel access is handled through thread_local, the less obvious “action-at-a-distance” problem persists: some functions that use EDSC can only be called if some frame higher in the call stack created a ScopedContext. It is not immediately visible from the function body whether this is the case, nor how far up the call stack (or, indeed, in all possible call stacks leading to this function) the ScopedContext is created.
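
For illustration, a minimal sketch of the problem, using intrinsic names in the style of the DSL examples later in this post (the helper and the exact spellings are illustrative, not references to specific in-tree code):

// This helper compiles on its own, but silently requires that some caller,
// possibly far up the call stack, has created a ScopedContext: both calls
// below read the builder and location from thread-local global state.
static ValueHandle emitIncrement(ValueHandle v) {
  ValueHandle one = std_constant_index(1);
  return std_addi(v, one);
}

void caller(OpBuilder &builder, Location loc, Value v) {
  ScopedContext scope(builder, loc);  // Removing this breaks emitIncrement.
  emitIncrement(ValueHandle(v));
}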

Scalability and Boilerplate

EDSC requires Intrinsics to be defined for every operation, for example here and here, which does not scale automatically to new ops, unlike the templated Builder API. Arguably, one could use ValueBuilder<OpTy> directly, but current uses of EDSC discourage this. Similarly, region-building abstractions are implemented specifically for certain ops, e.g. those in the Loop dialect, and need to be replicated for other operations.
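
For reference, an Intrinsic is roughly a thin alias over such a templated builder, so each newly added op requires its own line of boilerplate (a sketch; the exact spelling in the tree may differ):

// Sketch: each intrinsic aliases a templated builder that constructs the op
// when called, so new ops need new aliases.
using addf = ValueBuilder<AddFOp>;
using mulf = ValueBuilder<MulFOp>;
// Using the template directly avoids the per-op alias, at the cost of verbosity.
auto v = ValueBuilder<AddFOp>(arg0, arg1);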

Despite its original goal of reducing Builder API boilerplate, namely explicit locations and repeated accesses to methods on a Builder instance, EDSC also introduces another kind of boilerplate. In the following example, extracted from the vector-to-loops conversion, one has to forward-declare an array of edsc::ValueHandle and an array of pointers thereto, which are passed to the region-building abstraction:

auto ivs = ValueHandle::makeIndexHandles(/*num loops*/);
SmallVector<ValueHandle *, 8> pivs =
    makeHandlePointers(MutableArrayRef<ValueHandle>(ivs));
LoopNestBuilder(pivs, /*lower bounds*/, /*upper bounds*/, /*steps*/)([&] {
  // loop body
});

Encouragement of Unspecified Behavior

EDSC makes it easy to write code that relies on behavior the C++ standard leaves unspecified - the order of evaluation of function arguments. For example,

  ValueHandle v1, v2, v3, v4;
  /* initialize v1, v2, v3, v4 somehow */
  add(mul(v1, v2), mul(v3, v4));

may produce one of the two following IR fragments, depending on the order of evaluation:

%0 = muli %v1, %v2
%1 = muli %v3, %v4
%2 = addi %0, %1
// --- or ---
%0 = muli %v3, %v4
%1 = muli %v1, %v2
%2 = addi %1, %0

Note that the final value %2 is identical in both cases and does not rely on the associativity of addi. However, such variations in the IR are hard to test with the existing FileCheck-based infrastructure. This could become an actual problem if the “nested” instructions, e.g., muli in this case, had side effects, since the order of those side effects could change.

Note also that EDSC only makes this problem easier to trigger; an equivalent example can be constructed using the Builder APIs as follows.

Value v1, v2, v3, v4;
/* initialize v1, v2, v3, v4 somehow */
builder.create<AddIOp>(
    loc, builder.create<MulIOp>(loc, v1, v2), builder.create<MulIOp>(loc, v3, v4));

However, the verbosity of Builder APIs tends to encourage one to introduce named variables for each operation, thus “fixing” the order of evaluation.
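
For illustration, the named-variable form of the same fragment, a sketch using the Builder API from the snippet above:

// Naming each intermediate value makes the order of IR creation explicit.
Value lhs = builder.create<MulIOp>(loc, v1, v2);
Value rhs = builder.create<MulIOp>(loc, v3, v4);
builder.create<AddIOp>(loc, lhs, rhs);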

Arguments For EDSC

Building Regions using Structured Programming

MLIR is inherently structured due to regions. The Builder API was mostly carried over from LLVM, which has an essentially linear IR connected through branches between basic blocks. Constructing LLVM IR using IRBuilder APIs with insertion points looks more “natural” since setting the insertion point is akin to following a branch. This approach becomes significantly more clunky in the presence of MLIR regions (the problem is most evident in dialects that use regions extensively: linalg, lhlo, loops), requiring one to create structured IR using an equivalent of goto as the only control-flow mechanism. The arguments for using structured programming instead of goto were famously made by Dijkstra as early as 1968.

EDSC provides an API where the structure of the IR is reflected in the structure of the code constructing it by means of callbacks. Consider creating a simple if/else block.

auto ifOp = builder.create<loop::IfOp>(loc, condition);
{
  OpBuilder::InsertionGuard raii(builder);      // make sure to goto back

  Block *thenBlock = &ifOp.thenRegion().front();
  builder.setInsertionPointToStart(thenBlock);  // goto @then;
  builder.create<ThenBodyOp>(/*...*/);
  // Note that, unlike "thenRegion", "elseRegion" is empty by default, and needs the
  // "yield" terminator to be constructed explicitly.
  Block *elseBlock = builder.createBlock(ifOp.elseRegion());
  builder.setInsertionPointToStart(elseBlock);  // goto @else;
  builder.create<ElseBodyOp>(/*...*/);
  builder.create<loop::YieldOp>(loc, llvm::None);
}


Or with EDSC-style APIs:

loop_if(condition,
  [&]() {
    then_body_op(/*...*/);
    loop_yield();  // We always fill the entire region from scratch.
  },
  [&]() {
    else_body_op(/*...*/);
    loop_yield();
  });

Builder Boilerplate

Builder APIs are unnecessarily verbose in some common cases, in particular in rewrite patterns where the majority of the in-tree uses of Builder APIs are located.

PatternMatchResult matchAndRewrite(Operation *op, PatternRewriter &rewriter) {
  // All new ops will use the location of the original op, yet one has to repeat
  // the location every time.
  Location loc = op->getLoc();

  // Constructing an op with a simple array attribute containing an integer
  // requires typing "rewriter" four times.
  rewriter.create<OpName>(loc, /*arg*/,
      rewriter.getArrayAttr(
          rewriter.getIntegerAttr(rewriter.getIntegerType(32), 42)));
  // ...
}

While location information in MLIR is intentionally visible, there are arguably better ways of ensuring it remains consistent than having to pass it to every single Op constructor; in the end, the API does not prevent one from passing instances of UnknownLoc everywhere. EDSC removes the need to repeat locations, but hides the location information instead. An approach with explicit scope-local location pinning based on RAII may be preferable.

Higher-Level APIs

Complex IR abstractions, such as operations with regions or structured attributes and types, may require additional “helper” abstractions to manipulate them in C++ code. Such abstractions are already present in the code base, proposed independently for similar readability-improvement reasons: for example, MemRefDescriptor, which abstracts away the operations emitted to manipulate the lowered-to-LLVM equivalent of a memref behind “semantic” names, or GPUAllReduceRewriter, which avoids repeated mentions of the rewriter and location in rewrite patterns. EDSC can provide a common way of defining such helpers that would avoid the proliferation of similar-but-different hand-rolled abstractions.
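
For illustration, a sketch of how such a helper reads inside a lowering pattern (the MemRefDescriptor accessor names below are approximations):

// Sketch: querying the LLVM-dialect equivalent of a memref through "semantic"
// accessors instead of hand-written sequences of extract/insert operations.
MemRefDescriptor descriptor(operands[0]);
Value alignedPtr = descriptor.alignedPtr(rewriter, loc);
Value offset = descriptor.offset(rewriter, loc);
Value sizeInDim0 = descriptor.size(rewriter, loc, /*pos=*/0);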

Higher-level APIs are also aligned with MLIR’s preference towards declarative-style programming, which is manifest, e.g., in TableGen-based declarative rewrite rules. EDSC enables one to write similarly declarative code for rewrite patterns in C++, including for region-carrying ops that are poorly supported in DRR.

More Modern C++ (partly outdated due to core infra improvements)

When originally introduced, EDSC also insisted on using a “more modern” subset of C++, introducing value-type abstractions (ValueHandle instead of Value * and Type *) and extensively using lambdas and template functions with automatic type deduction. This allowed it to simplify the APIs, e.g., those of functions returning multiple Values through arguments, by passing a ValueHandle & instead of the uncommon and arguably hard-to-understand Value *&, or those of construction functions, by avoiding explicit template arguments.

Core MLIR adopted a similar style since the initial version of EDSC, in particular making Value, Type and Attribute value-types.

Elements of the Solution: Disentangle EDSC Concepts

EDSC was an ambitious, forward-looking proposal touching on multiple aspects of MLIR, from core APIs to intended uses as a DSL embedded into C++. At this point, we should treat EDSC as a proper name and not let its initial acronymic definition constrain how it evolves. Separating this proposal into several pieces, each of which can be discussed, improved and maintained separately, will go a long way towards a better system. In particular, we can consider separately:

  • APIs for constructing region-carrying operations;
  • Aspects of eager/delayed construction and typing, i.e. ValueHandle representing a Value or a Type;
  • API improvements to reduce repeated builder and location specification;
  • Helper classes to manipulate structured types, e.g. StructuredIndexed, MemRefDescriptor;
  • Facilities for constructing larger pieces of IR, e.g. LoopNestBuilder;
  • Considerations of DSL embedding, e.g. operator overloading to construct IR or leaner function names.

In the following sections, we make several detachable proposals that build on the lessons learned from the EDSC experience and attempt to converge the IR-building APIs.

  • Use function references with existing Builder APIs to simplify the construction of region-carrying operations.
  • With the change above, ValueHandle can be removed.
  • Store OpBuilder and Location objects locally to reduce boilerplate where appropriate.
  • Helper classes for structured types should be rebased on the new unified builder APIs, but their APIs can remain mostly the same.
  • Given the improvements above, helpers for large IR pieces can be simplified significantly.
  • DSL embedding should not be a primary concern for core APIs, which should remain usable in a more imperative style. However, since localized uses of DSL-style APIs tend to appear in the code base due to their simplicity, providing a homogeneous ODS-driven way of defining such APIs should be prioritized.

All of the above has been prototyped: lambdas for regions, builder mixins, nested mixins and lambdas, DSL-style APIs.

Constructing Region-Carrying Operations

Even today, it is possible, albeit cumbersome, to leverage the Builder APIs with lambda functions that construct the body of a region.

  OpBuilder builder(/*...*/);
  builder.create<loop::ForOp>(loc, lb, ub, step, ValueRange(),
    // Lambda populating the body of the loop. Takes the induction variable
    // and the iteration arguments as function arguments. The first argument
    // is a builder with the insertion point set up to the loop body.
    [loc,ub,step](OpBuilder *nested, Value i, ValueRange) {
      // Such constructs are easily nestable.
      nested->create<loop::ForOp>(loc, i, ub, step, ValueRange(),
        [loc,i](OpBuilder *moreNested, Value j, ValueRange) {
          moreNested->create<AddIOp>(loc, i, j);
          moreNested->create<loop::YieldOp>(loc);
        });
      nested->create<loop::YieldOp>(loc);
    });

This approach reflects the nested nature of regions in the constructing code, but (1) requires OpBuilder manipulation in loop::ForOp::build, with potentially unsafe casting, and (2) creates potentially confusing relations between the multiple builders involved. As an alternative to creating new builders, one can set up the insertion point of the “main” builder before and after the lambda and require the caller to capture it, but this may also be confusing due to “action at a distance”.

Note that similar lambda-based helper functions are already used in the code base.

Helpers that build composite IR fragments not reflected as individual IR operations, for example loop nests, can be implemented easily on top of such APIs, as sketched below.
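
A sketch of such a helper, built on the lambda-taking loop::ForOp::build from the example above (the helper itself and its exact signature are assumptions of this illustration):

// Creates one loop::ForOp per (lb, ub, step) triple and invokes `body` in the
// innermost loop with all induction variables collected so far.
static void buildLoopNest(OpBuilder &builder, Location loc, ValueRange lbs,
                          ValueRange ubs, ValueRange steps,
                          llvm::function_ref<void(OpBuilder &, ArrayRef<Value>)> body,
                          SmallVectorImpl<Value> &ivs) {
  if (lbs.empty()) {
    // No more loops to create: emit the innermost body at the current point.
    body(builder, ivs);
    return;
  }
  builder.create<loop::ForOp>(
      loc, lbs.front(), ubs.front(), steps.front(), ValueRange(),
      [&](OpBuilder *nested, Value iv, ValueRange) {
        ivs.push_back(iv);
        buildLoopNest(*nested, loc, lbs.drop_front(), ubs.drop_front(),
                      steps.drop_front(), body, ivs);
        nested->create<loop::YieldOp>(loc);
      });
}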

Getting Rid of ValueHandle

The style of the APIs above ensures, at a cost of some additional code in individual Op::build functions, that Values of block arguments are available before the block itself is constructed. This essentially removes the only remaining reason for edsc::ValueHandle to exist—provide a placeholder for the type of the value before the value is created. Adopting this style of the APIs will likely allow existing EDSC functionality to replace ValueHandle with Value and thus remove an additional abstraction layer and related complexity.

Builder Mixins

Instead of using global state to cache the Builder and Location that are reused in a sequence of IR-constructing calls, one could store them in the class performing the construction, using a mixin-like mechanism. This avoids free functions that make hidden assumptions about global state. While class methods making similar assumptions about class state may look equivalent, it is arguably more common, and less surprising to the programmer, for class methods to use class fields to communicate information between them. Furthermore, given that a large amount of IR manipulation happens in pattern classes, these classes can have the corresponding mixins by default.

One can construct, e.g., the following API that relies on RAII to locally “pin” the builder and the location information for all operations.

struct IRBuildingClass : public PinnableOpBuilderMixin {
  IRBuildingClass(MLIRContext *context) : PinnableOpBuilderMixin(context) {}
  void build(OpBuilder &builder, Location loc, Location otherLoc);
};

void IRBuildingClass::build(OpBuilder &builder, Location loc, Location otherLoc) {
  // While this object is live and not overridden by another similar object,
  // calls to create<> will use the given builder and the given location.
  // The values are stored in private variables of (Pinnable)OpBuilderMixin and
  // cannot be overwritten directly.
  auto raii = pinLocationWithBuilder(builder, loc);

  Value zero = create<ConstantIndexOp>(0);
  Value fortytwo = create<ConstantIndexOp>(42);
  // Declared outside the nested scope below so it remains usable afterwards.
  Value one;
  {
    // It is possible to use multiple locations with local RAII objects.
    auto raii = pinLocation(otherLoc);
    one = create<ConstantIndexOp>(1);
  }

  create<loop::ForOp>(zero, fortytwo, one, ValueRange(),
                      [this](OpBuilder *nested, Value iter, ValueRange) {
                        auto raii = withBuilder(*nested);
                        create<AddIOp>(iter, iter);
                        create<loop::YieldOp>();
                      });
  // It remains possible to use the conventional Builder APIs.
  builder.create<ReturnOp>(loc);
}

Small modifications to the core builder and operation APIs allow us to elide RAII objects inside lambdas, making sure the builder insertion point is updated immediately before and after calling the function. This enables the following usage.

void OtherIRBuildingClass::build(OpBuilder &builder, Location loc) {
  auto raii = pinLocationWithBuilder(builder, loc);

  Value zero = create<ConstantIndexOp>(0);
  Value fortytwo = create<ConstantIndexOp>(42);
  Value one = create<ConstantIndexOp>(1);
  create<loop::ForOp>(zero, fortytwo, one, ValueRange(),
                      region([this](ValueRange args) {
                        create<AddIOp>(args[0], args[0]);
                        create<loop::YieldOp>();
                      }));
  create<ReturnOp>();
}

DSL-style Helper Classes

Instead of providing EDSC Intrinsics as free IR-building functions that access global state, one can implement an equivalent functionality with dialect-scoped helper classes that may optionally store the builder locally. Note that this is orthogonal to the Builder Mixins idea, but can be combined with it.

void build(OpBuilder &builder, Location loc) {
  StdDialectBuilder std(builder, loc);
  LoopDialectBuilder loop(builder, loc);

  Value zero = std.ConstantIndex(0);
  Value one = std.ConstantIndex(1);
  Value fourtytwo = std.ConstantIndex(42);
  loop::ForOp f = loop.For(zero, fourtytwo, one, ValueRange());

  {
    OpBuilder::InsertionGuard raii(builder);
    builder.setInsertionPointToStart(f.getBody());
    std.AddI(f.getInductionVar(), f.getInductionVar());
    loop.Yield();
  }
}

While the function names do not contain “create”, their intent can be inferred from the containing class, which is clearly named as a builder. Furthermore, a naming convention may require the *DialectBuilder variables to be named create_*, which would lead to the more visible (yet less DSL-like) create_std.ConstantIndex.

Such helper classes can be generated automatically from ODS. As an additional bonus, this makes IR-building APIs more transparent to existing C++ tooling, which often struggles to look through the variadic templates of OpBuilder::create and show the expected argument lists.
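
For illustration, a hand-written sketch of what such a generated class could look like, mirroring the hypothetical StdDialectBuilder used above:

// Sketch of a (hypothetically ODS-generated) dialect builder: each method maps
// onto one op of the dialect and forwards to OpBuilder::create, reusing the
// stored builder and location.
class StdDialectBuilder {
public:
  StdDialectBuilder(OpBuilder &builder, Location loc)
      : builder(builder), loc(loc) {}

  Value ConstantIndex(int64_t value) {
    return builder.create<ConstantIndexOp>(loc, value);
  }
  Value AddI(Value lhs, Value rhs) {
    return builder.create<AddIOp>(loc, lhs, rhs);
  }

private:
  OpBuilder &builder;
  Location loc;
};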

Combined with Builder Mixins, one can reach the same DSL style as current EDSC, or any intermediate form between the current Builder APIs and current EDSC, all without resorting to global state.

    auto zero = std_constant_index(0);
    auto fourtytwo = std_constant_index(42);
    auto one = std_constant_index(1);
    loop_for(zero, fourtytwo, one, {}, region([this](ValueRange args) {
               std_addi(args[0], args[0]);
               loop_yield();
             }));
    std_return();

Appendix: The Case For Modern MetaProgramming in MLIR

Modern metaprogramming support has not been a historical priority in MLIR.

The design of MLIR itself touches on multiple things: DSL philosophy, IR design, and LLVM-era “metaprogramming tools”. For instance:

  1. TableGen can be viewed as a powerful DSL based on type-unsafe string stitching.
  2. C++ template metaprogramming is typed but has general usability, extensibility and boilerplate issues.
  3. LLVM Builders are a proven but somewhat dated API that incites users to introduce helper functions, which end up overlapping in an unprincipled fashion.
  4. C macros are not a real contender.

Another way to metaprogram is to add a few key wrapping abstractions. One could define metaprogramming as programming at a distance (we famously know that “all problems in computer science can be solved by another level of indirection”). EDSC is a particular implementation of “programming at a distance”: it introduces a layer of indirection between the user and the raw Builder APIs. This layer of indirection can be used both to build IR in a more composable way and to pattern-match it.

Another approach is to directly start with a language that has builtin support for pattern matching and/or partial evaluation (e.g. OCaml, Haskell, possibly Swift).

The literature in the metaprogramming space is rich; in particular, Tiark Rompf's Delite and Zach DeVito's Terra are good sources of inspiration.

Wow, this API is really cool. Are there any downsides?

Thank you both for writing this up. I agree that EDSC should be reconsidered and detangled - it was a bit ahead of its time in that the basics of MLIR core were not really baked when it came on the scene.

I’d love to see a process of slimming down EDSC (e.g. eliminating ValueHandle) progressively in place, and building up the core API. I don’t think we have to make a determination of the ultimate end state today, but I think you’ve got a great vector we should nudge towards.

One comment about this post is that it is very long and includes a bunch of related ideas. It might be worthwhile to split the context out into a background section in the existing EDSC doc. I’d also love to see more caveats added to the top of that doc stating that EDSCs are a work in progress and subject to significant design changes.

With that done, we can then consider each of your proposals individually - I think you’ve got a lot of great ideas here, but am afraid the discussion will get confused by lots of them being discussed at the same time in one big mega thread.

What do you think?

-Chris

It seems to me like the fundamental goal behind EDSC is to make the code for creating a syntax tree look like the serialized syntax tree itself. I wonder if you’ve thought about taking this to the extreme and making the syntaxes exactly the same (without the chicanery of embedding it in C++ constructs at all). Maybe something like:

auto fragment = attach(v1, v2, v3, v4).quote("
  %0 = muli %v1, %v2
  %1 = muli %v3, %v4
  %2 = addi %0, %1
");
some_use(fragment.get("%2"));

We considered such an extreme, but the unclear part is how to get this parsed at build time. Short of writing a clang plugin, it isn’t clear to me how we can get there.

At a high level, I see two:

  • It may not play well with the extra virtual functions defined in PatternRewriter compared to OpBuilder. I haven’t tried this in more than a couple of cases.
  • Location pinning may not necessarily be desirable. We can actually have two versions for all builder functions: with explicit location and without it.

On a slightly lower level:

  • We need to define the *DialectBuilder classes; this looks possible from ODS definitions, but with even more string shuffling and stitching than we have today, and it will need more boilerplate for ops defined directly in C++. Nothing unsolvable though.
  • Having different OpBuilder instances associated with different *DialectBuilders, if we use more than one, may lead to surprising behavior. The same goes for the state connection between the OpBuilder instance and the *DialectBuilder instances: for example, if one creates a block using the builder, it automatically updates the insertion point for everything. I don’t see a technical solution here, only enforcing certain aspects at review.

I was thinking of updating that doc as we go, to reflect the relevant changes. Some pieces of the historical context (e.g., we wanted a value-typed Value) are already irrelevant. I can put the “arguments for EDSC” there. My ultimate goal though is to not have those arguments at all because we only have one way of building IR :slight_smile:

Yeah, I wanted to give folks the bigger picture, some time to think about this, and a chance to raise any concerns if they don’t agree with the overall direction. I am also looking for two kinds of feedback: have we overlooked something, and is there something folks want done sooner (many ideas are independent but may trigger a long discussion)? I actually don’t expect a technical discussion in this thread, it’s too long and intimidating for that.

It was one of the goals. I would claim that the overall goal was to make it fast to construct MLIR by hand from C++ and make that construction relatively readable. Something like the vector-transfer lowering does not even remotely look like the IR it produces. That’s why I think EDSC needs quite a bit of detangling.

Have an mlir-translate mode that emits C++ code building the IR it consumed? In any case, we would also need to be able to parse partially-defined IR, e.g. with missing references.

This can be also done at runtime, but that sounds expensive and error-prone.

+1, and I should probably delay responding to avoid starting convos that should be on the split threads… but I’ll just try to be brief instead :slightly_smiling_face:

I think the point was that the MLIR snippet starts inside a C++ program and so we’d need to extract it before the C++ program compilation.

A quasiquoting-esque form :slight_smile: (I would say that C++ doesn’t have good support for it, and then someone will come up with a constexpr parser as a counterexample showing it is feasible…). So invoking the parser on it to construct the IR, and using bindings to specify inputs and bind outputs? (E.g., attach’s first arg becomes arg0 in the IR snippet, or perhaps all snippets have to start with a block, and an explicit return populates a vector.) So making it a runtime construct?

If we called builder b then b.create<AddIOp>(iter, iter) would work today, but to get to std.addiop(iter, iter) the insertion point would need to be in scope. Would the idea be that we have a vector of insertion points (to enter/exit scopes) and the top insertion point is used? So 10 characters saved but we need to add a std builder? [And I could have a macro B(AddIOp)(iter, iter) which might be terrible but also concise and doesn’t require much :yum:]

I think the helpers for loops/regions are clear wins and not too invasive (the lambda could always take a builder as first input and then the loop construct provides the correct one for the region).

Following up on this: thanks a lot for taking the time to write this summary of the situation with respect to EDSCs, and I’m super impressed by the work you put into all the experiments with the alternatives. I am not sure I totally followed how the appendix fits into this work at the moment, but otherwise I’m very supportive of the general direction here!

I’d like to echo this. Seeing some of the design alternatives explored in some depth helped me to better understand the merits and tradeoffs in the current design. Thanks!