[RFC] DSL-style / dialect-specific builders

I’m not referring to internals or anything specific to the dialect, but plainly to IR building - something that is common ground for all MLIR developers and that they shouldn’t have to dig deeper to understand. The code snippet I showed from Linalg Transforms (reproduced again below) is creating load/store ops, as you know - this isn’t anything dialect specific, and yet it’s unreadable to me. (The IR creation code isn’t clearly spelled out due to the metaprogramming, and it gets lost/mixed among regular code.) IR-constructing methods/code in other dialects upstream aren’t like this.

IndexedValueType output(op.output());
IndexedValueType input(op.input());
Value lhs = output(indices.outputs);
Value rhs = input(indices.inputs);
...
output(indices.outputs) = maxValue;
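
For contrast, here is a sketch of roughly the same IR spelled out with a plain OpBuilder, assuming the usual EDSC semantics where the call operator lowers to a load and the assignment to a store (builder, loc and the exact op classes are illustrative, not part of the original snippet):

Value lhs = builder.create<LoadOp>(loc, op.output(), indices.outputs);
Value rhs = builder.create<LoadOp>(loc, op.input(), indices.inputs);
...
builder.create<StoreOp>(loc, maxValue, op.output(), indices.outputs);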

To provide some more concrete examples, here is how the first pattern from Sean’s patch above would look with different APIs:

1 Like

Thanks Alex for taking the time to put these examples together!

Speaking personally, the “current proposal w/o create” reads best to me. I like that it is succinct and reads very similarly to the corresponding textual .mlir (there’s something about the . vs edsc’s _ after the dialect name which makes it much easier to read). The create is just line noise once we get to this level of fluency.

For the “even more DSL” version, it is too difficult to see what concrete ops are being created.

I would really like to have a “current proposal w/o create”-like API available for these sorts of “expansion” patterns.

1 Like

These are quite useful. Did you paste the wrong link for the first one? There is no create there - the builder is just named create. I assume you meant to have it use createCmpI etc.

If these are the current proposals, I’m not sure why the proposal says standard builders are going to stay. With either of the first two options listed by @ftynse, one can just replace the standard builders. I assume passing an explicit location will also be supported if needed. I felt that was the confusing part of the proposal - the first post says neither that standard builders would be replaced nor that they would be deprecated and replaced over time.

I really don’t understand what you mean here. MLIR already has OpBuilder - we are not talking about removing the presence of builder APIs! I think you’re arguing that the existing builders are not good enough - I’m arguing that we should either improve them or delete them. There should be one builder abstraction that is “good”.

FWIW, I don’t agree that the create-less APIs are “good”. Beyond the problems with naming conventions, I am very concerned that their reliance on tblgen code generation is also suboptimal – it means that anything built on top of the builders needs to be similarly code generated (it can’t just use C++ templates). This is one of the reasons that I can’t see this ever replacing the existing builders. Given that, this direction cannot lead to unification behind a single “good” API.

Let me give you one example of what I mean. With the existing builder APIs, a downstream project could say “hey, I want a builder that does extra stuff when I build operations” (e.g. maybe make the location implicit). Right now, you can do this easily:

struct ImplicitLocOpBuilder {
  Location loc;
  OpBuilder &underlying;

  template <typename OpTy, typename... Args>
  OpTy create(Args &&...args) {
    // Forward everything to the wrapped builder, injecting the stored location.
    return underlying.create<OpTy>(loc, std::forward<Args>(args)...);
  }
};

You can’t do this sort of thing when you rely on tblgen code generation for the builder APIs - you’d have to tblgen codegen ImplicitLocOpBuilder, which would require adoption by every dialect that wants to participate.
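
To make the contrast concrete, here is a hypothetical sketch (names illustrative) of what the wrapper would have to look like on top of per-op generated entry points - one forwarder per op, per dialect, instead of the single template above:

struct ImplicitLocDialectBuilder {
  Location loc;
  OpBuilder &underlying;
  // Each op needs its own hand-written or re-generated forwarder...
  Value createCmpI(CmpIPredicate predicate, Value lhs, Value rhs);
  Value createAddI(Value lhs, Value rhs);
  // ...and every dialect has to opt in to generate these.
};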

This breaks scalability of the ecosystem, which is a critical issue to me. I can see how this approach can be used for a couple of dialects, but unless it can be “the” thing, I have a hard time getting excited about forking and fragmenting the ecosystem.

Furthermore, EDSCs have not been widely adopted outside of ntv’s code, so I don’t see strong empirical evidence that there is a “pull” for this beyond a couple of specific (and very amazing) engineers who prefer it.

-Chris

Brainstorming here: what if builder.setInsertionPoint*() took both an insertion point and a Loc? Then each create<> call wouldn’t have to specify the loc explicitly. It might be more awkward for pieces of code that operate on large numbers of ops, but we could also keep the existing create<> calls that take a loc for those cases.
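
A purely hypothetical sketch of what that might look like - neither this setInsertionPoint overload nor a loc-less create<> exists today:

// Hypothetical: the builder remembers both where to insert and which loc to use.
builder.setInsertionPointToEnd(block, loc);
Value sum = builder.create<AddFOp>(lhs, rhs);  // would reuse the stored loc
Value res = builder.create<MulFOp>(sum, rhs);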

No, I did exactly what I intended. I do not propose the createCmpI version, Chris does and then explains why that version isn’t a significant improvement over the current state.

Because the current proposal would be using builders internally. I have not considered fully replacing them yet, as the proposal is large and contentious enough already. For example, it is not fully clear to me what will happen with the various type- and attribute-constructing functions currently in Builder (although having them there creates a disparity in the API between “privileged” types that have a Builder API and dialect-specific types that do not), with PatternRewriter and related functionality, and with the actual ::build functions that currently take an OpBuilder.

That being said, if not removing builders is the main blocker for going through with this proposal, let’s discuss that as well.

MLIR also has the possibility of populating an OperationState and calling Operation::create to construct operations. It was not removed despite having builders. I don’t see why the same should not be allowed for one more API abstraction level.
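
For reference, a rough sketch of that lower-level path (the op name and operands are illustrative):

OperationState state(loc, "std.addf");
state.addOperands({lhs, rhs});
state.addTypes({lhs.getType()});
Operation *op = Operation::create(state);
builder.insert(op);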

I agree this is a concern. However, we already generate a lot of APIs: interfaces, passes, accessors to Op operands/attributes. Realistically, how many such builder extensions can one come up with? Caching locations is one good example, and also one of the motivations for EDSC :slight_smile: Is there another case where one wants to modify the op creation behavior? If one only needs to be notified that an op is about to be created or has been created, this is already possible through the listener; there is no need to wrap the builder in such cases.
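
For instance, a minimal sketch of the listener route, assuming OpBuilder::Listener’s notifyOperationInserted hook:

struct CountingListener : public OpBuilder::Listener {
  unsigned numOpsCreated = 0;
  void notifyOperationInserted(Operation *op) override { ++numOpsCreated; }
};

CountingListener listener;
OpBuilder builder(ctx, &listener);  // every subsequent create<> notifies the listener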

I’ve seen folks attempting to use it in HLO and then getting asked to remove it during code review. So the visible lack of adoption is, at least partly, due to the pushback against adoption. Note that I don’t propose that everyone just adopts EDSC, and that I listed several examples above where similar API patterns were added locally.

Assuming we don’t get hit by overload resolution+templates, this would still be subject to the lack of autocomplete/documentation and lack of implicit conversions for a small decrease in verbosity.

Let me flip this. What would be the alternative that meets the following requirements:

  • minimal redundancy in subsequent calls (e.g., no repeated location, ideally no repeated meaningless prefix);
  • supports autocomplete/documentation;
  • fully replaces or extends builders?

I see what you’re trying to show, but these serve two different purposes. The OperationState is the underlying implementation details that power the infrastructure in a “type erased” way. The OpBuilder::create<> template is the typed layer that is intended to be used by users of the MLIR APIs.

The EDSL stuff is another “typed layer that is intended to be used by users of the MLIR APIs”. That is why it is redundant, and only one should stand. If the ultimate approach is to use the EDSL-inspired API, then it should be built on top of the OperationState API directly; there isn’t a clear reason it should go through another layer of indirection.

I don’t understand what you’re saying here. Surely you agree that there is a difference in significance between ops and interfaces/passes/accessors. Ops are the primal thing in MLIR and exposed by everything, building generic code over families of them is extremely common. We should be doing more here to make this easier, not less.

I think you’re arguing here for a monolithic builder design that ties all concerns into one API. I don’t think this is a good or scalable approach - we should go for simple things that can be layered/composed on top of each other.

Fair point, but I think we should resolve that by deciding whether it is the right thing to do and apply it uniformly. The signal you’re seeing above is just an indicator that people are not happy with EDSCs. If that is the case, then the right answer is to just replace all uses of them with OpBuilder in tree and remove EDSCs (or sink them into other MLIR derived projects that want them). I would also be very +1 on this, it combines very nicely with the orthogonal goal of “improving OpBuilder”.

I don’t understand this “flip” at all. You’re getting pushback on this thread about various design points you’re proposing to achieve those three goals, so it isn’t clear that we “have to” solve those problems.

I love that you’re pushing the envelope here to try to resolve the fissure between OpBuilder and EDSCs, but I am fairly serious that we should consider the alternate path: just remove EDSCs (replacing with direct use of OpBuilder), and then continue a discussion of improving OpBuilder. The fork in the universe is really a separable problem from the “how do we get the best builder API possible”.

I can’t see any strong argument that we should remove OpBuilder and keep EDSCs given the compile time and many other pushbacks that people have raised.

-Chris

Here’s another idea on how to make incremental progress here:

It has been observed that some use-cases would benefit from a builder that makes the location implicit at the create site: the location is a mutable part of the state in the builder itself. This is controversial because some of us want to make sure that location processing is thought through and would like to put some burden on compiler engineers to do the right thing with locations. OTOH, it is clearly useful for specific design patterns, including entire passes that work in a structured way for which this makes sense, as well as the primary EDSC use case of “splat out a bunch of IR” which presumably comes from one location.

I think we can have the best of both worlds here with the existing OpBuilder stuff. We keep OpBuilder the way it currently is - it supports creation of arbitrary IR nodes, requires the location to be explicit in the create call, and is the thing that is generally passed around between the core parts of the compiler.

To solve for the use-case above, we introduce the new ImplicitLocOpBuilder described above. This allows clients with the structured creation logic to opt-in to implicit location handling.
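
As a hypothetical usage sketch (using the wrapper outlined earlier in the thread; the ops are purely illustrative):

Value expandFromOneLoc(OpBuilder &builder, Location loc, Value lhs, Value rhs) {
  ImplicitLocOpBuilder b{loc, builder};
  // Every op in this expansion "splats out" from the same source location.
  Value sum = b.create<AddFOp>(lhs, rhs);
  return b.create<MulFOp>(sum, rhs);
}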

I think this covers both bases well. WDYT?

-Chris

Hi all,

I implemented the ImplicitLocOpBuilder concept in CIRCT and we’ve been using it for some time now. It works well, so I think it would be useful to upstream it to MLIR. Here is a patch that does so, I’d appreciate feedback and review, thanks!

-Chris

1 Like

+1 data point: while writing approximations for math functions, we had a style/design question: use the generic builder API or custom-built DSL/builders.

Writing math function approximations basically means expanding high-level ops like Log into a sequence of primitive add/mul and bitcast ops. The result is pretty dense and hard to follow, and in my opinion the default builder API adds too much boilerplate and makes it even harder.

Discussion is in: https://reviews.llvm.org/D97146

With custom builders, std::frexp can be implemented as:

  Value cst126f = f32Cst(126.0f);
  Value cstHalf = f32Cst(0.5f);
  Value cstInvMantMask = f32FromBits(~0x7f800000u);

  // Cast to i32 for bitwise operations.
  Value i32Half = bitcast(i32, cstHalf);
  Value i32InvMantMask = bitcast(i32, cstInvMantMask);
  Value i32Arg = bitcast(i32, arg);

  // Compute normalized fraction.
  Value tmp0 = bitwiseOr(bitwiseAnd(i32Arg, i32InvMantMask), i32Half);
  Value normalized_fraction = bitcast(f32, tmp0);

  // Compute exponent.
  Value tmp1 = castSiToFp(f32, logicalShiftRight(bitcast(i32, abs(arg)), 23));
  Value exponent = sub(tmp1, cst126f);

  return {normalized_fraction, exponent};

After multiple iterations, my best ImplicitLocOpBuilder version is:

  int width = vectorWidth(arg.getType());

  auto bcast = [&](Value value) -> Value {
    return broadcast(builder, value, width);
  };

  auto i32 = builder.getIntegerType(32);
  auto i32Vec = broadcast(i32, width);
  auto f32Vec = broadcast(builder.getF32Type(), width);

  Value cst126f = f32Cst(builder, 126.0f);
  Value cstHalf = f32Cst(builder, 0.5f);
  Value cstInvMantMask = f32FromBits(builder, ~0x7f800000u);

  // Cast to i32 for bitwise operations.
  Value i32Half = builder.create<LLVM::BitcastOp>(i32, cstHalf);
  Value i32InvMantMask = builder.create<LLVM::BitcastOp>(i32, cstInvMantMask);
  Value i32Arg = builder.create<LLVM::BitcastOp>(i32Vec, arg);

  // Compute normalized fraction.
  Value tmp0 = builder.create<LLVM::AndOp>(i32Arg, bcast(i32InvMantMask));
  Value tmp1 = builder.create<LLVM::OrOp>(tmp0, bcast(i32Half));
  Value normalizedFraction = builder.create<LLVM::BitcastOp>(f32Vec, tmp1);

  // Compute exponent.
  Value biasedExponentBits = builder.create<UnsignedShiftRightOp>(
      builder.create<LLVM::BitcastOp>(i32Vec, builder.create<AbsFOp>(arg)),
      bcast(i32Cst(builder, 23)));
  Value biasedExponent = builder.create<SIToFPOp>(f32Vec, biasedExponentBits);
  Value exponent = builder.create<SubFOp>(biasedExponent, bcast(cst126f));

  return {normalizedFraction, exponent};

Although it is not as bad as my original attempt to use builders (without the implicit loc), it is still a bit hard to get the gist of what’s happening here. And there are many similar functions that have even more bitcasting and bitshifting.

Yeah, these kinds of hand-coded “subroutine expansions” are where the DSL style makes a big difference.

What I don’t know is how many of these there are, and whether it is a persistent minority case or whether we just sweat through it.

Well, two parts here: 1) you can use b instead of builder, as the full name doesn’t need to be spelled out (inside a function, that is a convention we often use, since the narrow scope removes visual “noise” without impacting readability); 2) you still need to ensure determinism, which means you have to avoid argument evaluation order issues.

Point 2 in particular means that writing these will look more verbose and uglier in C++. And that may have (for me, has) a much larger impact than the extra characters - tmp0 and tmp1 both being needed, being careful about when you can and can’t call multiple creates/helpers in one line. The autocompletion angle is one where Alex & Rahul (I think) made good points (but that could perhaps also be addressed uniformly instead). But I worry whether the uncanny valley effect wouldn’t make the DSL style more fragile (it is so close to math that one forgets one is building IR and then uses the wrong constructs, e.g. a C++ for vs an SCF for); I could be wrong and it could be equally easy either way.

I think the line between a helper function and a dialect-specific builder is a bit thin in some of our discussions here (and helper functions are a thing). And having a consistent style is desirable - though I don’t know whether that pushes for helpers to have the same form as builders :slightly_smiling_face:

I agree with Jacques’ two points. I’d add for #1 that you can also use using namespace LLVM to reduce another set of boilerplate. You can put using directives inside functions and other scopes, so they don’t have to pollute your whole file.
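
For example (hypothetical helper, assuming the ImplicitLocOpBuilder from the patch above):

static Value bitcastToI32Vec(ImplicitLocOpBuilder &b, Type i32Vec, Value arg) {
  using namespace mlir::LLVM;  // scoped: only this function drops the LLVM:: prefix
  return b.create<BitcastOp>(i32Vec, arg);
}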

The C++ order of evaluation issue is pernicious and subtle; we recently got bit by it on the CIRCT side of things. It is convenient that GCC and Clang differ here, so we at least get early warning about it, but anything that leads to a lot of nested expressions will have to be aware of this. To give you an explicit example with Eugene’s helper syntax, even this is not safe:

Value tmp = bitwiseOr(bitcast(i32, x), abs(arg));

because the abs and bitcast can be emitted in any order.
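
A mundane but effective workaround is one C++ statement per emitted op, which pins the emission order (helper names as in Eugene’s snippet above):

Value castX = bitcast(i32, x);  // emitted first
Value absArg = abs(arg);        // emitted second
Value tmp = bitwiseOr(castX, absArg);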

-Chris

Yeah, I was very supportive of the DSL approach (which became EDSC) until I realized how hostile C++ is as a host language for a DSL: the order of evaluation issue can lead to tricky behavior! And since this is all platform specific, it only hits us in CI (or worse: later, when someone tries the code on a new platform).

(Number of days since we last rediscovered that C++ is a challenging language to build a DSL in: 0)

1 Like

Yes, but it looks like you are missing a conclusion here! How do you avoid the presence of these counter-patterns (order of evaluation issues from such builders) in code to the extent possible? @ftynse already mentions order of evaluation in his OP, but the key is that I haven’t seen a single instance of someone writing b.create(....., b.create(...), ..., b.create(...)) - you get neither line compaction nor readability. To the contrary, such forward substitution would be the norm with the DSL-style builders being proposed, e.g.:

output(indices.outputs) = x + y + max(input(indices.inputs), output(indices.outputs));

Are you sure what IR it generates?! This was what I was mentioning in post #30.

IMO, there are two styles of DSL we can build: numpy/math-like, with nested calls and overloaded operators, or WYSIWYG, which resembles the generated IR. In the latter case, you actually want one C++ statement per IR instruction, but it doesn’t look easy to enforce that in C++ without additional boilerplate.

I suppose we can somehow teach clangd to “see through” std::forward-like functions with variadic templates and suggest completions based on the signature of the final function. The same could be interesting for std::enable_if and other SFINAE tricks. This sounds like a good summer intern project, but I am not familiar enough with clangd to support it. We may also run into the need to make clangd aware of library semantics.

There’s also the issue of ordering between template argument deduction vs. implicit conversion.

That was one of my arguments: people will add helpers. So let’s have consistent helpers instead of having to discover a new style in every file :slight_smile: This can also be solved with a style guide or a “best practices” document.

One could argue that this is mostly a problem of using ordered textual checks in tests (and I have heard a similar complaint from Clang and Swift folks some time ago) when the actual order does not matter. The downstream passes should work equally well regardless of the order.

I have, and complained about this in review :slight_smile: It was for subview-style ops that often use multiple constants as arguments, if I am not mistaken, from before these ops could use attributes instead. There was also something like builder.create<SubViewOp>(loc, getZeros(builder, rank), sizes, getOnes(builder, rank)) where the nested calls are hidden in the helpers. I’m also quite certain we have b.create(...., b.create(...), ...), i.e. the “safe” single-nested case, in the code base for something like dim.

It feels like there is a surprisingly common subjective complexity metric that prevents folks from writing such code in most cases.

No, I don’t think this is an issue of tests and CHECK-NEXT vs CHECK-DAG. The issue is that it is really important for compilers to be deterministic and reproducible. If IRGen for some compiler produces different output depending on whether it was built with GCC or Clang, then that will eventually (after many passes) manifest as different miscompilations exposed, different performance of the generated code, etc.

-Chris

2 Likes