[RFC] DSL-style / dialect-specific builders

DSL-style builders

Context: Evolving Builder APIs based on lessons learned from EDSC


This document proposes an ODS-backed approach to defining IR-building APIs, complementary to OpBuilder. It intends to replace EDSC’s intrinsics mechanism with auto-generated APIs that rely on the main Builder infrastructure in a transparent way.


Dialect-specific builder classes

The main concept of this proposal is a dialect-specific builder class. This class caches OpBuilder and Location and defines methods with names derived from Op names that forward their arguments to the builder. Here’s an example:

class StdBuilder {
public:
  StdBuilder(OpBuilder &b, Location loc) : b(b), loc(loc) {}

  AddFOp addFOp(Type result, Value lhs, Value rhs) {
    /* This may need std::forward to make sure rvalue references are handled
       correctly. */
    return b.create<AddFOp>(loc, result, lhs, rhs);
  }

  ReturnOp returnOp(ValueRange args, ArrayRef<NamedAttribute> attrs = {}) {
    return b.create<ReturnOp>(loc, args, attrs);
  }

private:
  OpBuilder &b;
  Location loc;
};

Such classes can be created as local named variables and used to construct IR in a “DSL-like” manner, making the code that constructs IR visually resemble the IR it constructs.

  StdBuilder std(builder, loc);
  Value v1 = std.addFOp(arg1.getType(), arg1, arg2);
  Value v2 = std.addFOp(v1.getType(), v1, arg2);
  std.returnOp({v1, v2});

The methods in these classes are trivially inlinable, so they are not expected to add runtime overhead compared to using OpBuilder directly.

This brings a major usability improvement over the current building API: IDEs can understand such functions, meaning you get autocompletion, type prompts, inline type-mismatch checking and so on. Today, none of that gets through template <typename... Args> create(Args &&...), which appears to blindly accept any arguments.

Examples of the prototype with YouCompleteMe + clangd in vim:

Automatic generation

Such classes can be generated automatically from ODS, using the builders field of the Op class. This can happen at compilation time, together with actual Op definitions.

For historical reasons, the builder function signature is a plain string in the ODS syntax, so it requires some additional manipulation to chop off the builder and state arguments and introduce names for the formal arguments that are missing them. This has been prototyped using a relatively naive approach, which has worked just fine on the upstream dialects.

This could be a good time to transition the builders field to the DAG-based format that we already use in OpInterfaces and other more modern parts of ODS, i.e. (ins "Type":$name). This is a one-time transition that will make it easy to programmatically process argument lists for builders. It also opens the door to other useful features such as documenting individual builder arguments.
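As a rough sketch of what this transition might look like (the exact dag operator and quoting conventions are assumptions here, not a settled design):

```tablegen
// Today: the builder signature is an opaque string that must be re-parsed
// to recover individual argument types and names.
let builders = [
  OpBuilder<"OpBuilder &builder, OperationState &state, Type result, "
            "Value lhs, Value rhs">
];

// Proposed: a dag form similar to what OpInterface methods already use,
// making each argument's type and name individually accessible to tablegen.
let builders = [
  OpBuilder<(ins "Type":$result, "Value":$lhs, "Value":$rhs)>
];
```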

If we discover out-of-tree dialects that cannot be processed by the naive parsing approach, we can piggy-back on clang and get a full-fledged C++ parser to perform the transition.

For dialects with Ops that opted out of ODS, it is simple to derive from the autogenerated class and add more methods that follow the general pattern.

Multi-dialect builder classes

In many cases, operations from multiple dialects are constructed together. Having one builder object per dialect would make the “preparation” code more verbose and create a situation where ops from different dialects may be inserted at different locations using different builders.

In multi-dialect context, a utility “container” class can be introduced as follows.

/* Dialect-specific autogenerated wrapper. */
struct StdOps {
  StdOps(OpBuilder &b, Location loc) : std(b, loc) {}
  StdBuilder std;
};

/* Container class. */
template <typename... Dialects>
struct MultiDialectBuilder : public Dialects... {
  MultiDialectBuilder(OpBuilder &b, Location loc) : Dialects(b, loc)... {}
};

This scheme can be used to combine multiple dialect-specific builders under one object while preserving the per-dialect difference to avoid name clashes between ops and keep the dialect name visible in the API.

  MultiDialectBuilder<StdOps, ScfOps> create(...);

Given the concerns about the EDSC naming scheme (see “Misleading Names” in Evolving Builder APIs based on lessons learned from EDSC), this approach may be encouraged even for single-dialect cases.

Interaction with the Builder infrastructure

Since these classes are mostly thin wrappers around OpBuilder, they are expected to interoperate with the Builder infrastructure seamlessly. Only PatternRewriter::replaceOpWithNewOp does not have a direct counterpart, but can be replaced by a dialect-builder call followed by PatternRewriter::replaceOp.

A potentially dangerous behavior can appear in the case of nested-region builders, for example, in

create.scf.forOp(..., [&](OpBuilder &nested, Location loc, ...) {
  // `create` must be updated to point to the new builder before being used here
  DialectBuilder::Scoped raii(create, nested, loc);
  // ...
});

one may forget to update the insertion point of the multi-dialect builder. In practice, in-tree nested-region builders forward the OpBuilder on which they were called as nested after updating the builder’s insertion point, but this behavior is not guaranteed. Note that this issue is not specific to this proposal and can also be triggered with regular OpBuilders by using the wrong (outer) builder instead of the nested one.

There are no current plans to replace the OpBuilder infrastructure with this approach; OpBuilder remains necessary as the insertion point manager and as the notifier for mutation listeners.

Interaction with EDSC

This mechanism is expected to replace EDSC intrinsics by providing similar functionality that is more clearly based on the main builder infrastructure. It also uses tooling to replace most boilerplate “intrinsic” lists such as https://github.com/llvm/llvm-project/blob/master/mlir/include/mlir/Dialect/Vector/EDSC/Intrinsics.h#L17.

EDSC remains based on the ScopedContext container that stores a stack of builders + insertion locations in thread_local storage. In order to make use of the proposed API, one has to either extract the builder+location from ScopedContext when constructing the dialect builder or pass around the pre-constructed dialect builder by reference. In either case, the interaction with the “implicit context” becomes visible and can be progressively removed by always passing the dialect builder. While enabled by this proposal, the removal of ScopedContext is beyond its scope.

Improvements over EDSC

This proposal addresses several EDSC drawbacks identified in the previous discussion.

Cost of Extra Abstraction

This removes edsc::OperationBuilder and edsc::ValueBuilder. While it does also introduce a new concept of a (multi-)dialect builder, it is arguably simpler as it clearly just forwards the construction to an OpBuilder, without involving ScopedContext. Furthermore, it removes the notion of “intrinsic”, which is badly overloaded in a compiler context.

As a nice bonus, it removes the instances where the EDSC intrinsics API required extra parentheses due to the most vexing parse in C++.

Misleading Names

The construction API no longer looks like an opaque function call from which it is unclear whether any IR is being constructed. In the minimal case, the construction is a method call on an object with a clearly identifiable “xxBuilder” type that implies IR is being built. With multi-dialect builders, one can also name the builder instance so that the act of construction is obvious from each call, e.g. create or build. At the same time, the names of individual functions are derived (in ODS) from the C++ class names of the operations. Furthermore, when delegating parts of the IR construction to other functions, this approach encourages (but does not require, due to ScopedContext) passing the builder as an argument, which allows one to infer from a function’s signature whether it constructs IR.

Scalability and Boilerplate

This replaces currently required boilerplate code with tooling that generates it from ODS.

Known objections

But locations are important!

Adopting this approach will essentially cache the Location instance, making it less visible in the code and removing the incentive for users to keep them up-to-date.

This approach is no worse than literally caching the location in a variable:

Location loc = builder.getUnknownLoc();
/* two screens of code here */
builder.create<Op1>(loc, ...);
builder.create<Op2>(loc, ...);

which happens a lot in the codebase. A common place for this pattern is rewrite patterns, where one just forwards the location information in a rather verbose way. We can easily generate factory function overloads that also take Location as the leading argument, and provide means to set builder locations in a scoped way via RAII.

This is merely saving a couple of keystrokes!

One can just always call the OpBuilder instance b and achieve the same result.

Assuming a single-letter variable for location (e.g., l) + multi-dialect builder + absence of using namespace other than mlir, this saves 12 characters per builder call:

b.create<scf::ForOp>(l, ...);

This might feel like nothing, but it’s 15% of the total line length. In practice, this number may get larger depending on location naming conventions. When combined with line reflow, this can make code substantially more compact.

Usability gains are also not limited to fewer keystrokes. One no longer needs to visually filter through the “API crust” of builder.create< >(loc in long construction chains.

EDSC produces ops in unspecified order, this is the same!

This API, EDSC and OpBuilder can all be used in a way that produces operations in an unspecified order, since C++ does not specify the evaluation order of function arguments. The Builder API is just sufficiently verbose that one is almost never tempted to use it that way:

OpBuilder b(...);
b.create<AddFOp>(loc, type, b.create<MulFOp>(loc, type, v1, v2),
                 b.create<MulFOp>(loc, type, v3, v4));

will produce the two mulf operations in an unspecified order.

Builders just work for me!

Builders are not going away. However, people will (and already started to) build abstractions around them, including in the core repository. By providing usable abstractions out-of-the-box, we make sure there is a carefully designed, maintained abstraction layer that is common across the infrastructure.


I was considering something similar on the python side but also tying it together with dialect registration and just making it a default part of the context in some way.

I was also thinking about this for Python. This is definitely closer to what we need there. I suppose we can parse the C++ types and translate them where possible?

While we are on the topic of builders, should we also consider bringing some of the benefits of this API (like autocompletion) to OpBuilder style API as well. I was thinking something like:

OpBuilder builder(...);
scf::IfOp::create(builder, loc, ...);

instead of today’s:

OpBuilder builder(...);
builder.create<scf::IfOp>(loc, ...);

It does not save on the “API crust” mentioned above, but it will help autocomplete and enable using {} for ArrayRef/Range parameters without an explicit cast. It seems this could be orthogonal to the DSL-style dialect builders above. For OpBuilder, the idea would be to then migrate to this new style and deprecate the builder.create<> function (make it private).

This is overall a great proposal!

That’s a neat idea! It sounds more appealing to me than adding a dependency on libclang to parse some C++ at MLIR build time (if I understood the alternative correctly).

Hey Alex:

Random request for this: please add a “create” imperative verb into this, so it is v1 = std.createAddFOp(...). In addition to fixing the action, it also helps with the capitalization issue.

… ah, I see you cover that below in “misleading names”. I don’t agree with you that knowing about the object being invoked here is enough. Part of our naming convention should be encouraging consistency across the codebase, and we use create for this sort of API. Dropping that is a significant regression - and I don’t see that saving 5 characters matters here.

It looks like the design you’re proposing is shifting the location into the builder itself - this is “dangerous”. Part of the reason MLIR requires the location to be specified for each build operation is that we want people to think about it. This reduces the likelihood that people will set the right locations.

… also, see that you cover this below in “But locations are important!”. I guess I just really disagree with your analysis here. Of course people can do the wrong thing. If you think we’re doing locations wrong, then we should fix the main OpBuilder API. We shouldn’t introduce a parallel way to do things that uses a different policy.

Once you move this, your change is merely removing two angle brackets vs what we have now:

Value v1 = std.createAddFOp(loc, arg1.getType(), arg1, arg2);
Value v1 = std.create<AddFOp>(loc, arg1.getType(), arg1, arg2);

Can you elaborate on what the win of this approach is? Particularly the MultiDialectBuilder etc seems like a lot of complexity and template burden added into the system. It also breaks symmetry with other things that work with operations by template, like the casting and other stuff. Would those be specialized by operation as well?

… I see you’re covering this below as well. I just disagree with you, because you’re saving “keystrokes” by breaking consistency and duplicating a core API. I really don’t think this is worth it.

Great idea, I’d love to see this regardless of the discussion above. There is a patch in flight that handles this with a nice transition of supporting the old thing and the new thing for a period of time. Doing something similar should make it straight-forward to roll the new thing out.

I think this really underlines why I’m not a fan of this proposal - we should have fewer and more consistent ways of doing things in MLIR, not different approaches for different camps. If there are camps that want something different, they can build that stuff outside of MLIR core.


In Python, one can just inject new members into an object’s __dict__, so registering a dialect could define a corresponding member in a context that can later be used to construct stuff.

I thought about this too. In simple cases, yes. But we can also get something like const ::llvm::DenseMap<Operation *, UserDefinedType> &. One option is to have a type mapping somewhere in the Python API generator and complain on unknown types, there shouldn’t be a lot of those. Another option is to consider using tablegen defs instead of strings, e.g. StringAttr:$name, for types and tie that with the existing mechanism in default builders. I would really like to avoid a build dependency on clang.

This is quite minimalist to cover the current EDSC uses, but could work if a bigger change is not desired. Potentially, with some further combination of OpBuilder and Location since both have to be passed in every time.

I meant using libclang to achieve the one-time transition if the stupid parser I wrote does not scale enough. Build-time dependency would be really bad.

That’s why I proposed further down to use the multi-dialect flavor even in simple cases. By naming the builder variable create, you get create.std.addFOp() that clearly indicates the action. I admit that since variable names are chosen by the user, this is inferior to having “create” in the function name, but this can be a part of the naming convention similarly to using camelCase.

Encouraging consistency should not be limited to naming conventions. Encouraging people to build potentially incompatible helpers out-of-tree goes into the opposite direction of consistency IMO. And people will build helpers, in-tree or otherwise.

I’m not sure we actually do. There’s no createRegion, OpBuilder::createBlock is used in 25 places in-tree and so is new Block. Before I fixed createBlock to work with rewriters a couple of months ago, half of the current createBlock users were just allocating blocks with new. The only two create functions outside of the builder infrastructure and its support (Operation::create) are FuncOp::create and ModuleOp::create, which are there for legacy reasons: they existed before functions and modules became operations and were not cleaned up. The rest of the IR objects are context-owned, so we get them rather than create them.

There may be helper functions in different places that have create as part of their name (although I have seen emit more often recently), they aren’t really a part of the API.

It’s challenging, although not impossible, to collect evidence to support an argument here, but I would claim that requiring people to do something overwhelmingly mechanical repetitively achieves the opposite result of making them think about it – they mentally automate it. As a sort of extreme analogy, one doesn’t think about the relative location of the brake and accelerator pedals in a car beyond the first hour of driving. As a concrete example, quickly skimming through uses of = [A-Za-z].getLoc() and its pointer-based equivalent, I see 100+ functions (predominantly patterns and transforms in passes) that do exactly what I described: store the location in a variable and use it to construct every operation. Anecdotally, my way of thinking when writing such code is somewhere between “argh, I’ll eventually need a location, so let’s just get it upfront” and “Location loc = op.getLoc(); is part of the necessary boilerplate to make patterns work, almost a part of matchAndRewrite signature”.

The issue I see with the current design is that it doesn’t really prevent people from doing the wrong thing, but it makes life harder for everyone to do anything, be it right or wrong. It feels like overcompensating for past design decisions. What I am proposing is to shift the design a bit towards making everything easier.

I think there are two distinct cases for locations: building-from-scratch and building-during-transformation. The former requires different locations for each op; the latter requires preserving existing locations, often in one-to-many op expansions. The approach we have seems to be optimized for the former at the expense of the latter. The proposal is optimized for the latter and does not make the former more expensive. Furthermore, I expect the latter to be significantly more frequent than the former, which only happens in “frontends”. Note that we don’t seem to have a problem with PatternRewriter::replaceOpWithNewOp, which simply forwards the location of the existing operation to the new one even if something else, e.g., creating a FusedLocation, may be desirable. However, separating the two cases and using different APIs for them would be problematic, as you will probably agree.

that’s why I do not propose the syntax you mentioned… Nevertheless, it will still bring two improvements: autocomplete/documentation and working implicit conversions as @jurahul explained above. It also looks possible to rework the implementation so as not to depend on create<> internally and save the compiler the cost of template instantiations, but out of scope for now.

Depends on the reference point. Yes, op creation will not look “similar” to op casting because of not using templates. But today, op creation looks different from attribute, type, block and region creation, none of which uses templates (attributes and types use templates for casting though); with the proposed change, none of the creation operations will use templates, which sounds more consistent in this light.

Sure, I thought about that and it should be easy to dispatch in mlir-tblgen based on the initializer type being a string or a dag.

One can build operations by populating OperationState and calling Operation::create (and there are actually several uses of this scheme in-tree, outside the builders infrastructure), but we don’t consider OpBuilder a duplicate API for that reason. I expect something similar in this proposal: we layer an abstraction on top of OpBuilder, which conceptually is just an insertion point + creation hooks anyway. Ideally, all code for which it makes sense switches to the new abstraction, but we cannot just deprecate OpBuilder::create and remove the 1100+ in-tree calls (as well as several thousand out-of-tree calls) in one shot. So keeping it, at least in the short term, is just pragmatic.

Different people having invested in building similar helpers in-tree sounds like the evidence of the opposite.

This sounds self-contradictory to me. We want fewer, more consistent ways and, at the same time, we want people to split into camps and build inconsistent out-of-tree support stuff. This is bad for the ecosystem. This sends a bad signal to people who want to improve something in-tree. And, in the meantime, we already have different approaches in-tree that this attempts to unify and that are unlikely to just go away otherwise. On a meta point, it’s concerning how hard it is to push quality-of-life improvements to core. Sure, people will use MLIR because there is no alternative, but I’d like them to use it because it’s the best thing to use.

Can you elaborate on this? I don’t see Chris for instance pushing against “quality-of-life improvements” but pushing against what he perceives as something that “superficially looks like a quality-of-life improvement but that encourages bad idioms and will lead to less readable code”.

I sympathize with Chris’ view on this, it also matches my natural instinct, but I’m just on the more pragmatic side in this case: the current proposal looks to me like a better middle-ground to replace the EDSCs APIs (and all the out-of-tree helpers people are using/building).

There is a point that keeps me a bit worried: the location management. I agree with Alex that it is a bit too mechanical right now, this proposal seems to just accept this as fact and could contribute to make the situation “less fixable” in the future. Many patterns are many-to-one or many-to-many yet we only propagate the root location: it’d be nice if we could improve the situation on this.
For now we could include the location in the API: AddFOp addFOp(Location loc, Type result, Value lhs, Value rhs);, or at least provide this form as an overload that allows specifying a location.

It’s a general feeling in a rather broad sense, nothing about Chris. FWIW, his “named SSA values” proposal qualifies as quality-of-life, and it took substantial time and effort to get accepted, too. Any sufficiently large change with compiler-writer (i.e. ourselves) experience improvement as the primary goal receives some pushback. Achieving the same by incorporating the original change into several changes that introduce other features as their main goal does not. Without attributing any intent to “sneak in” changes, StructBuilder/MemRefDescriptor and Linalg named ops can be seen as examples of this. We get into a situation, where we only get improvements for ourselves as a byproduct of other changes.

Also discussions tend to turn meta too quick. I’m the one who started it here, so let’s try to keep it focused on the proposal for other folks to voice in.

I can totally understand this for introducing new things without real need. In this specific case, I’m more in the “if you can’t fight it, join and lead it” mindset.

agreed, this was included from the start:

I agree with you and, I think, Chris here, location handling is likely the key point here and it may be worth factoring out into a separate discussion.

I’m still interested in opinions on this proposal specifically.

I think I can honestly say that I still haven’t encountered a situation where I needed more than one location within the same IR-building function. I think that catering to that use case will pay dividends. It seems trivial to also have the explicit form when needed (frontends?).

I definitely want to see the IDE autocomplete and {} benefits that @jurahul mentioned. That’s more important to me than trimming the size.

I wonder how far we’d get with just two conceptually small changes:

  1. TableGen the scf::IfOp::create-style methods as in @jurahul’s proposal (is there any objection to this?)
  2. Introduce OpBuilder::LocationGuard guard(builder); to cache a location on the OpBuilder. The explicit-location variants of FooOp::create override the implicit location, but we could then introduce implicit-location variants that require being used with the builder guarded by a LocationGuard.

Also, it’s worth noting: if trimming the number of characters was a high priority for MLIR code authors, we would see more builders referenced with single-letter variable names like b, but I still seem to mostly see builder and rewriter. That’s a bigger savings than trimming off “create” from the name.

I was discussing this just this morning with an internal MLIR user: I think the fact that we don’t display locations by default in the IR output leads people to not pay much attention to testing the most desirable location propagation in transforms.
In many cases I suspect manually written transformation are just propagating the location for the “root” operation in cases where a FusedLoc would be more accurate.

+1 on the importance of having IDE auto-complete: I frequently end up either opening the TableGen generated file to check on the build method available, or to write builder.create<MyOp>(loc); and try to build to see in the clang failure output the list of possible overloads.

I don’t think we’d get to replace EDSC intrinsics, which is one of my goals here. Best case scenario, we replace all uses of builder.create<Op> with Op::create(builder in the codebase by spending ~a month, and we end up back at the point where we only have two ways of constructing ops. In a less positive scenario, we end up with three ways of constructing ops.

I don’t have conceptual objections to this style.

Aside from this proposal being more of a style change than character trimming, I found ~60 OpBuilder b (other than builder) in the code base for ~100 OpBuilder builder. These are fresh variables where we made a hopefully conscious choice of names. There are 500+ OpBuilder &builder, but many of those are cargo-culted inside ODS’s OpBuilder<"OpBuilder &builder, OperationState &state"> and then enforced by linters that complain about name mismatches in declaration/definition or overrides (for rewriters). And code reviewers would likely tick on a variable called b if it was live more than a couple of lines.

This is a good point.

I share this feeling. There are also weirder cases where a pattern produces new operations without removing the previous one and other many-to-many things. At some point, I circulated an idea of “fat” locations that would keep track of all patterns that were applied to get to an op from its original state. This is prohibitively expensive in non-debug mode though.

I have recently found myself doing TableGen’s job mentally to deduce the type of the autogenerated builder with 10 arguments.

Yeah, I don’t sense any malice from Alex on this, this seems like a normal technical discussion!

My position comes down to a few different things:

  • I think we should aspire for consistency in the APIs, and strive for fewer patterns that can be applied more broadly. This helps the project as it continues to scale, avoiding different subcomponents from dropping into different fiefdoms over time. I personally don’t think it is a problem if derived projects (e.g. tensorflow) end up wanting to have local styles because they have different cultures, I’m just talking about standardizing within MLIR/LLVM.

  • I have several specific technical concerns about the proposal, e.g. the location processing mentioned elsewhere. This proposal introduces more ODS code generation than the existing create<> approach, to save two characters. Metaprogramming has a cost that needs to be balanced, etc.

  • The big issue is that the proposal is effectively moving several different design issues all at once. I’d rather see these moved case by case with their own discussions, through normal iterative evolution of the existing builder APIs.

I was never a fan of the EDSC work in the beginning, but it came from a time when MLIR was much more exploratory and less understood. I really approve of replacing it with standard builders.


On this point, I think I am more concerned that the history of these discussions creates a chilling effect on people feeling design freedom to explore any higher levels of abstraction for solving such problems at layers that really could benefit from it. Personally, I don’t think that at the lowest levels, EDSC’s in their current form are “paying their way” (ie. As part of the mlir core APIs with so much overlap with the c++ builders) and the work to uplift the builders at that level has been a nice outcome. Addressing more of these design points incrementally will only make that better.

However, having stared down the barrel of transformation patterns for thousands of mid level ops, personally written many of them, reviewed even more from multiple sources, and cringed at the outcome in terms of legibility (and the associated sins that get committed), I feel that there is still ample room for improvement – ie. What EDSC’s were trying to do versus what they actually did manage is relevant.

I wouldn’t bucket this as “a tensorflow problem”, (although I’d say tongue in cheek that they have adequately explored the design space on how to make an unchecked explosion of patterns and ops). Numpy and PyTorch decompositions, while having better internal structure to their op libraries that can be leveraged, are still looking at O-of-thousands of things better expressed in some higher level language of algebra vs compiler transforms. I don’t think EDSC’s go nearly far enough in this, and I want to make sure the debate about them doesn’t salt the ground on the overall topic to finding meaningful simplifications at the higher levels of abstractions. We do benefit from those design discussions happening within LLVM as well, and not just for outside projects.

I just don’t feel like differentiation at the level of sugaring a builder API (which EDSC’S seem to want to target “a half a step above”) is going to get us very far, but there is still a lot of need in the topic, imo.

Ditto here. Jumping to the op def gets to the tablegen generated .h.inc file (for eg. YCM GoTo in vim takes one there directly), where the list of build methods would be there. This is so common that IDE auto-complete would be great to have.

This would be pretty painful - I think the rock bottom!

A point I’d like to make on compiler build times when using EDSCs: a file in the current codebase that makes really heavy use of EDSCs is lib/Dialect/Linalg/Transforms/Loops.cpp - probably the heaviest across the codebase. It is just 745 lines of code, but this single C++ source takes nearly 15s to compile on a modern fast workstation in RELEASE mode. I’m not expert enough to say whether this is directly due to EDSCs or whether there are ways to bring down its build time, but it’s really a roadblock if this is going to be the result. I’ll be a big -1 on EDSCs if such is the increase in build times for the syntactic sugar it’s adding - I’ve seen other files that are much longer and heavier in terms of IR creation (mhlo to lmhlo, lmhlo to affine) with standard builders that take less than 1/4th of the above-mentioned time to build with everything else the same.

I’m sorry if my comments above came across this way, that wasn’t intended. I was trying to draw a distinction between LLVM/MLIR itself and downstream projects. These downstream projects naturally don’t always follow the LLVM conventions, e.g. using the Google C++ standards or other local standards - TensorFlow is just one example, there are fortunately many many downstream users!

I very much endorse having expressive and high quality builder APIs, I just think we should aim for a single opinionated API… and if it isn’t good enough, then we should make it better, not create/perpetuate multiple options within MLIR.

Clang has the -ftime-trace option which is really useful in such cases.
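For reference, a minimal invocation might look like the sketch below (guarded in case clang++ isn’t installed; the trivial generated file stands in for a real translation unit such as Loops.cpp):

```shell
# Assumes clang++ is on PATH; a trivial file stands in for a real
# translation unit such as lib/Dialect/Linalg/Transforms/Loops.cpp.
printf 'int main() { return 0; }\n' > demo.cpp
if command -v clang++ >/dev/null 2>&1; then
  # -ftime-trace makes clang emit demo.json (Chrome trace format)
  # next to the object file.
  clang++ -ftime-trace -c demo.cpp -o demo.o
  test -f demo.json && echo "trace written"
fi
```

The resulting JSON can be opened in chrome://tracing (or a viewer such as speedscope) to see how compile time splits between the frontend and the optimizer/backend.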

On my machine it takes 21s to compile this file; at a high level, only 6s were spent in the frontend and 13s in the optimizer/backend.

In the frontend, 1.5s went to the include of “mlir/Dialect/Affine/IR/AffineOps.h” (and its transitive includes), and ~1s to instantiating each of lowerLinalgToLoopsImpl<mlir::AffineForOp>, lowerLinalgToLoopsImpl<mlir::scf::ForOp>, and lowerLinalgToLoopsImpl<mlir::scf::ParallelOp>.
I think each of them instantiates a bunch of patterns in the insertPatterns calls.

Anyway, most of the time is in LLVM and not in clang, and the profile is fairly flat (10% in the inliner is apparently the highest in the optimizer); there does not seem to be a particular function taking most of the time either.


What would happen if there are multiple derived projects that want to share a style? (Let’s say both TensorFlow and PyTorch users of MLIR want EDSC, for example). At which point something becomes “core enough” to warrant inclusion upstream?

Generally, do you expect a lot of API pattern / concept reuse between LLVM and MLIR?

I do agree that this proposal moves several design decisions at once, yet it also feels incremental compared to what we may be able to implement. Evolving an API as core as the builder also has costs, and almost everybody will have to pay them, so I am trying to balance the scope of the proposal against the number of different improvements we need to discuss.

Can we maybe have a list of design issues that we think are worth discussing separately? Location handling is one and doc/autocomplete is the other. Naming scheme? Anything else? (Also note that this thread is already split out of a larger thread).

The usual suspects in C++ are template instantiation and heavy overload resolution. The builder API uses both… But you are right that we should consider compilation time as a cost metric.


Yep, I think those are two different use cases:

  1. traditional compiler transformations, like small pattern applications, GVN, LICM, etc. that require very targeted and small-scope IR building/transformation.
  2. code that is basically macro-expanding a small subroutine. It might need to twiddle some attributes or do some other non-trivial metaprogramming that cannot be handled by literally calling a subroutine, so it generally needs to remain a compiler transformation. These cases are currently pretty poorly served by MLIR (not just the builders, but also the testing infra), and I think this is the space that EDSC was trying to colonize.

In case 1, one is never really “reading the code” by staring at the compiler pass, since it is usually a small number of ops being created in a single place in the pass. So bulky create methods aren’t a big deal, and they encourage a bit more attention to detail. But in case 2, it’s important to optimize the compiler pass for readability, so that it reads as if you were looking at real code, not builder calls.

To make this concrete, consider this recent patch of mine. It just splats out about 20 ops that all inherit the same location (and note that FileCheck-testing this is basically just a translation from “builder syntax” to “MLIR syntax”; not super valuable, but I don’t think there is much better to do here). I don’t see a strong reason why, for this “macro expansion” family of transformations, we cannot have a second builder API that optimizes for readability of the compiler code as if it were the code being emitted.

In my opinion, that alone would never warrant inclusion of such things upstream.
Instead, a new project should be defined that depends on LLVM/MLIR and that both downstream clients use and share.

It is important that LLVM stay as consistent as we can make it. The alternative would lead to potentially many different camps, each trying to upstream their own projections of, or deviations from, LLVM style because there are multiple clients. This isn’t something that scales, and it doesn’t make sense for LLVM to take on that maintenance burden.

I’m not sure what you mean here?

I think that you’re both right here - you’re right that both approaches are based on templates, and Uday is right that the ultimate compile time in practice matters if it is significant. I think we should just be data-driven here and try some A/B experiments. Uday pointed at the Loops.cpp file upthread - what happens, in practice, to compile time with the different approaches? Not all templates are the same, after all, and it looks like the inliner is a significant factor here, which indicates that there could be excessive layers of template overhead that are getting eliminated with one approach.