[RFC] Make <OpTy>::build take OpBuilder instead of Builder

As we start defining more complex Ops, we increasingly see the need for Ops-with-regions to be able to construct Ops within their regions in their ::build methods. However, these methods only have access to Builder, and not OpBuilder. Creating a local instance of OpBuilder inside ::build and using it fails to trigger the operation creation hooks in derived builders (e.g., ConversionPatternRewriter). In this case, we risk breaking the logic of the derived builder.

(For example, we can currently break the conversion pattern rewriter logic as follows: create a loop::ForOp that constructs a loop::YieldOp terminator; inline the body of this loop; erase the loop; undo the rewrite. Since the loop::YieldOp was constructed outside of the pattern rewriter, it has no knowledge of it and will not be able to remove it when the rewrite is undone, leaving a dangling terminator op in the middle of IR where a loop, i.e. a non-terminator, was inserted. Same is actually true for all region-carrying ops that call ensureTerminator.)

Two recent use cases which could benefit from are (1) optional construction of terminators in loop.for/loop.if; (2) Linalg named ops construction.

Proposal 1: make all <OpTy>::build methods take OpBuilder * instead of Builder *. Thus they have access to the relevant op builder class and can use it to construct nested operations. The builder can be forwarded from OpBuilder::create<OpTy> as OpTy::build(this), which is already the case. We do so for all ops, whether the need the op builder or not, for the sake of consistency. Many ::build functions don’t use the builder even today but still accept it.

Proposal 2: in addition to Proposal 1, make ::build functions take a reference rather than a pointer, e.g. OpBuilder &. References are preferred in the rest of the MLIR code base, so this makes ::build functions more consistent. This leverages the opportunity of us already changing the ::build signatures to make APIs more consistent. This also ensures all relevant places are updated because pointer/reference mismatch would trigger a compilation error, while using Builder * instead of OpBuilder * can stay unnoticed for some time because it’s an upcast.

Effects on external users: all signatures of ::build functions, be it in ODS or in C++ will have to be updated to take OpBuilder instead of Builder. Under proposal 2, bodies of ::build functions that actually use the builder will have to be updated as well. Both changes are pretty mechanical.
Any direct calls to ::build will have to be updated, but these should be rare (core infra only has a handful).

Upstream changes have been prototyped in https://reviews.llvm.org/D78713

Proposal 1 makes sense to me, especially given the awkwardness around building implicit terminators and other things within build methods.

Proposal 2 is something that I intended to do a long time ago, and I’m very very +1 of having fixed.

+1 for me here as well! Thanks for doing this :slight_smile:

+1 for option 2. As an aside I recently had to use a builder to create a node with a region today and felt like it was a little awkward. I was thinking that it would be useful to create a builder based on an existing builder, which would make it easy to populate the region of a newly created OP without saving and restoring insertion points.

That’s the next step I have in mind.

+1

I think option 2 is the best choice. Both for the final outcome and also because it makes it easier to have confidence that we found all cases.

Thank you for tackling this!