De-privileging ModuleOp in translation APIs

I noticed that our translation APIs are still parameterized on ModuleOp - but as has been pointed out many times, this being a privileged root op is really just an artifact of history.

In Translation.h, any objection to changing the translation callback function signatures to (and tracking down any fallout):

using TranslateSourceMgrToMLIRFunction =
    std::function<OwningOpRef<Operation *>(llvm::SourceMgr &sourceMgr, MLIRContext *)>;
using TranslateStringRefToMLIRFunction =
    std::function<OwningOpRef<Operation *>(llvm::StringRef, MLIRContext *)>;
using TranslateFromMLIRFunction =
    std::function<LogicalResult(Operation *, llvm::raw_ostream &output)>;

If an individual translation needs to work on an actual ModuleOp, it should be its job to check that and error accordingly.
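As a minimal, self-contained sketch of what that check could look like (the types here are toy stand-ins for illustration, not MLIR's actual classes — in real code the check would be something like `isa<ModuleOp>(op)` on a real `Operation *`):

```cpp
#include <cassert>
#include <functional>
#include <ostream>
#include <sstream>
#include <string>

// Toy stand-ins for MLIR types, for illustration only.
struct Operation {
  std::string opName;  // e.g. "builtin.module" or "spv.module"
};
struct LogicalResult {
  bool isFailure;
};

// Mirrors the proposed TranslateFromMLIRFunction shape: the callback
// receives a generic Operation*, not a ModuleOp.
using TranslateFromMLIRFunction =
    std::function<LogicalResult(Operation *, std::ostream &)>;

// A translation that genuinely needs a module checks for one itself and
// errors accordingly, instead of the API forcing ModuleOp on everyone.
TranslateFromMLIRFunction emitExample = [](Operation *op, std::ostream &os) {
  if (op->opName != "builtin.module") {  // stand-in for isa<ModuleOp>(op)
    os << "error: expected 'builtin.module' as the top-level op";
    return LogicalResult{true};
  }
  os << "translated module";
  return LogicalResult{false};
};
```

Translations whose top-level op is something else (an spv.module, say) would simply skip the check or check for their own root op instead.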


Seems fine to me.

– River

SGTM. Do we need to document/enforce traits such as IsolatedFromAbove on the op?

What is the use case intended here?
Is there a real case for these registrations beyond the command-line tool / testing of mlir-translate?

Patch: D105000 [mlir] Change translation API to allow custom top-level operations.

It required a bit more work to customize the parser, but doing so allowed the simplification of SPIRV registrations. Other out of tree use cases I am aware of are similar to SPIRV: they seek to both parse and generate top-level ops that are not ModuleOp.

It is just for these command line tools (although, I have seen the backing callbacks leak into more programmatic uses out of tree). In general, the parser supports custom top-level ops and *-translate should as well.

I'm still not sure I understand, actually; I feel I'm missing something: the mlir-opt tool, for example, will always parse into a ModuleOp, but the parser supports other cases for tools that aren't mlir-opt-like.
Similarly, I can imagine tools other than mlir-translate based on different constructs (you want to emit a "TensorFlow Saved Model" from a tf.SavedModel top-level representation, for example), but I'm not sure this fits the API of mlir-translate (just like mlir-opt will always parse a top-level module) and the associated registrations.

mlir-translate and friends are just conveniences for bundling translations from an arbitrary form (constrained to a single stream/source file) to an arbitrary form. It has a helper, TranslateFromMLIRRegistration, to aid in cases where the source of the translation is MLIR ASM.

To the extent that we have a tool API like this, we should absolutely support inputs and outputs where the MLIR-based format brings its own top-level module op. I think we can separately debate whether such a tool API makes sense at all and whether it should be widened to support things like directories of undifferentiated files (aka. saved model), but those seem separate from the general evolution towards less constrained top-level operations.

Just to clarify this assertion, the *-translate tools exist to provide faithful translations between various MLIR-asm-based and non-MLIR-asm-based formats. Their purpose in tree is first and foremost for testing, but in practice we find that folks use them as little factories for making similar little busybox-style starter translation tools, either for compiler developers or other cases. We have many formats (in-tree and out-of-tree) where the MLIR-based format is defined in terms of its own custom top-level operation. It is an error to translate this via a hard requirement to root everything on a built-in ModuleOp. We have a lot of code in various projects, similar to what I excised from the SPIRV translation which works around this constraint in the API, and the result is that we aren’t really testing flows the same way that an eventual tool operating on the format and interacting with the parser would.

*-opt tools in contrast exist primarily to load MLIR ASM, apply a transformation pipeline to it and print the results. Since transformation pipelines already have a notion of binding to children, it is not incorrect to implicitly nest things under such a top-level builtin ModuleOp: it is still possible to construct a pipeline that tests exactly what it should be. Separately, I may also argue for an ability for opt to be able to parse based on a specified custom top-level operation, since that would make some tests easier to write, but I do not think it is required for correctness.

We can consider that an improvement to be made, whereas translation should be faithful and not force operations into the path that are not actually part of the format being represented.


Would love to have a trait/interface that allowed for top-level module-like ops to be generically identified/manipulated - in the past few days I’ve hit 3 different cases in our codebase where I needed workarounds for the implicit top-level ModuleOp. As mentioned this also leaks into other dialects like the LLVM dialect where there is an LLVMFuncOp but no corresponding LLVMModuleOp resulting in free-floating attributes for things like data layout and target triple without a good place for verifiers. Trying to work around that in one’s own dialects just results in even more special-casing being required :frowning:

Can you elaborate on what kind of properties should be exposed by a trait/interface allowing an op to be top-level?
Right now we already have in-tree the spirv module which acts as top-level in some cases.

A trait that just said “I am top-level” may be all I’m reaching for - the traits on ModuleOp aren’t enough to uniquely identify that it is top-level (vs. a function or any other op container). I’m running into a lot of cases where I have a ModuleOp and have to either determine if it is the top-level op (as it is in the LLVM dialect) or a container for a top-level op (as in SPIRV/IREE) that needs a getBody().front() to access the real top-level op. It gets more complicated when wanting to mutate a given ModuleOp such as if you need to materialize a new top-level op during splitting/joining of multiple modules as you have to first determine if that is just the ModuleOp itself or if it needs the inner dialect top-level op.

I think the suggestion above to have a CompilationUnitOp or something like it to replace the implicit outer module would help some of this as instead of carrying around a ModuleOp that may or may not be the outer op you could have a CompilationUnitOp that does not have the ambiguity as to whether it’s a ModuleOp or not. If your pass/code explicitly dealt with ModuleOp or any other dialect-specific one it would then nest specifically for that.

I’m actually a bit confused by what it means to “be top-level” other than getParentOp() returning nullptr (and the verifier passing)? Right now I haven’t seen the need for a trait because getParentOp() == nullptr seemed enough of a check.

Seems like some integration-specific consideration: why are you using a ModuleOp as a container for another top-level op in the first place if it does not provide any other value here?
You mention SPIRV here, but this is exactly the example I was referring to earlier: a spirv module can exist by itself and does not need an encapsulating "ModuleOp" or "CompilationUnitOp".

Ah, that's the problem: though true in a lot of simple upstream cases, not all module ops have getParentOp() == nullptr. If you are trying to embed multiple spv.modules you need some container for them, and once you have that container - whatever it is - you want to be able to run passes on it. You can no longer rely on getParentOp() indicating that an spv.module is a top-level construct in the SPIRV dialect, and it's even worse with things like the LLVM dialect that just reuse ModuleOp, as then you can't even identify whether a given module is an LLVM module without looking inside of it.

To make this concrete, here’s a walkthrough of how we ended up with what we did and why it’s not great even if it does work:

module {  // implicit
  my.container {
    spv.module { ... }  // obviously in the SPIRV dialect
  }
  my.container {
    module {  // what is it?
      llvm.func { ... }  // oh, llvm I guess
    }
  }
}

Now if one wanted to run a pass on the ops within the container (canonicalize/etc.), nesting gets weird, especially if you compare how this looks above with how it would look when parsed on its own:

// my_spv_module.mlir
module {  // implicit
  spv.module { ... }
}
// my_llvm_module.mlir
module {  // implicit
  llvm.func { ... }
}

And now you would have two different types of nesting behavior between the above my.container wrapper vs the module that you get from the standalone files. In a game of whack-a-mole trying to make this stuff consistent we chose to use an implicit module wrapper as the container so that at least the interior contents would look the same as their loaded files:

module {  // implicit
  module {  // as with the implicit one created above
    spv.module { ... }
  }
  module {  // llvm's reuse of 'module'
    llvm.func { ... }
  }
}

But then if doing anything non-trivial you still need your own my.container type to carry across any other information (attrs, ops, etc) and you end up with something that is functional but horribly unergonomic:

module {  // implicit
  my.container {my attrs} {
    my.op
    module {  // as with the implicit one created above
      spv.module { ... }
    }
  }
  my.container {my attrs} {
    my.op
    module {  // llvm's reuse of 'module'
      llvm.func { ... }
    }
  }
}

So, if spv.module implemented a ModuleLike interface and llvm used its own module one could make this much simpler and author passes against the interface that nested appropriately:

module {  // implicit
  my.container {my attrs} {
    my.op
    spv.module { ... }
  }
  my.container {my attrs} {
    my.op
    llvm.module { ... }
  }
  my.container {my attrs} {
    my.op
    llvm.module @foo { target="x64" ... }  // can have multiple just fine, too
    llvm.module @bar { target="aarch64" ... }
  }
}

And if the parser used that ModuleLike interface being present on a root op, it could remove the implicit module that is created when parsing files and make the behavior consistent whether using opt/translate on a file or invoking passes on modules scoped within other modules:

// no implicit outer module
spv.module { ... }

The parser API lets you decide how you want to parse it. The implicit module is only added when there are multiple operations next to each other in a file. This is a convenience, but you don't have to use it: you can set up all of your tooling to always have a top-level iree.compilation_unit, for example.
Of course, if you write a textual file with multiple ops you're breaking your own assumptions, but your tooling can just reject this.

I question the validity of ending up in this situation in the first place. If you have your own container and you can put a module in it without knowing what this module is about, then you’re not in a situation to really do anything with this module. I’d go back to the design itself and avoid getting these in the first place. I don’t see this as an issue with ModuleOp though.

Having an llvm.module can be justified, but your examples also show a spv.module nested under a ModuleOp, and I don't quite see why it has to be right now (and so you'd end up in the same situation with an llvm.module).

I think the meta point here is that there’s a bunch of inconsistency around this and there need not be. If you look in the SPIRV dialect, for example, it does a bit of load-bearing mlir::ModuleOp munging:

You could argue that it shouldn't be doing that, but it's doing it for a reason: note in the last example that you can't replace a top-level op (one with getParentOp() == nullptr), or convert it (we have to deal with this in IREE), so this implicitly requires a parent op of some kind. The IREE uses are very similar to those above - and show that there is a requirement for something, and that having that something be ModuleOp is not great, especially when some dialects like LLVM reuse that for their own module-like behavior.

Making it less bad and more consistent - even if not yet great - is the goal :slight_smile:

I agree, but my read of this is that this isn’t inherent to anything related to ModuleOp itself, neither to an MLIR API right now: so I am still feeling I missed the point throughout your examples.
The only thing I got from this thread so far is that an LLVM module op may be useful, but it won’t solve most of what you showed above.

This looks like code that isn't great (I don't understand the design behind ConvertGpuLaunchFuncToVulkanLaunchFunc…), but I'm not seeing how renaming ModuleOp into CompilationUnit would solve anything here. I also don't see how having a trait would help, since this code really wants a nested SPIRV module underneath, and this is a conceptual need since it deals with host code calling the device code, as far as I can tell.

The deficiency of the MLIR API here today with respect to mlir::ModuleOp is that patterns and conversion cannot replace the top-level op, and if you want to do so on some hypothetical ModuleLikeOp - whether mlir::ModuleOp or spv::ModuleOp - you need to nest it within another mlir::ModuleOp. There are also issues with generic pass pipelines that need to nest things when they otherwise shouldn't (this is one reason you see module { in IR where it otherwise is not needed - it's a workaround for the parser being too clever and not creating the required mlir::ModuleOp needed to make things work!). So we're not talking about a ModuleOp as a module op ("a container for functions and such") but rather as an INeedThisOpThatsCalledModuleButNotMyModuleOpForSomeArbitraryOrLegacyReasonOp. ModuleOp is the wrong name for it in those conditions, and I think there's something shorter that would suffice (RootOp, TopLevelOp, CompilationUnitOp, etc. more aptly describe what it's there for) :slight_smile:

If the API didn’t have these restrictions having to wrap things in mlir::ModuleOp (or any dialect’s ops) - either in textual IR or in code - should never need to happen. Those are examples of where it does need to happen today because of the API. “Don’t use pattern rewriting or conversion on modules specifically - sometimes - and deal with having module { module { in your tests and IR and C++” reeks of a design issue, which is what this thread is talking about when it says that ModuleOp is privileged and shouldn’t be.

Improvements to the infra to allow for replacing the top-level op may negate some of the need for the mlir::ModuleOp to wrap things, but until it’s possible to use all of MLIR without mlir::ModuleOp it feels like there’s an issue with the API. Telling each dialect to make its own outer container op isn’t satisfactory either - then we are back to being unable to write passes that work across dialects on top-level operations without specifying those ops directly - and why even have module op at all? (which again comes back to the topic of this thread)

If it were not convention to call module-like things modules (so no spv.module/etc but spv.shader) it may be different, but because module is the generic name in MLIR for “symbol table single-region container” it’s just presenting unneeded aliasing. You could try to force everyone to not use “module” ops in their dialects but that’s pretty silly (“why does MLIR reserve this special name for a module op that I don’t use as a module op?”).

I started this thread and would still like to finish this patch some day but am going to be out for a few weeks and will have to catch up on the discussion later.

I think the meta point I would make – and it sounds like Ben/Mehdi agree – is that there is a lot of inconsistency in the APIs. I've seen enough code to classify it into a) things that tried to bring their own module and encapsulate their world (and hit all kinds of weird edges and made pragmatic compromises to try to make it work), and b) things that probably should have had their own top-level encapsulating operation but just used the built-in ModuleOp (because they predated custom top levels or found them too hard to use).

I'd focus my energy on cleaning up the APIs to make them more consistent and usable in this area and not get too bogged down in the existing code – which, as near as I can tell, can be improved in all cases with some better consistency. I'd also bikeshed the naming of the built-in top level a bit, because I think its squatting on the name leads to subpar designs in its own right.

My 2 cents.

There are a lot of things here, and I'm not sure how to untangle it all.

OK, this may be an API limitation (I haven’t checked), and we could look into lifting this restriction, but I’m not seeing the immediate connection with ModuleOp itself, or the point on having traits here? (this limitation applies to the top-level op, regardless of the name).

Actually you only need to nest it under “something” if the API is just limited to not allowing updates to the top-level op.

I don’t quite get what you mean here? The fact that a text file with two functions has a top-level module added implicitly? (but then you meant to write “the parser […] creating the required …” instead of “the parser […] not creating the required …”?).

For some historical bit: I originally wanted to require the module to be explicit in the test file, and forbid having multiple top-level operations (like 2 functions) being implicitly nested, but others in the team thought it was just unnecessary boilerplate that shouldn’t be required for the common case of writing tests. I’d be happy to revisit this though :slight_smile:
In any case, this is nothing more than syntactic sugar and there isn't anything fundamental here, so I may be missing the point (again, sorry).

But then it has nothing to do with module as I understand it: you point to the fact that pattern driver and dialect conversion infra can’t touch the top-level op (ModuleOp or any other). This is more structural (the root of the tree can’t be touched) than having to do with a trait/interface or a naming for ModuleOp.
Unless you’d want to workaround this API limitation by enforcing that the root of the IR is always TopLevelOp regardless? But then it is like going back to when we were forcing ModuleOp to be the top-level, and we instead lifted this to allow things like “having a top-level standalone SPIRV module” for example.

So far you've mentioned limitations with the "top level op", but where is mlir::ModuleOp required to build an MLIR-based compiler otherwise?
You should be able to build your entire toolchain (including your own mlir-opt-like tool) without a ModuleOp ever being involved anywhere (you can't use MlirOptMain as is, but this is just a convenience wrapper; it does not go deep).

Actually passes can be written to not require a particular op, and then be scheduled by anyone on their own top-level operation. The canonicalize pass is an example: you can use it on ModuleOp, on SPIRV Module, on LLVM Func, …
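As a self-contained toy sketch of why an op-agnostic pass composes with any root (the types below are stand-ins for illustration, not MLIR's actual pass infrastructure): a pass written against a generic op type can be scheduled on whatever op the tool happens to root the IR on.

```cpp
#include <functional>
#include <string>
#include <vector>

// Toy stand-ins for MLIR concepts, for illustration only.
struct Op {
  std::string name;
  std::vector<Op> children;  // nested ops; regions flattened for brevity
};

// An op-agnostic "pass" is written against the generic Op type, the way
// canonicalize is written against Operation* rather than ModuleOp.
using GenericPass = std::function<void(Op &)>;

// Schedule the pass on a root op and everything nested under it; the root
// can stand in for a builtin module, an spv.module, an llvm.func, etc.
void runOnNested(Op &root, const GenericPass &pass) {
  pass(root);
  for (Op &child : root.children)
    runOnNested(child, pass);
}
```

The point is that nothing in the pass or the scheduling logic names a particular root op type, so no privileged ModuleOp is needed.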

I think I’m trying to convey here that this is hard to hold and you’re telling me I’m holding it wrong and should hold it differently. I could hold it differently, but it doesn’t change the fact that it’s hard to hold :slight_smile: