As far as I can tell the top-level op being builtin.module isn’t actually a requirement of the core IR though. This behaviour was surprising when defining my own module-like op: module { ... } and my.module { ... } end up with different levels of nesting, since the latter gets a top-level builtin.module op inserted around it, and running --my-generic-module-pass then runs on builtin.module instead of my.module.
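Concretely, the difference looks roughly like this (a sketch; my.module is the hypothetical custom module-like op from above, shown with a made-up custom syntax):

```mlir
// Parsing `module { func.func @f() { return } }` gives the builtin.module
// itself as the top-level op:
module {
  func.func @f() {
    return
  }
}

// Parsing the same body wrapped in `my.module { ... }` gives an extra implicit
// builtin.module around it, so the nesting ends up one level deeper:
module {
  my.module {
    func.func @f() {
      return
    }
  }
}
```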
The main places I found builtin.module to be baked in:
Using parseSourceFile<ModuleOp> in mlir-* tools
The Python ir.Module class requires the top-level op to be builtin.module
The Python and C APIs only support creating pass managers on builtin.module
I started poking at this here by adding an interface for module-like ops + a parse API that only inserts a builtin.module if the top-level op isn’t module-like, but since this seems like a potentially disruptive change I wanted to see what the desired behaviour here actually is before going too deep down the rabbit hole. Is it actually beneficial to generalize all these to support more than just builtin.module? And if so what should the mechanism be for determining what ops are allowed to be top-level? (opt-in through a trait/interface? let all IsolatedFromAbove ops be top-level? something else?)
Nice! I did most of the work to enable this earlier this year, but haven’t had time to push further. The biggest requirement for me here is “unsurprising and unambiguous behavior”.
These still use builtin.module mostly because I don’t develop/touch python stuff, patches would be welcome to clean this up.
An inherent problem of using something like an interface/trait/etc. by itself for this is what you do when the dialect of your “module” is unregistered. If you tried to load your module and your module dialect wasn’t registered, you’d get different IR than if it was registered (which isn’t great given that it breaks roundtripping).
Hm, I hadn’t thought much about the unregistered case. Some options I see:
Allow all ops to be top-level ops, and only insert builtin.module when parsing IR with multiple top-level ops.
this seems like the least surprising behaviour (for someone not already accustomed to how things currently work), but probably the most disruptive: e.g. a pass pipeline anchored on builtin.module would just silently stop doing anything (see the sketch after these options)
Make it a parse error to try parsing IR with a single unregistered top-level op, since the builtin.module-insertion is ambiguous
seems the least disruptive
I think all cases currently go the other way: the registered version of an op may be rejected while the unregistered one is allowed. And the rejection seems to always come during verification, not parsing.
not the most satisfying solution, but I wonder if there’s a “real” use-case this would break
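To make the “silently stop doing anything” concern from the first option concrete, here is a hypothetical sketch (the pass in the RUN line is made up, not from an actual test):

```mlir
// Under the first option, this file's only top-level op is the func.func
// itself; no implicit builtin.module is inserted around it.
// RUN: mlir-opt %s --my-module-pass
// A pass anchored on builtin.module then has no builtin.module to operate on,
// so the RUN line above would no longer transform anything.
func.func @foo(%arg0: i32) -> i32 {
  %sum = arith.addi %arg0, %arg0 : i32
  return %sum : i32
}
```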
One more option, which may be more consistent with how we handle this in general: stop the implicit insertion of builtin.module when parsing, and error out when the top-level op doesn’t implement the trait, unless it is from an unregistered dialect (and unregistered dialects are explicitly allowed).
I would like to rework the C and Python APIs to not be hard-coded to ModuleOp. I won’t have time to look at it this week, but if no one else gets to it, I can probably propose something next week. This will be an API break but is worth it imo (and I believe is the last bit of egregious coupling like this).
I like the idea of getting rid of the module-insertion entirely, I think this is really the simplest and least-surprising behaviour possible. So a concrete proposal in that direction:
remove the implicit module-insertion from all tools
allow any op to be parsed as top-level; no new trait added
it’s a parse error to have 0 or >1 top-level ops
staged transition (add options in tools to enable new behaviour that’s defaulted to off, transition an initial set of tests, flip the default and let code owners update the rest of the tests, eventually remove the option)
(As a data point, updating mlir-opt to this behaviour and running check-mlir locally gives me 511 failed, 939 passed, 257 unsupported tests.)
If we’d like to support multiple top-level ops, we could change the tools to use Block as the top-level “thing” instead of Operation. Not sure how valuable that is though, since you can use --split-input-file in tests to get the same sort of behaviour.
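For reference, this is roughly what that already looks like in a test (a sketch; the pass in the RUN line is just illustrative): --split-input-file splits the file at `// -----` markers and parses and processes each chunk independently.

```mlir
// RUN: mlir-opt %s --split-input-file --canonicalize
// Each chunk between the `// -----` separators is parsed and processed as its
// own top-level unit.

func.func @first() -> i32 {
  %c1 = arith.constant 1 : i32
  return %c1 : i32
}

// -----

func.func @second() {
  return
}
```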
With this change, would it still be possible to have an IR with more than one module op in it? Will this basically restrict it to one module per file, or will there be a way to tell the parser which module op you want to parse?
I ask because I know there are cases (maybe with GPU dialect?) where there are two modules in one IR file and each module gets lowered to a different device.
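For illustration, the multi-module case being asked about might look something like this (a hypothetical sketch; the symbol names are made up): two sibling modules in one file, which today share a parent only because of the implicit builtin.module the parser wraps around them.

```mlir
// Hypothetical sketch: two modules in one file, each intended for a different
// device; today the implicit builtin.module becomes their common parent.
module @host {
  func.func @main() {
    return
  }
}

module @device {
  func.func @kernel() {
    return
  }
}
```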
How about just adding a flag -no-implicit-top-level-module that is looked at by the parser?
It’s good, canonical, and convenient to always have a top-level builtin.module op (which implements SymbolTable) and to have it added implicitly so that test cases can be written compactly without having to put a module {...} all over the place – especially for all the passes/utilities that are pre-conditioned on finding an enclosing op that holds a symbol table.
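For example, a test along these lines (a hedged sketch) relies on an enclosing symbol table to resolve the call’s callee, and today that symbol table is the implicitly inserted builtin.module rather than anything written in the test:

```mlir
// The call below resolves @callee through the nearest enclosing symbol table,
// which is the implicitly inserted builtin.module; the test never spells it out.
func.func private @callee()

func.func @caller() {
  call @callee() : () -> ()
  return
}
```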
I definitely sympathize with the convenience argument. Plus the extra indentation would cause a lot of churn to update existing tests. I don’t think it’s good as a default behaviour though: it imposes a “canonical” structure that’s not actually required by the IR, and may not make sense or be required in all contexts.
What do you think about making “no implicit module” the default, but leaving “implicit module” around as an option in at least mlir-opt? Then there would be no need to do mass updates of tests.
That is true for any canonical structure – it is something not required by the IR but still good to have. I don’t see how that can be an argument for dropping a useful shorthand for an overwhelmingly common scenario.
This doesn’t help - won’t you have to go around and add -implicit-module all over the place, and more worryingly, have to use -implicit-module on your command line for all local manual testing, debugging, and experimentation? (One would just commonly start their MLIR with func.func { ... }.)
So, I’m still strongly against what @mehdi_amini is proposing – it doesn’t really bring any consistency, but it does remove a useful shortcut.
It actually does bring consistency, one that is lacking in many aspects of our current parsing logic. That is because there are two conflicting philosophies on this:
IR files are here for developers to play with and experiment. They are supposed to be convenient to manipulate, and the default settings are more akin to a “DSL” for interacting with the compiler than a robust representation of the IR.
IR files should be isomorphic to the in-memory representation because they are used for testing, reproducers, etc. The convenience of saving 10 characters is negligible compared to keeping things isomorphic here.
There are other “defects” in our current system from this point of view (and I would change the default on all of them), for example we implicitly add source locations when not present in the input file.
(Not about parsing, but we have other issues on printing: not including locations by default and not preserving use-list order are also things I consider important “defects” of the system today, in this philosophy of course.)
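As a concrete illustration of the location point (a sketch; the file name and line/column values are illustrative): parsing an input that contains no loc(...) and re-printing it with debug info shows locations that the parser synthesized from source positions, so the output is not isomorphic to the input.

```mlir
// Input, with no locations written:
func.func @f() {
  return
}

// The same IR re-printed with --mlir-print-debuginfo: the parser has attached
// file/line/column locations that were never present in the input.
func.func @f() {
  return loc("input.mlir":2:3)
} loc("input.mlir":1:1)
```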
I’d also differentiate what the opt tool does here: it’s meant for testing and exploration (potentially the former part is a bit too ingrained). The convenience is geared towards making it easier to write tests. I think having that convenience in a testing tool, for the benefit of test authors, can be separated from the concerns of folks in the more DSL-style usage camp. Then the question is: what is the cost on the parser (not the testing tool) of enabling a testing tool to present this convenience to test authors?
I think we may diverge on what are the good properties of “writing a test”: to me the most important part of the tool is to be unambiguous and “hard to misuse”, and only then “easy to use”.
So I’m looking for a tool that first and foremost provides the “robustness” aspects of the process, and then tries to enable as much convenience as possible without sacrificing any of the “hardened” properties (through options to override the “safe” default, for example).
In many ways this is the approach I applied to various aspects of MLIR actually (removing global constructors in favor of explicit registration was also a controversial approach because of the “convenience” of global constructors, similarly for dependent dialects in passes and other mechanisms that we have).
As an aside from the current topic title: I believe that not printing locations is a major flaw in our testing; it leads to locations never being tested or seen in practice. We set a bad initial example by not having locations in our tests consistently, and they are too easy to ignore right now.
Having a component that makes use of both custom and nested modules, I agree with Mehdi: the current situation is quite bad, and there is a lot of code in the parser, tools, and downstreams that needs to handle this, which is twisty and has been a maintenance tax for us.
There is also a chicken/egg situation: because it is hard to use custom top-level ops, we don’t use them much or support them well (there are various places that still privilege builtin.module, some of which have been pointed out here).
But I also agree with Uday: it is a minority case, and tests that don’t care about the top level are significantly nicer when they can elide it. At a minimum, stylistically, when writing tests with an explicit top level, I’d like to maybe treat it like a C++ namespace and not indent (as a convention).
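Presumably something like this (a sketch of the suggested convention): the explicit top-level module is spelled out, but its body is not indented, similar to how C++ namespaces are often formatted.

```mlir
// Explicit top-level module with an intentionally non-indented body
// (namespace-style).
module {

func.func @not_indented() {
  return
}

} // end module
```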
If we had a -parse-implicit-module flag on mlir-opt, could the parser logic be simplified and made unambiguous? I believe that currently it does some conditional checks to see if what was parsed is a builtin.module, and I’ve found this to be fragile in the past. If it could be implemented in an easy/unambiguous way, that doesn’t seem so bad for the long run? (I’d want to think more about the transition plan – it’d be best if what we did only impacted things that have jumped through hoops to make the current situation work with custom top levels.)
Mehdi, I think the flaw you point out w.r.t. implicitly adding source locations isn’t a bug in our policy or implementation of the parser; the problem I see is that the textual form of MLIR is considered primary because there was no other choice. The textual form is flawed for other reasons as well, e.g. you can’t parse a file unless you have the right ops registered, it doesn’t maintain the order of use-def chains, etc.
IMO, we’ve gone for too long without a binary format, whose job it is to faithfully represent the IR in a round-trippable way. As that is phasing in, I think we should recommend /it/ as the canonical form for stable round tripping, just like LLVM does.
(incidentally, I really dislike the implicit-module-at-the-top-level thing, I think that is a bug)
I agree that binary is the best “faithful” round-trip solution, but it isn’t enough: it won’t solve the problem related to testing I mentioned (we won’t write our tests in binary format). So our testing will still be flawed in the same way; I don’t see a path to “fix” our testing for locations, for example (other than social pressure, but that hasn’t worked so far, and we’re talking about technical setup here).
(Also, the LLVM textual format should be “faithful”; there is even support for use-list order preservation.)
The original idea was to allow ops to opt in to being top-level through a trait/interface, with the current implicit-module-op-insertion as a fallback. This would have been the easiest to transition to, since existing tests/workflows would be unchanged; however, it has the downside that the parsed IR could differ based on whether the top-level op is registered or not.
Another idea was to simply remove the implicit module entirely. This would get rid of any ambiguous cases and simplify the parsing logic; however, the implicit module is very convenient for common testing scenarios, and removing it would cause a large amount of churn in tests. There wasn’t any opposition to keeping both implicit-module and no-implicit-module around as options, but the main sticking point is whether the default behaviour should be changed.
There were some other ideas somewhat in-between the previous two, but they would complicate the top-level-op logic further and there didn’t seem to be much interest in them.
To make some progress here I’d like to start by adding a --no-implicit-module option to the tools, while we figure out if/how to change the default behaviour. Initial set of changes here: