Path 2: We can allow mlirTranslateMain to take an optional DialectRegistry, fill it in once in circt-translate.cpp (et al.), and have those dialects set up for all translations. Translations then have a single name and are registered in a single consistent way, regardless of whether they are generic or not.
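For concreteness, here is a minimal sketch of what a circt-translate main() might look like under path 2. Note that the DialectRegistry parameter on mlirTranslateMain is the proposed addition, not the current signature, and the dialect choices and the registerAllCirctTranslations() helper are illustrative:

```c++
// circt-translate.cpp, sketched under the proposed path-2 API.
int main(int argc, char **argv) {
  mlir::DialectRegistry registry;
  // Dialects that translations in this tool may encounter (illustrative picks).
  registry.insert<circt::hw::HWDialect, circt::comb::CombDialect>();
  registerAllCirctTranslations(); // hypothetical helper
  // The registry argument here is the proposed extension to mlirTranslateMain.
  return mlir::failed(mlir::mlirTranslateMain(argc, argv,
                                              "CIRCT translation tool",
                                              registry))
             ? 1
             : 0;
}
```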
I see some important layering issues with path 2 right now:
One problem with this model is that translations now behave differently depending on external factors (the tool setup). Path 1 has the property that for a given translation registration (mlir-to-xml, for example), xxx-translate --mlir-to-xml always has the same behavior regardless of the tool. This is a property of *-opt tools in general: the tools control what they register, but they won’t change the meaning of a given registration.
The tools register multiple translations, but by adding a registry or other options at the top level of the tool instead of inside the registration, you’re actually affecting all of the translations, even unrelated ones. Path 1 keeps the translation registrations isolated from each other (which is a critical aspect for testing!), and your patch breaks this.
There is already a dialect registry provided by the translation registrations, but you added one at the top level of the tool. They are quite redundant: there is no good way for a tool author to know which one to use or how they interact with each other. This also increases the complexity of the system. All in all, a good indication that something is not right with the model right now.
Right: as commented on the patch, I believe that AllowUnregisteredDialect should be a single flag set by the translation registrations. It does not belong to the tool, for the same kinds of reasons I mentioned above.
Path 1 does give you that, but I’m not sure why that’s beneficial? We don’t even have that with *-opt tools, given that the behavior can change depending on the input. You can’t generally take IR from one *-opt tool and feed it to another and expect it to behave exactly the same way. Different variants of mlir-opt may have different dialects/extensions/etc. registered that don’t exactly carry over. The passes are still technically the same, but you don’t necessarily get the same behavior. I don’t see what is different about testing translations (we don’t need N pass registrations for every combination of dialects).
Does it? We aren’t calling the translation registry (it’s just for the translate tool) in any kind of production environment, and in those cases the dialects registered/loaded will be different from the tool setup anyways. How does this affect testability? Registering dialects like we do in mlir-opt hasn’t affected the testability of passes AFAIK.
Seems like it could be reasonable, but yes it is only tangential here.
I see it as fairly important that the behavior is well defined and contained in one place: the registration for the translation. This is a matter of orthogonality and composability; I see it on the same scale as the design principles we have always followed elsewhere, actually. I’m not totally sure what is controversial here, really.
The IR as input to a particular pass should always behave the same way. This got broken recently because of external registration, and we considered this a bug, which led to the proposal about promises.
As mentioned at the very beginning of this thread, I see a fundamental difference between *-opt and *-translate: when we do a side-by-side comparison, I believe the role of a translation registration is best compared to defining a new *-opt tool. This is what I elaborated in the first post of this thread by explaining how each translation registration defines its own “subtool”.
How doesn’t it? Anything that you put in the main() applies to all the registered translations: this seems to me to fairly obviously break the isolation that path 1 provides. I’m not sure what’s unclear to you here?
Again: you’re missing the fundamental point I made about translations being their own “subtools” (the DialectRegistry they populate being a clear sign of it). So the mlir-translate main() is not comparable to the *-opt dialect registry to me.
That doesn’t cover the use case of downstream users attaching extensions that can’t be promised (i.e. the interface is not accessible to the dialect to promise it in the first place).
I guess the big point I’m missing is why we need to view it that way. The duplication that Chris mentions upthread seems very unfortunate for what is supposed to be a developer tool, to make testing easier for developers, and not something for production. We are making it more difficult, and the benefit just isn’t clear to me (not that I’m stating there is no benefit, I’m just missing it if there is).
Feel free to propose a rebuttal to my original post: I proposed my own reading of how I believe things are modeled right now; I might be wrong.
I really believe that this isn’t just about a particular, fungible implementation choice; instead this is a fairly fundamental discussion about what the problem is and how to model it.
I guess I don’t see a significant difference either way: the amount of code to write seems trivial and tiny to me. Adding a custom registration for a developer use case does not seem like a hurdle to set up: it seems like a reasonable one-time cost.
Also, the soundness of the model takes precedence for me in general. These seem like the same arguments we faced in the past when we pushed for things that were much more intrusive and obnoxious. In particular, MLIR started with global constructors for registering everything, and I remember how important it was seen that developers in any project had the ability to set up their own *-opt tool with a simple build rule, without writing any code.
We made it much harder when we removed all those global pass registrations and forced people to write a main() of their own plus some boilerplate (much more than what’s involved in a translation registration, actually). We then simplified some of the boilerplate, in particular by adding support in TableGen to define passes, but that came later and there are still missing pieces (createXXXPass() functions are still written manually, and we still have this issue for better handling of options).
I see it 100% in the former category, in line with the above. It was meant as a very simple tool to do one thing in isolation. Hooking it up into other things / stretching it doesn’t lead to nice ergonomics IMHO (yeah, it’s useful as a starter tool for a week or two maybe :-)). We could improve some of those aspects (make it a bit more of a proper subtool, error reporting, CLI flag separation), but that goes away from the goal for me: making it simple to test translations simply.
Accepted input is still a property of the tool, as the tool dictates what is registered. When one creates a translate tool, one sets the dependencies and what is registered. It would be a failure if we did some random registrations by having libraries and targets that commingle things (e.g., the cases where this breaks down seem like bugs).
--mlir-to-xml would only differ in behavior in terms of accepting custom syntax or unregistered dialects. That doesn’t seem like a core property of the translation to me. The translation only requires an MLIR module (well, some top-level construct) in memory; which dialects may be loaded is not relevant to it, that is the responsibility of the tool. Either the file is loaded or not; if loaded, the translation happens, and it does so uniformly no matter which dialects are involved. So one would test this translation and have less unrelated noise in its registration (e.g., it shouldn’t care about dialects), and I’d still need to use the correct translate tool for a given input, versus additionally needing a different registration per tool, which doesn’t seem to add value with respect to testing/intended use (behavior being uniform for the same registration across different tools isn’t a value add here, and it makes a peripheral behavior uniform). And I’d be very sad if we had a tool with circt-to-xml, tf-to-xml, and tfg-to-xml options, with these different translations not composing while doing nothing dialect-specific during translation.
I actually see it as worse for orthogonality in the above case: it commingles the requirements for loading with the translation being tested.
Indeed, and it is still harder than it was. I think the biggest issue that was resolved there was that we enabled folks to create tools without a global registry, stray visible symbols, inhibited DCE’ing, etc., and it resulted in cleaning up build deps. So very useful. But that is separable from having a testing tool that uses a registry, though.
I agree with the formulation of these really being subtools, from the point of view that there is very little structure/similarity between registrations. But I differ in how isolated I see them: basically, in what constitutes the environment they operate in versus the core translation functionality being registered. And here in particular I see the loading-into-memory-from-file aspect as separate from the transformation and part of the environment dictated by the tool (heck, even the definition of a file you’d see in the original commit was assumed to be a function of the tool; the only requirement was that it could be unambiguously named).
Seems like a play on words to me: you’re not fundamentally changing anything about what I wrote. The input is a property of the translation and/or its registration; the tool only indirectly controls it by enabling a translation to be registered or not.
Not only: the dialect registry is loading interfaces as well.
Right: I don’t think “making it easy” should be the main metric to optimize; having the right model is more important in terms of software design to me. Let’s not take shortcuts for superficial convenience.
I fail to see how it is the responsibility of the tool to load the input file into memory for --spirv-to-mlir for example.
This alone is a strong enough indication that having a DialectRegistry on the tool instead of on the registration is absolutely not right.
Isn’t the main metric of this tool to let us have the minimal number of binaries and the least linkage overhead for a project to be able to test translations between arbitrary stream-based representations and the corresponding MLIR? The translation registrations are just a way to “busybox” multiple such things together into one binary and give them a name.
I don’t think there should be much design or “framework” beyond that. Also, while we don’t often do it, there may be cases where I want two otherwise-identical translations that configure the context differently so that I can test (i.e., with and without some dialects registered, or allowing/disallowing unregistered dialects). We do get bugs on those axes but rarely go to the trouble of testing at that level today.
I don’t think that’s true? The difference here is the .mlir parser; the translation itself is the same. Further, MlirOptMain already takes a DialectRegistry as an argument, so it has the same behavior if the specific tool prepopulates it with domain-specific dialects. This patch makes the mlir-translate family of tools comparable to mlir-opt.
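For reference, the *-opt pattern being compared to looks roughly like this (a sketch; the header locations and the exact MlirOptMain overload have shifted across MLIR revisions, and "my-opt" is a placeholder name):

```c++
#include "mlir/IR/DialectRegistry.h"
#include "mlir/InitAllDialects.h"
#include "mlir/Tools/mlir-opt/MlirOptMain.h"

// A typical *-opt entry point: the tool prepopulates a DialectRegistry and
// hands it to MlirOptMain, so it applies to every pass the tool can run.
int main(int argc, char **argv) {
  mlir::DialectRegistry registry;
  mlir::registerAllDialects(registry); // or a domain-specific subset
  return mlir::failed(mlir::MlirOptMain(argc, argv, "my-opt", registry)) ? 1
                                                                         : 0;
}
```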
Even if it were the case, I don’t see how the property you’re describing is actually useful. For example, if “mlir-opt -cse” and “circt-opt -cse” were the same, then “circt-opt -cse” would reject the pretty form of hw.module; that isn’t useful. I don’t see how translation tools are any different.
The only apparent difference between mlir-opt and mlir-translate is that we have a lot more optimization passes than translations in the community, and the muscle in mlir-translate hasn’t been built.
You are correct: it would be registering it for all translations in that tool, just like mlir-opt does for all passes. I agree with you that this is a change, but I don’t see what the problem is. If you want to test an MLIR translation without circt dialects, then use mlir-translate; no one is taking that tool away.
I agree this is adding complexity by adding a “new thing”. I am trying to justify that with use cases that are underserved. I don’t think that the alternate path is actually better.
I don’t understand why you would think that. Can you elaborate?
It sounds like you have some very specific negative personal experience here, which I haven’t experienced, so there may be something I’m missing.
That said, your opinion seems to be that we should have a very highly opinionated solution and try to “prevent” people from making mistakes. I don’t see how that approach could work in general, and I don’t see why it is valuable on balance against the major usability problem we have now and the code duplication you’re suggesting as a solution.
I think we can be open to different communities in MLIR having different ways to solve problems. It is perfectly valid for one community to have a policy of “not registering any dialects in the tool” because they want perfectly isolated translations and use that as a way to ensure that. It seems also very valid for communities to go the other way and say “hey I want to be able to use a common set of dialects because … I write all my tests using them”.
It is also perfectly valid to have both in one project. You could have two tools: a “circt-translate-minimal” and “circt-translate-useful-for-writing-tests-against”. You could even have a “circt-translate -extra-dialects” flag and only conditionally register the dialects.
We aren’t taking away anything in the design space here.
I strongly object to this analogy though, as I have explained quite a few times in this thread already. I believe conflating *-translate with *-opt ignores fundamental differences.
As Stella mentioned, mlir-translate is like “busybox” in how it dispatches to the translations. mlir-opt tools don’t have this behavior; instead, in terms of comparison, I see each “busybox” entry of mlir-translate as more comparable to a whole mlir-opt.
The problem is that it is not desirable for the mechanism that controls one translation to impact them all.
Because in my own circt-translate, but also in mlir-translate, if I find the need to “allow-unregistered-dialects” by default on a subset of the translations, I shouldn’t have to enable it for all the translations. However, that is what your API change forces the user into.
With your change, the only way to segregate the behavior is to link multiple different tools. This entirely breaks the “busybox” dispatch.
It depends what the alternate path is: I understand what you’re trying to achieve, but I see your change as quite disruptive with respect to the current model.
You justified it with a hypothetical “generic translation”; we don’t have one upstream today, so I’ll continue with this fictional use case. I believe we could achieve behavior similar to what you’re describing in different ways. So let’s look at a hypothetical mlir-to-xml translation. The way we currently implement a translation looks like this:
```c++
void registerMlirToXMLTranslation() {
  TranslateFromMLIRRegistration registration(
      "mlir-to-xml",
      [](ModuleOp module, raw_ostream &output) {
        auto XML = translateModuleToXML(module);
        if (!XML)
          return failure();
        XML->print(output);
        return success();
      },
      [](DialectRegistry &registry) {
        // What to put here?
        // Maybe `registry.allowUnregisteredDialect();` per Stella's suggestion?
        // Or `registerAllDialects();`? But then it forces everyone to link in
        // and enable all the upstream dialects...
      });
}
```
So after your patch, the solution for enabling this in mlir-translate upstream would be to change its main() to register all the dialects and also allow unregistered dialects. But unfortunately that would apply to all the translations we have upstream.
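Concretely, the upstream main() would have to become something like this sketch (assuming the registry-taking mlirTranslateMain from the patch; the allowUnregisteredDialects() call on DialectRegistry is hypothetical, as it is exactly the knob being debated):

```c++
#include "mlir/InitAllDialects.h"
#include "mlir/InitAllTranslations.h"

int main(int argc, char **argv) {
  mlir::DialectRegistry registry;
  mlir::registerAllDialects(registry); // now linked into every translation
  // Hypothetical knob; whatever its spelling, it would apply tool-wide:
  registry.allowUnregisteredDialects();
  mlir::registerAllTranslations();
  return mlir::failed(mlir::mlirTranslateMain(argc, argv,
                                              "MLIR translation tool",
                                              registry))
             ? 1
             : 0;
}
```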
Edit: I think this is a critical point to answer: do we think it is fine if, the day we add a “generic translation” upstream, mlir-translate just always calls registerAllDialects() and allows unregistered dialects for all upstream translations? How is this not making the DialectRegistry in the individual translations totally obsolete?
So I’d argue that a modeling that better respects the current layering would be to make the registration itself injectable:
```c++
// Register the `mlir-to-xml` translation **with a specific registry!**
void registerMlirToXMLTranslation(DialectRegistry registry) {
  TranslateFromMLIRRegistration registration(
      "mlir-to-xml",
      [](ModuleOp module, raw_ostream &output) {
        auto XML = translateModuleToXML(module);
        if (!XML)
          return failure();
        XML->print(output);
        return success();
      },
      [registry](DialectRegistry &reg) { registry.appendTo(reg); });
}
```
I believe this still provides the feature we want for the end user, but it preserves the isolation between translations and allows controlling them independently, so that, for example, making mlir-translate --mlir-to-xml accept unregistered dialects (or some specific dialects) does not “contaminate” the other translations (mlir-translate --llvmir-to-mlir, for example).
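For instance, a downstream tool could then do something like this (a sketch; FooDialect, the tool name, and the second registration helper are illustrative):

```c++
// baz-translate.cpp: the generic translation gets exactly the dialects this
// tool hands to it, without affecting any other registered translation.
int main(int argc, char **argv) {
  mlir::DialectRegistry xmlRegistry;
  xmlRegistry.insert<foo::FooDialect>();     // hypothetical dialect
  registerMlirToXMLTranslation(xmlRegistry); // the injectable form above
  registerMlirFooToBarTranslation();         // hypothetical; own registry
  return mlir::failed(
             mlir::mlirTranslateMain(argc, argv, "baz translation tool"))
             ? 1
             : 0;
}
```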
Well, in the shoes of a user: if I implement --mlirfoo-to-bar in my translate tool, do I add my FooDialect in my tool’s main() or in the translation registration? Both achieve the same result: what’s the tradeoff between the two?
But also, beyond this, let’s say I have my own baz-translate tool and I want to experiment with the generic --mlir-to-xml. If I use your new API to inject my FooDialect into the registry from main() (which is what you intended it for, I believe), this leaks the FooDialect into all the translations.
Then later, someone on the project/team implements a new translation: --mlirfoo-to-bar. They don’t add anything to the DialectRegistry in the translation registration API. They don’t have to think about it and won’t notice: it’ll work out of the box because of the “global context of the tool” you created here.
What if I then want to clean up our baz-translate because we no longer use --mlir-to-xml? I would remove my code and try to remove this “global registration of FooDialect” from the baz-translate main(), because it was really added there to support the “generic translation”. However, because it leaked into this “global context”, we now implicitly depend on it from the --mlirfoo-to-bar translation, and removing it will break things.
So: this breaks isolation between what should be “busybox” dispatches, creates a “global context” (uh…), and provides opportunities for added coupling between things that really shouldn’t be coupled.
To me, in terms of design, proceeding forward with such properties requires caution and comes with a fairly high bar with respect to alternative implementations as well as the goal being pursued.
Just a thought: might the two of you (Chris and Mehdi) benefit from a private conversation about this? I’m a little surprised at the depth of passion being applied to this specific topic, and knowing you both pretty well, I bet a f2f exchange would be more productive. We’ve got a lot bigger things to align on, and I’m worried that you’re both wasting your amazing brains on a topic of relatively minor importance.
Mehdi and I had a chance to catch up; he’s totally right about the parameterized registration approach, and I’ll give that a try. I’ll also look into adding the “allowUnregisteredDialect” bit to DialectRegistry and see what that ends up looking like.
Getting back to the original topic: maybe someone can update the docs with a crisp statement about what the *-translate tools are and aren’t, so we can avoid having this discussion again.
I had been meaning to do that for a while, so we could improve some flag ergonomics and (ultimately) exclude some things that are more in the developer-tools category.
I have some experience with mlir-translate in two different contexts:
1. The conventional approach: loading a file and translating its contents to MLIR. This was as straightforward as it sounds.
2. Bridging an existing graph compiler’s (not LLVM/MLIR-based) internal graph data structure to MLIR. This was an important use case where we wanted to take an existing project and transition it to MLIR; the graph was the chosen point to port (from where we were able to quickly implement legalizations to TOSA as well). The tricky part was passing the graph through llvm::SourceMgr at the TranslateToMLIRRegistration: we didn’t want to parse the input a second time, but instead translate the graph → MLIR, which was a simpler exercise that I wrote too. After initially having our own binary for this, the better option was to use the SourceMgr path as the file reader, invoke the graph builder there, and then map to MLIR.
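If it helps others, the shape of that bridge was roughly the following (a sketch: the translation name and the graph-side helpers are hypothetical stand-ins, and the exact return type of the registration callback has changed across MLIR versions):

```c++
// Register a translation whose "parse" step deserializes an external graph
// from the already-loaded buffer instead of parsing it as text again.
TranslateToMLIRRegistration importGraph(
    "import-mygraph", // hypothetical name
    [](llvm::SourceMgr &sourceMgr, mlir::MLIRContext *context)
        -> mlir::OwningOpRef<mlir::ModuleOp> {
      // Reuse the buffer the tool already loaded for us.
      const llvm::MemoryBuffer *input =
          sourceMgr.getMemoryBuffer(sourceMgr.getMainFileID());
      // Hypothetical helpers: the existing graph reader, and the
      // graph -> MLIR builder that emits the module.
      MyGraph graph = deserializeGraph(input->getBuffer());
      return buildModuleFromGraph(graph, context);
    });
```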
This also reminds me of a related issue with serialization registration: the existing infrastructure is fine for a self-contained serialized representation, but an external format (e.g., flatbuffers) requires a conditional build rule if it’s part of a core dialect, to avoid the project-wide external dependency.
Just to confirm: Mehdi’s parameterized registration approach works great in practice, directly solving the problem I had. Thank you again, Mehdi! I’ll take a look at putting the “allow unregistered dialects” bit into DialectRegistry and prepare a phab to review.
IMO, it is primarily a compiler writer’s tool, and even then only really useful for writing tests (for which it is really, really useful!). Practical user tools will have a bunch of other considerations and don’t necessarily want to use the translation registry at all.