Emitting structured JSON for MLIR Dialect using TableGen

It seems there are at least two flavors of TableGen in the LLVM project: mlir-tblgen and llvm-tblgen. One difference I’ve noticed is that the former, which is used to generate the various dialect *.inc files, doesn’t seem to have an option to dump structured JSON. Why are the two separate, and what would be the challenge of adding structured JSON output to mlir-tblgen?

As a follow-up, how do folks feel about emitting this JSON when installing a target (maybe to a path like etc)? This would allow language bindings to reference those files verbatim and write code generators in the host language instead of having to create a TableGen backend. Optionally, we could hide this behind an MLIR_DIALECTS_EMIT_STRUCTURED_JSON CMake flag.

TableGen is a data format with different backends. The backends are fairly separate, and that’s why we need multiple tools here (we won’t include MLIR-specific backends in LLVM’s tool). To enable the JSON backend in mlir-tblgen as well, one could add the JSON backend (llvm/lib/TableGen/JSONBackend.cpp), but it would not give anything more than the llvm-tblgen one, so one could also just use llvm-tblgen directly.

I don’t understand the language binding and code generator parts. The TableGen output would be just the data; the MLIR backends add some structure that wouldn’t be captured in the general output.

So if I am understanding correctly, foo-tblgen is just a collection of backends related to foo.

Can someone with experience on the MLIR side of things comment on whether there is anything that wouldn’t be captured in llvm-tblgen --dump-json? Also, is there a notable downside to basing language bindings off of the dumped JSON?

I think that --dump-json should capture all the information if done right. Basically, you just want a JSON serialization of RecordKeeper.

See this comment for more info about TableGen backends. tl;dr, it’s just a function that takes a RecordKeeper and emits some string based on it. So if you serialize the RecordKeeper right, then you have everything you need.
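To make the "a backend is just a function over the records" point concrete, here is a minimal sketch. ToyRecord, ToyRecordKeeper, and emitOpList are hypothetical stand-ins invented for illustration, not the real classes from llvm/TableGen/Record.h, and the "Op" class and "opName" field names are assumptions of this toy model.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-ins for TableGen's Record and
// RecordKeeper. They only model "a named record with superclasses and fields".
struct ToyRecord {
  std::string name;
  std::vector<std::string> superclasses;
  std::map<std::string, std::string> fields; // field name -> value as text
};

struct ToyRecordKeeper {
  std::vector<ToyRecord> defs;
};

// A "backend" in this model: read the records, return some text.
// This one lists every def that derives from the (assumed) class "Op".
std::string emitOpList(const ToyRecordKeeper &records) {
  std::ostringstream os;
  for (const ToyRecord &rec : records.defs) {
    bool isOp = false;
    for (const std::string &cls : rec.superclasses)
      isOp = isOp || cls == "Op";
    if (!isOp)
      continue;
    auto it = rec.fields.find("opName");
    assert(it != rec.fields.end() && "op record without an opName field");
    os << rec.name << " -> " << it->second << "\n";
  }
  return os.str();
}
```

A JSON dump plays the role of ToyRecordKeeper here: anything a backend like emitOpList reads from the records could in principle be read from the serialized form instead.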

This is similar to saying that we could serialize the MLIR generic IR to JSON, though: while it is a valid serialization format, a bunch of the semantics associated with the record are missing. In TableGen you basically get the “parsed form”, but it is actually coupled to the ODS backend from a semantics point of view.
For example, an operation registered with SingleBlockImplicitTerminator will automatically have a block and a terminator added on parsing/creation. Many verifiers are associated with the traits/interfaces, etc.
All of this is hard-coded in the ODS C++ backend for TableGen.

Yeah. I’m not sure what you would use the “raw” JSON dump for, but it’s possible in theory.

I suspect that something more useful is something like the interfaces that @ftynse is working on for the Python bindings.

The JSON would have the same coverage in terms of the data, but it would just be raw info. E.g., if an op has the SameOperandsAndResultType trait, the trait would appear in the dumped JSON, but that would not tell you that the generated builders differ from those you’d get without the trait (this one is simple to duplicate; duplicating the basic equality constraint one in there would need to be a plain copy). I think you’d end up needing to duplicate a lot of the same logic and assume more stability than we have at the moment. So you’d end up with JSON tooling roughly duplicating an ODS backend, but starting from JSON rather than TableGen as the input format.

It very much depends on what you want to do with it. If you don’t need the information about which builders are created, constraints, etc., and you only want operands, results, and regions, then the “flattened” JSON form would be fine. If you want to take advantage of any semantics added by the backend, it is not as easy (hooking into the C++ “wrapper” classes may help more).

Inferring from the other posts by @GeorgeL, I would guess that the goal is to generate dialect-specific Swift bindings and to write the generator in Swift rather than C++. Virtually any language can consume JSON, but the TableGen format is C++-only.

Like others mentioned, we abstract away the raw table data with semantically-charged classes. For example, the generator is supposed to call op.getArgument(i).isOptional() instead of cast<DefInit>(opRecord->getValueAsDag("arguments")->getArg(i))->getDef()->isSubclassOf("Optional");. There are exceptions to this, notably custom builders, because we haven’t had time to clean that up. If we were to write ODS backends in a language other than C++, I would consider providing a C API to the ODS backend first to avoid duplicating the semantics.
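The wrapper idea can be sketched without any LLVM headers. RawArgument, ToyArgument, and ToyOperator below are hypothetical names made up for this illustration; they are not the actual ODS classes, and the "Optional" superclass encoding is simply taken from the example in the post.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical raw form: each argument is only a def name plus the list of
// classes that def derives from, roughly what a raw dump would expose.
struct RawArgument {
  std::string defName;
  std::vector<std::string> superclasses;
};

// Semantic wrapper: callers ask a question and never see the encoding.
class ToyArgument {
public:
  explicit ToyArgument(const RawArgument &raw) : raw(raw) {}

  bool isOptional() const {
    // The one place that knows "optional" is encoded as a superclass.
    const std::vector<std::string> &s = raw.superclasses;
    return std::find(s.begin(), s.end(), "Optional") != s.end();
  }

private:
  const RawArgument &raw; // valid only while the owning ToyOperator is alive
};

class ToyOperator {
public:
  explicit ToyOperator(std::vector<RawArgument> args)
      : args(std::move(args)) {}

  ToyArgument getArgument(std::size_t i) const {
    assert(i < args.size() && "argument index out of range");
    return ToyArgument(args[i]);
  }

private:
  std::vector<RawArgument> args;
};
```

If the encoding of "optional" ever changed, only ToyArgument::isOptional would need to change in this sketch, whereas every consumer reading the raw superclass list would have to be updated separately.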

Duplicating semantics sounds like a recipe for pain (the optional example is a great one, @ftynse), though I’m not sure a C API to TableGen would be particularly valuable. As an alternative, it may be interesting to produce MLIR-dialect-specific JSON, but I think I need to develop my use case a little further before I get there.

I suppose one can encode “semantic” information in JSON, e.g. have “optional” as a field. However, this will soon become a parallel representation that will have to be maintained and updated every time we modify or augment the semantics (in addition to the API modification).
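As a rough picture of what such a "semantic" JSON might look like, the sketch below emits an explicit "optional" field per argument. The ArgInfo struct, the emitOpJson function, and the JSON shape itself are all invented for illustration; nothing like this exists in mlir-tblgen as far as this thread establishes.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, already-resolved view of one operation argument.
struct ArgInfo {
  std::string name;
  bool optional;
};

// Emits a made-up, MLIR-specific JSON shape in which "optional" is an
// explicit field. Names are assumed to be plain identifiers, so no string
// escaping is performed.
std::string emitOpJson(const std::string &opName,
                       const std::vector<ArgInfo> &args) {
  assert(!opName.empty() && "operation name must not be empty");
  std::ostringstream os;
  os << "{\"op\":\"" << opName << "\",\"arguments\":[";
  for (std::size_t i = 0; i < args.size(); ++i) {
    if (i != 0)
      os << ",";
    os << "{\"name\":\"" << args[i].name << "\",\"optional\":"
       << (args[i].optional ? "true" : "false") << "}";
  }
  os << "]}";
  return os.str();
}
```

Every field added this way becomes part of a schema that bindings depend on, which is the maintenance cost being described: a change to the ODS semantics would need a matching change to the emitter and to each consumer.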

Yes, it would need to be maintained, but the alternative is that N language binding generators written in C++ need to be maintained. I’m also a big fan of having the “last mile” of bindings written in the host language, so that if those bindings need to be updated (for instance, in response to some new language feature) they can be updated by someone from that ecosystem without the overhead of them having to jump into TableGen code.

And by this you mean it currently only has a C++ parser and backends written in C++?

I feel like in a lot of the places where ODS is used, it is pretty tied to C++, so I don’t think abstracting over that would be trivial. But this might be a good trial. As your use case develops, it would be great to evaluate.