[RFC] Canonical file paths to dialects

The dialect naming and file layout situation in MLIR looks quite messy, making it non-intuitive to correlate between dialect names, eventual namespaces and repository paths. We had a brief discussion before MLIR was integrated into LLVM - https://groups.google.com/a/tensorflow.org/d/msg/mlir/JzR6EAVOtYI/I5Y5O93lAgAJ - but we did not follow through to avoid messing up the integration. And we now also have more stakeholders, so reiterating the proposal here.

  • The “X” dialect lives, with respect to the monorepo, in mlir/include/mlir/Dialect/X for public headers, mlir/lib/Dialect/X for sources, mlir/test/Dialect/X for tests.
  • The “X” dialect is defined in XDialect class in C++, optionally in namespace x. This means, for example that the LLVM dialect should either be moved under mlir/lib/Dialect/LLVM and equivalents, or its C++ class should be renamed to LLVMIRDialect.
  • Dialects are not suffixed with “Ops”, e.g. “AffineOps” should be renamed to “Affine”. The suffix convention dates back to the times where (a) dialects were placed in the same level as other libraries (mlir/lib/X) and the suffix was useful to differentiate them; (b) MLIR had no dialect-specific types or attributes, so a dialect was equivalent to a set of ops.
  • Individual files inside each dialect may still be suffixed with “Ops” provided they are only related to the Ops of the dialect; e.g. SPIRVOps.cpp only contains op definitions while types are located in SPIRVTypes.cpp.
  • Conversions from “X” to “Y” live in mlir/include/mlir/Conversion/XToY, mlir/lib/Convresion/XToY and mlir/test/Conversion/XToY, respectively.
  • Default file names for conversion should omit “Convert” from their name, e.g. lib/VectorToLLVM/VectorToLLVM.cpp.
  • Conversion passes should live separately from conversions themselves for convenience of users that only care about a pass and not about its implementation with patterns or other infrastructure. For example include/mlir/VectorToLLVM/VectorToLLVMPass.h.
  • Common conversion functionality from or to dialect “X” that does not belong to the dialect definition can be located in mlir/lib/Conversion/XCommon, for example mlir/lib/Conversion/GPUCommon.
1 Like

All sounds great to me. Thanks Alex!

I think this is great, but maybe it could wait until after the other refactoring of the CMakeFiles? (assuming there are no objections to https://reviews.llvm.org/D74864)

Sure, there’s absolutely no rush here.

+1 for consistency! Thanks Alex!

I also see that we have LLVM IR code scattered in the Target/ directory. I’m wondering whether I should put SPIR-V (de)serialization logic there too. But I forgot the criteria we’ve used to decide the split and what that “Target” means. Can you clarify?

The criterion I used for Target/ is: mlir-translate can depend on libraries in Target/, mlir-opt cannot. Ideally, only Target/LLVMIR should depend on LLVMIR and LLVMParser libraries, but this is currently blocked by LLVM types (I’m working on it in the background).

This would be great to do! Some of the inconsistent nomenclature that we currently have across dialects is making it harder to navigate / quickly find things. Another thing to add to this list is to move some of the transforms and analyses from lib/Transforms/, lib/Analyses/ into the respective dialect sub-trees.

+1. We’re doing that in IREE and it’s working out really well.

Example: https://github.com/google/iree/tree/master/iree/compiler/Dialect/Flow

1 Like

I’d add that conversion libraries should be consistently named. Probably MLIRXToY. In some cases (conversions to LLVM) the name does not match the dialect name (LLVMIR) and in other cases they are named “MLIRXtoYTransforms”

Indeed.

One question I have here is the structure replication. Should we always create lib/Dialect/X/IR/ even if the dialect does not have (and/or does not intend to have) transformations and analysis?

Good point!

I started capturing this here: https://reviews.llvm.org/D75762

I’d also like to suggest that Dialects have a canonical header file mlir/include/mlir/Dialect/Foo/Foo.h, which includes other headers necessary to work with the dialect. In particular, I don’t think we should require people to include the ops header, the EDSC headers, analysis headers, and transform headers individually. It should be enough to simply depend “on the dialect”.

The CMake refactoring has all landed at this point. Feel free to move forward.

But won’t that just increase build times everywhere due to unnecessary header includes? For eg., analysis stuff that depends on the dialect’s ops won’t depend on the latter’s transforms; ops of a dialect that depend on ‘ops of another dialect’ don’t depend on the latter’s analysis or transforms.

The thing is that “work with the dialect” is a bit fuzzy: you may not need passes/analysis/… just to access the ops. If you look at LLVM, it seems that what you propose would be the equivalent of a header pulling it all of include/llvm/IR, include/llvm/Analysis, include/llvm/Transforms/, …
Pushing the concept, isn’t it like having a mlir.h header which includes everything and can be included everywhere?

I suspect that at the scale we’re talking about any build time increase would be nominal, although I haven’t done experiments to back this up. Depending on any dialect is going to result in a dependency on the core IR, which is probably at least as significant as bringing in resulting Analysis or transformations. I don’t see individual compiles being a bottleneck, as much as other dependencies which result in a significantly sequential build, but people without lots of cores might feel differently.

I think the difference is that many dialects can be small. Dealing with ‘dialect dependencies’ is a manageable complexity, IMO, whereas requiring users to include multiple headers files for each dialect which might all be different is much less manageable. As far as an “mlir.h” header, I don’t think it’s such an outlandish idea. I suspect that many external users of MLIR will reuse lots of the standard dialects and might add one or two. including MLIRAllDialects.h, and linking against libMLIR.so would be a reasonable path forward, I think. My view is that the standard dialects are more of a well-crafted holistic experience, where the dialects naturally complement eachother and are often used together, rather than a hodgepodge where users pick and choose one or two dialects and discard the rest.

I’m sorry I simply can’t see how one can justify adding unnecessary includes and dependences from ops to analysis/, from ops to transforms/, from analysis to transforms, both across dialects and within a dialect. The analysis/ and transforms/ libraries and theirs headers could be quite substantial for several dialects (for eg. affine/), and would keep growing for other dialects over time. What you propose would just be bad for build/link time cycles, compiler dev productivity, turnaround time, CPU, main memory, and power/energy usage — and given how widely a compiler infra like this is compiled/linked all over the world, a change like this is just bad for a green planet! :slight_smile: The cost of including the right headers depending on what you are using from analysis/transforms if at all is relatively nothing in comparison IMO.

I’m looking at some CMake library rule names some time later (in the context of reconciling them with the existing names in Google’s Bazel build rules), and still seeing some trailing “Ops” suffixes. Is the goal still to rename these? In particular, we’ve still got CMake libraries MLIRAffineOps, MLIRLinalgOps, and MLIRStandardOps. Additionally the directory for Standard is still called StandardOps.

I think the goal is to continue renaming. Patches welcome :slight_smile: