[Aborted RFC] Allowing control of what backend/architecture-handling code is built when building MLIR

I’ve started digging around the static/fat library builds of ROCmSoftwarePlatform/rocMLIR (on GitHub), trying to slim them down (a debug build ran into a 4 GB library size limit on Windows, and while there’s a workaround, I’m still trying to solve the underlying problem).

I’ve managed to prevent a lot of dialect libraries from being built by default already, but I noticed that some parts of the core code, such as ConvertVectorToLLVM or the LLVM translator, will always include certain hardware- or backend-specific libraries and that there’s no way to disable that. For example, even though rocMLIR will never be generating ARM code or using Intel’s AMX, the code to handle those operations is always built and then only disabled at runtime.

LLVM, by contrast, has LLVM_TARGETS_TO_BUILD, which lets me list exactly the architectures I want to target and thus avoid building large portions of code I know I’ll never need.

I propose that, when it comes to backend-specific dialects or code, we add an MLIR_ENABLE_BACKENDS variable in CMake. As in LLVM, it should default to all (that is, every available backend), but it should be overridable in order to reduce bloat in libMLIR (or anything that links in parts of it).

The upside to this is that certain parts of the MLIR build will become more configurable - that is, I will be able to, for example, only include the LLVM, GPU, and ROCDL dialect translations I want when I pull in “LLVM Translation”.

The downside is that this’ll clutter up certain parts of the build with if ()s and #ifdefs.

My initial thoughts for the values in MLIR_ENABLE_BACKENDS are

  • AMDGPU/ROCDL
  • NVGPU/NVVM
  • X86 (which’ll include AMX as needed, though someone might want to split that out later)
  • OpenACC
  • OpenMP
  • Arm (standing for all the various Arm* dialects’ usages)

The output targets themselves (LLVMIR/SPIRV/…) don’t need this treatment, since they’re already independent components you can choose to include or exclude from your build.

There are two different things to untangle here, I believe:

  1. what libraries are configured / built
  2. what ends up in a given pass or tool.

MLIR is supposed to be decoupled enough that the dependencies you mention are all optional. For example, ConvertVectorToLLVM does not have dependencies on ARM or AMX, even though the LowerVectorToLLVMPass does. The intent is that a downstream compiler built with MLIR targeting only one platform would not use this pass directly, but would use the right populateXXXXConversionPatterns to achieve what’s needed with a minimal set of dependencies.
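For concreteness, a downstream pass built that way could look roughly like the sketch below (the pass name and exact structure are illustrative, not upstream code). It only ever references the core vector-to-LLVM patterns, so the ArmNeon/ArmSVE/AMX/X86Vector pattern libraries never need to be linked in:

```cpp
// Illustrative downstream pass: lowers vector ops to LLVM using only the
// generic pattern set, with no target-specific pattern libraries referenced.
#include "mlir/Conversion/LLVMCommon/ConversionTarget.h"
#include "mlir/Conversion/LLVMCommon/TypeConverter.h"
#include "mlir/Conversion/VectorToLLVM/ConvertVectorToLLVM.h"
#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
#include "mlir/IR/BuiltinOps.h"
#include "mlir/Pass/Pass.h"
#include "mlir/Transforms/DialectConversion.h"

namespace {
struct MinimalVectorToLLVMPass
    : public mlir::PassWrapper<MinimalVectorToLLVMPass,
                               mlir::OperationPass<mlir::ModuleOp>> {
  MLIR_DEFINE_EXPLICIT_INTERNAL_INLINE_TYPE_ID(MinimalVectorToLLVMPass)

  void getDependentDialects(mlir::DialectRegistry &registry) const override {
    registry.insert<mlir::LLVM::LLVMDialect>();
  }

  void runOnOperation() override {
    mlir::MLIRContext &ctx = getContext();
    mlir::LLVMTypeConverter typeConverter(&ctx);
    mlir::RewritePatternSet patterns(&ctx);
    // Only the core vector-to-LLVM patterns; no ARM/AMX/X86Vector sets.
    mlir::populateVectorToLLVMConversionPatterns(typeConverter, patterns);
    mlir::LLVMConversionTarget target(ctx);
    if (mlir::failed(mlir::applyPartialConversion(getOperation(), target,
                                                  std::move(patterns))))
      signalPassFailure();
  }
};
} // namespace
```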

Similarly for LLVM translation, the ARM Neon translation is supposed to be available only if registerArmNeonDialectTranslation is called, otherwise it won’t bloat the binary.
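So a downstream tool that only emits, say, LLVM and ROCDL would register exactly those translations and nothing else. A minimal sketch (the helper name is hypothetical):

```cpp
// Only the translations this compiler actually uses get registered; the
// ArmNeon/ArmSVE/AMX translation libraries are never referenced, so they
// don't need to be linked into the tool at all.
#include "mlir/IR/DialectRegistry.h"
#include "mlir/Target/LLVMIR/Dialect/LLVMIR/LLVMToLLVMIRTranslation.h"
#include "mlir/Target/LLVMIR/Dialect/ROCDL/ROCDLToLLVMIRTranslation.h"

// Hypothetical downstream helper, not upstream API.
void registerMyDialectTranslations(mlir::DialectRegistry &registry) {
  mlir::registerLLVMDialectTranslation(registry);
  mlir::registerROCDLDialectTranslation(registry);
}
```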

Contrary to LLVM, all the registration in MLIR is explicit, and this is why we don’t need configuration flags in the same way.
We could add options to make MLIR upstream configurable, but right now I think it needs some more justification: is this useful for testing MLIR itself?

My instinct here is that it feels error-prone and wasteful to duplicate the big function that sets up the vector-to-LLVM pass, especially when we’re already directly using upstream’s GPUToROCDL (which sets up a long *-to-llvm pipeline, so we’d have to copy that out too), and we’d then miss out on changes to an upstream component that we, fundamentally, want to use and contribute to.

We already, more or less, have the flexibility to include just the parts of the upstream setup that we need, by way of a custom registration function, only depending on passes we’ll actually use, and so on … except that there are some cases, like the VectorToLLVM pass or the translation infrastructure, which are both generally useful and will implicitly pull every target they’re trying to support.
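As an illustration, that kind of custom registration function looks, in spirit, something like this (the function name and exact dialect list are placeholders, not our actual code):

```cpp
// Hypothetical downstream setup: register only the dialects this compiler
// uses, rather than calling registerAllDialects()/registerAllPasses().
#include "mlir/Dialect/GPU/IR/GPUDialect.h"
#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
#include "mlir/Dialect/LLVMIR/ROCDLDialect.h"
#include "mlir/Dialect/Vector/IR/VectorOps.h"
#include "mlir/IR/DialectRegistry.h"

void registerRockDialects(mlir::DialectRegistry &registry) {
  registry.insert<mlir::gpu::GPUDialect, mlir::vector::VectorDialect,
                  mlir::LLVM::LLVMDialect, mlir::ROCDL::ROCDLDialect>();
  // Conversion passes (e.g. createLowerGpuOpsToROCDLOpsPass()) are added
  // directly to our pass pipelines, so nothing else gets pulled in.
}
```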

So what I’m trying to accomplish here is to thread that ability to take just the parts of upstream that I’m actually using into the few places it hasn’t touched yet. Or, equivalently, I want the ability to pass AMX=false at build time, not runtime, the way I can just not build the SPIR-V dialect.

And “supposed to” … well, if it builds, and I then roll up all the libraries into librockCompiler.a, then it’s there, potentially bloating the code, no?

Unless linkers tend to eliminate dead code from static libraries in a way I’ve somehow missed, which is a distinct possibility.

Sure, but a possible solution here is to refactor this to allow injection of extra patterns, and make that a library entry point. The pass is only there to test the library entry points…
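Roughly, something along these lines (a hypothetical shape, not existing upstream API):

```cpp
// Hypothetical entry point: the caller decides which target-specific pattern
// sets (ArmNeon, AMX, ...) get injected on top of the core lowering.
#include <functional>

#include "llvm/ADT/ArrayRef.h"
#include "mlir/Conversion/LLVMCommon/TypeConverter.h"
#include "mlir/Conversion/VectorToLLVM/ConvertVectorToLLVM.h"
#include "mlir/IR/PatternMatch.h"

using PatternInjector =
    std::function<void(mlir::LLVMTypeConverter &, mlir::RewritePatternSet &)>;

void buildVectorToLLVMPatterns(mlir::LLVMTypeConverter &typeConverter,
                               mlir::RewritePatternSet &patterns,
                               llvm::ArrayRef<PatternInjector> extras = {}) {
  mlir::populateVectorToLLVMConversionPatterns(typeConverter, patterns);
  for (const PatternInjector &inject : extras)
    inject(typeConverter, patterns); // e.g. Arm Neon or AMX patterns
}
```

With that shape, the upstream pass just calls the entry point with every injector, while a downstream compiler passes only the ones it needs.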

Can you clarify what part of the translation infrastructure implicitly pulls in extra targets?

I thought I’d seen something, but, now that I’ve taken another look, it seems I’d misread the CMake and also found a bug in our build.

Thanks for the poke on that!

And, having stared more at the rest of the build, it looks like the GPU translation pass is using the populateVectorToLLVMConversionPatterns() entry point.

However, the conversion library MLIRVectorToLLVM unconditionally bundles the pass (and its target-specific dependencies) into the library; splitting that out is a much more straightforward refactor (or at least one that doesn’t require an RFC).

Closing as “there’s not actually an issue here, I just misread some things”

Re the vector-to-LLVM decoupling, I now have D158287 ([mlir] Split up VectorToLLVM pass) to do that.