Installing a subset of components

We have a base image that I’d like to install a subset of MLIR into (we don’t need most dialects, and certainly not most of LLVM). It looks like ninja install installs a huge pile of stuff. Is there a knob to build/install only a subset?

The best I’ve been able to find is

cmake -G Ninja ../llvm \
    -DLLVM_ENABLE_PROJECTS=mlir -DLLVM_TARGETS_TO_BUILD=""

which at least strips down the targets a bit.

I guess the alternative is to manually build a subset and manually stitch together the generated headers, the source headers, and so on? Does anybody have experience with this, or ideas on how to proceed?

Can you say more about the subset?

It’s long past time that we stopped drawing these divisions only as install components; the project needs to be physically split. Core infra and the LLVM lowering need some separation just to make the (future) dependencies a DAG. Support needs to be its own thing so that it doesn’t pull LLVM along with it.

At a minimum, the dialects directory needs to be taken out of MLIR. In reality, there are multiple discrete projects in there that should be claimed and become their own thing.

That latter part will be a lot of work. We should come up with a plan that doesn’t require fully sorting it out before doing the former.

@clattner and I have been discussing this off thread, as it seems like multiple integrators are hitting the bounds of the current setup and the project organization needs a rethink (not just a re-sort).

To answer your question, I am not aware of install time separation that makes the situation better. Sophisticated users are doing a variety of things to manage the sprawl.

This would be really great for composing downstream dialects, if we can make them “linkable” as a build option: upstream-to-downstream as well as downstream-to-downstream (e.g. third-party submodules in downstream projects that use LLVM).

Yup.

Technically, we can already build more than we need, take the smaller libraries, and link them together after the build in a downstream project, but this is fragile, since the separation isn’t deliberate and could change over time.

A big worry is the explosion of CMake flags and how hard it is to test them in all combinations, not only when building LLVM but also when using LLVM’s CMake files in other projects.

I don’t know CMake well enough, but if we make dialects, libraries and tools more fine-grained, could we build only the libraries that are actually linked into executables (via llvm_target-like CMake functions)?

Would this work if the LLVM build skips an (unused) library X, but project K later requires that library? Would CMake be able to “continue LLVM’s build” on demand?
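One way to read the suggestion above is as a reachability query: start from the executables you ship and keep only the libraries they transitively link. Here is a minimal POSIX sh sketch of that traversal. The target names and edges in the deps_of function are hypothetical stand-ins (loosely named after MLIR libraries), not MLIR’s real dependency graph; a real tool would take the edges from the build system.

```shell
#!/bin/sh
# Sketch: which libraries does a set of executables actually need?
# The dependency table below is HYPOTHETICAL and for illustration only.

# Print the direct dependencies of one target, space-separated.
deps_of() {
  case "$1" in
    my-opt)      echo "MLIRParser MLIRIR" ;;
    MLIRParser)  echo "MLIRIR" ;;
    MLIRIR)      echo "MLIRSupport" ;;
    MLIRSupport) echo "LLVMSupport" ;;
    *)           echo "" ;;
  esac
}

# Print every target reachable from the given roots (roots included),
# one per line, sorted. Worklist traversal with a "seen" set.
closure() {
  seen=""
  todo="$*"
  while :; do
    set -- $todo                 # split the worklist into words
    [ "$#" -eq 0 ] && break      # worklist exhausted
    cur=$1
    shift
    todo="$*"
    case " $seen " in
      *" $cur "*) continue ;;    # already visited
    esac
    seen="$seen $cur"
    todo="$todo $(deps_of "$cur")"
  done
  for name in $seen; do
    echo "$name"
  done | LC_ALL=C sort
}
```

With this table, closure my-opt prints LLVMSupport, MLIRIR, MLIRParser, MLIRSupport and my-opt, one per line. Within a single build tree, Ninja already does this traversal for you: building a named target (for example ninja MLIRIR) builds only that target and its dependencies. The harder part is making such a subset installable and consumable from outside the tree.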

What would be ideal for me is just the builtin dialect and core IR. At some point some connections with more frontendey stuff may be useful, but it’s not an area of immediate focus.

To give a bit more flavor, what I’d like to do is to build core MLIR and then bake it into our base image so that developers don’t need to keep re-building it. And then from our code repo we take it as a pure external dependency.

Any tips for managing it? I think we do a pretty good job of keeping builtin and IR layered out, so in principle it shouldn’t be hard to build just the right subset of targets and manually stitch things together, but if there are more well-trodden paths here, I would love to follow them.

This only works if the LLVM dependency doesn’t change, and if that’s true, you can always reuse the same build (install tree/tarball) anyway. This can be done today; it’s what I did in all my projects for years.

Indeed, this is what I’d be more interested in too. Perhaps, like LLVM targets, we default to building all dialects, and have a CMake option that allows you to only build those that you need.

A second (optional) step would be to let you build downstream dialects and link them against the upstream libraries. This would make discussions like the ones we had with TCP mostly redundant, since such dialects could live anywhere and it would be the responsibility of each dialect’s sub-community to keep it up to date.

Other than “have low expectations”, no. Every few months I take the weed-whacker to various projects in an attempt to make them not build the world, but it always regresses to building everything. In my experience, we are not doing a good job at this; it’s not a problem of principle, just a practical one. It gets even weirder when considering ancillary things like the dynamic libraries, etc.

Nothing but reviewer attention keeps an increasingly complicated dependency graph from spanning things it shouldn’t. I think there need to be hard project boundaries that reinforce the logical boundaries.


We do keep this building in our CI, so I can tell you that flow does work: see iree/build_tools/llvm/byo_llvm.sh (main branch of openxla/iree on GitHub). Some customers asked for it, so we supported it. Some folks, I am told, use this to pair an LLVM revision with a different MLIR revision. There are long stretches of revisions where API breaks at that intersection are minimal.

But I was never successful in getting it to build a more rational subset, so it’s a double-edged sword: you get the separation/re-use but end up eating the whole thing (I have managed from time to time to tame it, but as mentioned, it always gets out of hand again and I stopped trying).

Got it, thanks. I will adjust my expectations appropriately :slight_smile: