Graduating ModelBuilder to LLVM monorepo?

We (IREE/Google) have been incubating ModelBuilder in our repo for a while (with more extensions/uses in a private repo). We’ve found it to be an effective mechanism for building/running/benchmarking samples of real-world ML model structures, and it is a key part of our workflow for doing correctness and optimization work on codegen “backends”. When interacting with various hardware partners and other parties, it has been useful to have a quick prototyping environment to knock together a sample and iterate on it. I think that, regardless of the (eventual) frontend story, having such a tool is a good entry point to the ecosystem for a whole range of people who might be put off by the more full-stack options available and just need to focus on backend correctness/perf.

Since it isn’t really IREE-specific and it would be best if it didn’t carry the baggage of a more complicated system, we’d like to graduate it out of our repo. As part of Google, we could either do this as a standalone project (say, google/mlir-modelbuilder on GitHub) or merge it into the LLVM monorepo.

Benefits of the latter would be that others beyond us would have an easy entry point for knocking together samples, doing testing/benchmarking, etc., without needing to go through the pain of keeping yet another LLVM-dependent component up to date just to run samples/prototypes. Negatives would be all of the usual reasons to prefer an out-of-tree project over an in-tree one.

Opinions? We’re happy to open-source it as a separate project, but before going through the overhead, I wanted to check with the community and see if there was interest in putting it in the LLVM monorepo. Perhaps under mlir/examples/model_builder? If it ever becomes used as more of a core workflow, we could further move it out of examples in the future, but I don’t think it would be necessary to start that way.

@nicolasvasilache @aartbik

I’d like to understand better what it is.
Looking at the header: it seems right now to be a class deriving from OpBuilder and providing a few methods to create IR. It also owns an MLIRContext and a module. Is that it?

I suspect there are a few things to untangle in what this class is trying to provide. It almost seems like an IRBuilder++, but its scope isn’t clear to me (I don’t really get why “Model” is in “ModelBuilder”, actually).

A more minor aspect is that it also seems quite dependent on EDSC, which I think we are intending to merge into the builders; I’d rather see progress on that before adding more in-tree components that depend on EDSCs.

The ModelRunner looks like a fairly thin wrapper around the facilities of mlir::ExecutionEngine, can we improve this instead?

Similarly to @mehdi_amini, I’d like to see a brief description of what ModelBuilder does and maybe an example of its usage.

From looking at the code, it seems like an OpBuilder extension with some convenience functionality to manipulate owned modules and functions inside them, and additional calls to create larger code fragments at once instead of chaining builder.create calls. The execution/benchmark harness (ModelRunner) seems to be separable.

It also seems to be mixing the Builder and EDSC interfaces. My main concern is the added complexity of this mix. I wonder if we could promote some useful components into the “main” OpBuilder and keep the others opt-in.

This also seems related to the EDSC discussion I initiated. In particular, the “remove ValueHandle” and “DSL-style helper classes” suggestions.

Oh wow, I thought I was caught up on this discourse but completely missed that thread. I’ve skimmed it and will review in more detail, but am generally +1 on both the approach and how it was reasoned.

We can hold off on this while the work on that plan materializes. My main reason for pushing this today was purely pragmatic: we were in a setting with a partner where the barrier to entry to working from meaningful examples and making progress on implementing specific hardware features in the codegen was the limiting factor. It is not the first time that I’ve fielded this criticism and helped people work around it. Specifically cited was the lack of an accessible entry point in mlir-core. I parsed this roughly as a desire to:

  • Check out a repo
  • Run this sample which looks like what I expect for the domain (vs a lot of write-once current OpBuilder based IR stitching)
  • Make modifications to support my requirements
  • Build a simple binary that I can test/benchmark/etc

Any of us who are MLIR natives could likely squint and do this without too much trouble, but I’m glad we’re seriously considering doing the work you describe. A lot of the people we need to be making progress with on specific devices and ISAs will be much faster to engage if we can significantly lower the bar to entry, giving them something with a more fluent API and letting them get in and get out with their testing without learning everything about the tooling. From my vantage point, I can fairly concretely say that we are losing important contributions because of some of these issues, especially from people on the more traditional “ML kernel engineering” and hardware-maker sides.

If we had the cleanups you mention, I would alter this proposal to instead contribute the actual example models and likely a small amount of glue code to make the benchmarking/testing process more fluent and capable.


I posted two threads quasi-simultaneously, both of which were quite long, so it’s no wonder you missed one.

The original EDSC, the proposed remake, and ModelBuilder share the motivation of improving the IR manipulation interface and lowering the entry barrier. It looks like the approach taken in ModelBuilder aligns quite well with one of the propositions we had for evolving builders (DSL-style helpers), so it would make sense to chain one discussion to the other, but I am wary of conflating the two. Let’s try to take into account, in the builder work, the reasons that led to ModelBuilder being created, and then see how that work affects ModelBuilder.

The example models part sounds like one of two things:

  • templates of end-to-end flows, with simple examples that people can adapt;
  • a test suite for functionality and performance regressions.

Both are valuable additions IMO, but the proposal may be perceived differently depending on how it is positioned.

A quick note to +1 on these plans. I see very strong value in multiple fields, not only ML, for something like ModelBuilder. It has high potential to streamline testing, benchmarking, deployment independent of a given framework, bare-metal execution, and general adoption. I’m also in favor of pushing on the EDSC cleanup track, and doing this first.

One question that arose during a recent conversation with Alex was about the heavily overloaded name “Model”. Since several domains outside of ML also use “model” to refer to some sort of specification or high-level program, typically with domain-specific properties, I would actually like to keep it. ModelBuilder makes sense when “modeling” embedded systems and circuits as well as ML models, for simulation or code generation purposes.

I didn’t originally name this, and I generally have the same feeling about the word “Model”: I think it unnecessarily ties concepts to overloaded domains. I generally prefer names like “computation” or “program”; in this case, “kernel builder” may also make sense. All of those are equally overloaded, but I feel they don’t implicitly bind to “ML”.