I agree, we don’t want MLIR to become associated with being a complete compiler for a particular frontend. We want to be infra.
My main concern with not having something like a “numpy” dialect, or some other relatively tightly coupled frontend, is that we don’t have any serious correctness testing happening upstream at the moment. It’s like developing LLVM without an equivalent of test-suite or clang that we can use to find and investigate correctness issues. Most features in LLVM development (with exceptions such as GC statepoints) are testable in some way, either by running test-suite or by crafting an input to clang.
Of course, MLIR by its nature serves a much more diverse set of compilation workflows than LLVM, so we shouldn’t expect to be able to recreate LLVM’s exact situation. However, I believe it still needs some thought, especially as things like TCP come into the picture.
Something like a numpy frontend could exercise a large part of our infra on substantial workloads against a known-good reference. In that sense, its maintenance burden could be offset by the bugs it catches and the development velocity it enables.
That doesn’t necessarily help with the refactoring burden, which AFAICT is mostly a function of the number of lines of code in the repo.
I see two ways to conceptualize this problem:
1. Leaning on dialect contributors to do refactorings. That’s mostly a community culture problem. We want to encourage a community that feels empowered to make changes to core infra and takes that on when it sees something that could be improved, even if it turns out to be a large refactoring.
2. At some level, the time a core dev (e.g. River) spends updating some part of the codebase should be balanced by the continuing value that piece of the codebase contributes to the ecosystem.
Thus, as the investment in MLIR grows, the number of dialects increases, etc., the core evolution cost increases as well. But as long as the total value of MLIR increases at the same or a faster rate, core evolution is still a useful task (that is, a good use of engineering resources). We could think about this as “5% of the engineering effort devoted to MLIR goes to core evolution”. I think it’s unrealistic to expect core evolution costs to remain constant forever, let alone decrease. So from this point of view, there are two parts to this:
a. Keeping the engineering investment in core evolution at a steady 5% (or whatever) of overall MLIR investment (that is, the rate of investment, perhaps measured in something like “number of software engineers”). The more successful we are at the first point above (leaning on dialect contributors), the more this number can be reduced.
b. Keeping the engineering investment in MLIR (e.g. the number of active contributors) proportional to the number of lines of code in the repo.
By combining a. and b., we arrive at a situation where the engineering bandwidth available for core evolution remains proportional to the number of lines of code in the repo, which keeps the needed refactorings / core evolutions manageable.
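Written out with illustrative symbols (these names are not from the post itself): let $f$ be the fraction of MLIR engineering effort spent on core evolution (the “5%” in a.), $E$ the total engineering investment in MLIR, $L$ the number of lines of code in the repo, and $k$ the proportionality constant assumed in b. Then:

$$B_{\text{core}} = f\,E, \qquad E = k\,L \quad\Longrightarrow\quad B_{\text{core}} = f\,k\,L \;\propto\; L$$

As long as $f$ and $k$ are held roughly constant, core-evolution bandwidth $B_{\text{core}}$ scales with $L$, the same quantity the refactoring burden was said to depend on.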
Of course, all this ignores out-of-tree code…