Inliner cost model

Are there any plans/directions for the inliner cost model? Looking at the code, it seems to just inline all feasible calls.

I have a program where the current inliner cost model results in significant code size bloat (about 8x; I have the IR before inlining plus call-count stats). Happy to help implement when this starts to block me. Also wondering whether other folks in the community are running into inliner cost model issues.

cc @River707

River and I had a nice chat offline. Just a quick braindump of some stuff that came up:

  • A cost model for inlining, especially in high-level dialects, is really hard. It seems to depend transitively on a potentially vast number of later lowering decisions.

  • code-size-based heuristics (like LLVM’s):

    • Consider: high-level ops, or the corresponding library calls in high-level (pytorch/tf/numpy) frameworks, such as “add two tensors, which might have any shape and dtype” – which data types should be supported? which shapes? Codegen size might vary wildly based on that, but calling out to a library might be tiny either way!
    • Consider: linalg-on-tensors ops. Vastly different code size depending on fusion. E.g. if an “add” is fused, then it is just one “add” machine instruction, but if it isn’t, then it is at least a loop around an “add”, and depending on the runtime and device it might expand into queuing a GPU grid dispatch, etc., which is far more code.
  • performance-based heuristics are totally dependent on the backend/runtime, and require the same sort of thinking as code size:

    • would inlining enable fusion?
    • would it enable intraprocedural shape inference to do a better job?
    • does the function have context-dependent shapes, and would it be beneficial to specialize it?
  • the inliner can do some amount of abstract interpretation of the callee to find unreachable blocks and adjust the cost accordingly (e.g. prove we are only in the “fast” push_back case). LLVM does this, but it is unclear how to extend it to MLIR’s level of generality.

  • At some level we’ll need an interface where we can plug in machine learning to learn good cost models. Maybe that’s the way out.

  • Some trivial “always beneficial” inlining is possible, like identity functions and private functions with only one call site.

I think that this has got to be an OpInterface sort of thing. You might want to look at llvm::TargetTransformInfo::getInstructionCost as one existing implementation of this idea.

I’d specifically recommend splitting the mechanism of measuring an operation (in code size, performance, or whatever) from the policy of “should I inline a function containing a bunch of operations of some size”.
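To make that split concrete, here is a minimal sketch of what the two halves might look like; neither interface exists upstream, and all names here are hypothetical:

```cpp
#include "mlir/IR/Operation.h"
#include "mlir/IR/Region.h"
#include "mlir/Interfaces/CallInterfaces.h"

namespace mlir {

// Measurement half: analogous to llvm::TargetTransformInfo::
// getInstructionCost, it answers "how big/expensive is this one op?".
class OpCostInterface {
public:
  virtual ~OpCostInterface() = default;
  virtual int64_t getOpCost(Operation *op) const = 0;
};

// Policy half: decides about a whole call site, consuming measurements.
class InlinePolicy {
public:
  virtual ~InlinePolicy() = default;
  virtual bool shouldInline(CallOpInterface call, Region &calleeBody,
                            const OpCostInterface &costs) const = 0;
};

} // namespace mlir
```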

From my perspective, and I think Sean touched on this, there can’t feasibly be one MLIR inliner cost model. I just don’t see how we can define a reasonable model that crosses all possible domains. What I had in mind is that we provide an abstract CostModel class that clients can override to determine whether or not to inline. In some abstractions this can be based on code size, using some dialect or target interface; in others it can be based on some analysis performed by the frontend; in still others it can be an ML model that does some seemingly random computation to compute a score for the function. The point of all of this being: I agree with you, and I don’t think any single notion of “cost” is going to scale to all MLIR users. We may have several “default” cost models that users can pick from, built on some combination of interfaces (and whatnot), but I don’t think the inliner should embed any of this into its core logic.

– River

Yup, that seems right to me. I don’t think it is precisely an OpInterface though, because in general it is not a property of the op, but a property of the “target” (i.e. it is TTI::getInstructionCost, not FSub::getInstructionCost). We don’t really have a “target” abstraction (which for MLIR would potentially need to cover many layers). So we’re probably looking at more of a “plugin”, or an InlinerCostModelBase to inherit from, rather than an OpInterface, which a user would configure based on their understanding of their pipeline setup.

Also, it probably needs to reason at a granularity larger than a single instruction, to enable parameterization sufficient for ML-model heuristics to do a great job.
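As a purely illustrative sketch of that direction (InlinerCostModelBase does not exist upstream; every name here is invented), such a plugin might look like:

```cpp
#include "mlir/IR/Region.h"
#include "mlir/Interfaces/CallInterfaces.h"

namespace mlir {

// A "target"-flavored plugin rather than an OpInterface: configured
// once per pipeline, not attached per op, and queried at a granularity
// larger than one instruction so that e.g. an ML model can look at
// whole-callee features.
class InlinerCostModelBase {
public:
  virtual ~InlinerCostModelBase() = default;
  // Score inlining `call`'s callee body; the model encodes the user's
  // knowledge of how this IR will be lowered later in their pipeline.
  virtual double getInlineBenefit(CallOpInterface call,
                                  Region &calleeBody) = 0;
};

} // namespace mlir
```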

Sure, but that isn’t why LLVM puts this in its target libraries. This is done because LLVM intrinsics are the “logical extension point” for targets, and the mid-level optimizer wants to be able to do queries without enumerating all the targets.

That is the MLIR-related problem I was referring to. The “ops can lower in different ways on different targets” problem is going to be intractable in general, I think, particularly for something like a tensor op. An accurate cost model is going to be nuts.

-Chris

So, what is the correct thing to do to avoid unconditional inlining?

We, the sparse compiler team, are also facing the same issue, as the sparse compiler deliberately generates a few complex function calls for operations like sparse tensor insert/sort.

If there is no such way, can we at least introduce a filter callback to allow users to filter out call sites that should not be inlined? (Similar to L102 of include/mlir/Dialect/Bufferization/IR/BufferizableOpInterface.h.)
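For illustration, such a filter callback could be as simple as the following (a hypothetical sketch; the callback type and the attribute name are both invented):

```cpp
#include <functional>
#include "mlir/Interfaces/CallInterfaces.h"

// A predicate the inliner would consult before considering a call site.
using InlineFilterFn = std::function<bool(mlir::CallOpInterface)>;

// The sparse compiler could then keep its deliberately generated
// support calls (insert/sort/...) out of the inliner entirely:
InlineFilterFn sparseFilter = [](mlir::CallOpInterface call) {
  return !call->hasAttr("sparse_tensor.keep_outlined");
};
```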

Could I do something like the following:

addCostModel<MyDialectCostModel>();

or

addTarget<MyDialectTarget>();

when I am initialising my dialect?

I am also interested in investigating the possibility of adding an inlining cost model to our compiler. Would it make sense to parameterize the existing MLIR inliner pass with an instance of an InlinerCostModelBase that would provide hooks answering the inlining-profitability question for a given call site (probably given the wider context of the CallGraph as well)? That interface could then be used around shouldInline, on top of or instead of some of the checks that are currently there.
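Reusing the hypothetical InlinerCostModelBase sketched earlier in the thread, the hook could be consulted right where the pass makes its decision today (sketch only; the real shouldInline takes no cost model):

```cpp
#include "mlir/IR/Region.h"
#include "mlir/Interfaces/CallInterfaces.h"

// Hypothetical integration point inside the inliner pass; the elided
// legality checks stand in for the ones the pass performs today.
static bool shouldInline(mlir::CallOpInterface call, mlir::Region &calleeBody,
                         mlir::InlinerCostModelBase *costModel) {
  // ... existing legality checks (recursion, region structure, ...) ...

  // Then ask the user-provided model the profitability question.
  return !costModel || costModel->getInlineBenefit(call, calleeBody) > 0.0;
}
```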


I’d love to see this. I’ve needed it a few times but never enough to do anything about it.


Thanks for expressing the interest! Let me review the current state of things in the MLIR inliner pass and come up with a proposal around next week.

Thanks. I’ve often wished that, instead of an op/identity-local decision on whether to inline, we had an easy mechanism to construct a pass with a custom cost model (where the cost model could assign an “infinite” cost that disables inlining). I think it is mainly a job of prying some of the infra apart and making it a bit more pluggable.


This has always been the plan, actually: the “inliner library” exposes the interface for legality.
The inliner implements the logic to drive all of this, but we punted on exposing hooks for profitability until concrete customer needs materialized (user-driven development, as often), so that the requirements are better defined and someone volunteers to design it :wink:

The default “inline” pass should probably be renamed “always-inline” (no cost model), and then folks can wrap the inliner in their own pass that injects a cost model (we could also build such a thing upstream for a generic-enough cost model, of course, and we’ll need at least a test pass).
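A rough sketch of that wrapping, under the assumption of a reusable inliner library (the cost-model plumbing is hypothetical; only PassWrapper/OperationPass are real MLIR APIs):

```cpp
#include "mlir/IR/BuiltinOps.h"
#include "mlir/Interfaces/CallInterfaces.h"
#include "mlir/Pass/Pass.h"

// Toy cost model: inline only callees below an arbitrary op count.
struct MyDialectCostModel {
  bool shouldInline(mlir::CallOpInterface call, mlir::Region &calleeBody) {
    int numOps = 0;
    calleeBody.walk([&](mlir::Operation *) { ++numOps; });
    return numOps < 16;
  }
};

// User-owned pass wrapping the inliner logic with the model above.
struct MyInlinerPass
    : mlir::PassWrapper<MyInlinerPass, mlir::OperationPass<mlir::ModuleOp>> {
  void runOnOperation() override {
    MyDialectCostModel model;
    // Drive the reusable inliner over getOperation(), consulting
    // `model` for each candidate call site instead of always inlining.
  }
};
```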


Based on the recent refactoring of the MLIR inliner pass, I suggest adding a profitability callback to the Inliner implementation in “[RFC][mlir] Add profitability callback to the Inliner” by vzakhari (llvm/llvm-project Pull Request #84258 on GitHub).

Please review and comment.

Thank you for the patch – much appreciated. I’ll have a look in a bit, but it also looks like Mehdi has already taken a couple of passes.