[RFC] llvm-cm: Cost Model Evaluation for Object Files/Machine Code

We want to add a tool for native cost model evaluation, under llvm/tools/llvm-cm (for “cost modeling”).

Specifically, the tool would take in object files alongside machine basic block-level profile information (as obtainable through ⚙ D143311 [MLGO] Add BB Profile Dump Pass for Regalloc Case), and produce a latency estimate for each function and, eventually, for call graphs. The tool would give us a training signal for machine learning optimization efforts, e.g., MLGO regalloc training. Initially, we want to consume ML-trained latency evaluators produced by llvm-exegesis, but the tool is not meant to be restricted to these (or to ML-based latency evaluation).

We have considered the option of just adding this within llvm-exegesis; however, with llvm-cm, we want a pluggable alternative that can ingest llvm-exegesis models, and potentially any other model.

We have also considered extending llvm-mca, which likewise uses scheduling models to derive machine code performance data. However, llvm-mca is designed as a performance analyzer for individual microarchitectures, providing detailed uop-level information, and that is not what we are trying to do.

For example, build an application with the flag found here (BasicBlockFileDump) to obtain profile information. From there, running llvm-cm with the input file and the profile information (e.g. a .o file and a .csv file) gives us our latency estimate and performance aggregates at the individual function level. As the tool’s functionality grows, it could include different flags (e.g. a flag that outputs data tuned to function call graphs, or call stack depth analysis).

clang -cc1 … -o foo.o -mllvm -mbb-profile-dump=foo.csv

llvm-cm foo.o --profile=foo.csv -evaluator=exegesis

Latency Estimate: 123; dynamic # of calls: 5
Latency Estimate: 12; dynamic # of calls: 0

Eventually we could pass linked binaries and/or directories with pre-link objects (details TBD):

llvm-cm my/dir --profile=foo.csv -evaluator=exegesis -start_from=_Z6foobarii

Latency Estimate: 123; dynamic # of calls: 5

Awesome! It’s very cool to see work on (mostly) static cost modelling. I think this would make a nice addition to the LLVM suite of tooling and nicely complement llvm-mca’s detailed microarchitectural modelling with tooling applicable for larger sections of code.

A couple points of clarification:

  • When you mention consuming ML based latency predictors, I assume you’re referring to GRANITE in Gematria rather than llvm-exegesis? llvm-exegesis is just a low-level benchmarking utility. There’s some work being done on llvm-exegesis to enable its use for dataset creation to train learned cost models, but it isn’t the cost model in itself.
  • Do you have a concrete plan yet for what latency evaluator(s) you’re planning on implementing first? As far as I’m aware, there are currently license issues with something derived from uiCA and there is currently no GRANITE model available that can be open sourced (although a fully open one should hopefully be trained soon) in addition to a lack of infrastructure for exporting models for consumption in LLVM. Since most of the cost model in llvm-mca is in /llvm/lib/MCA, maybe it’s worth integrating that as a first strategy as it might be easier? But getting the tooling setup to integrate GRANITE models would also be good to have and something that would probably eventually be done anyways.

Also pinging @RKSimon and @adibiagio as they work on llvm-mca/X86 cost modelling and might have some comments/useful suggestions.

For clarification:

It would be more accurate to say that we plan to consume GRANITE models, alongside other models in the future. As for which latency evaluator we plan to implement first, we intend to build the framework for ingesting GRANITE models first, rather than integrating llvm-mca’s cost model.

Thank you for posting this RFC. (Right now our community has an issue in that it’s sometimes unclear whether an RFC has been accepted. Sorry about that!)

I noticed that @JestrTulip has posted an initial patch (⚙ D153376 Introducing llvm-cm: A Cost Model Tool), and currently only @jhenderson has actively commented, along with some comments from me.
I believe there is a significant disparity between what the RFC claims and the current state of the implementation.

When I look at how we have previously accepted new llvm/tools/ tools, I see a pattern: the initial integration provides fairly complete features, at least fulfilling what the RFC documents, and the code structure is relatively stable. For example,

These projects were likely developed locally as branches, and then cleaned up before requesting the community’s review.
This is the usual approach to adding a new tool. It has many benefits besides helping address pushback from individuals who would claim, “this is not an experimental playground”…
In addition, I personally appreciate it when a tool has a clean early history.

I wonder whether llvm-cm should take a similar approach: continue development elsewhere before seeking integration into the llvm-project monorepo.
If you need a place to track progress, and possibly to allow external contributors, either a company repository or a repository under llvm/ (consider contacting the moderators of this forum) should work.

When the tool becomes ready, split it into several reviewable patches, clean up the history in the process, add Co-authored-by: tags where appropriate, and post them for review (reviews.llvm.org or our new review system).

If you want another opinion about LLVM coding style during the development process, feel free to loop me in :)