We want to add a tool for native cost model evaluation, under llvm/tools/llvm-cm (for “cost modeling”).
Specifically, the tool would take object files together with machine basic block-level profile information (obtainable through D143311, "[MLGO] Add BB Profile Dump Pass for Regalloc Case") and produce a latency estimate for a function and, eventually, for call graphs. The tool would give us a training signal for machine learning optimization efforts, e.g., MLGO regalloc training. Initially, we want to consume ML-trained latency evaluators produced by llvm-exegesis, but the tool is not meant to be restricted to these (or to ML-based latency evaluation).
We considered adding this to llvm-exegesis itself. However, with llvm-cm we want a plug-in alternative that can ingest llvm-exegesis models and possibly any other model.
We also considered extending llvm-mca, which also uses scheduling models to obtain machine code performance data. However, the point of llvm-mca is to work as a performance analyzer for individual microarchitectures, providing detailed uop-level information, and this isn't what we are trying to do.
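To make the plug-in idea concrete, here is a minimal sketch of how evaluators might be selected by name, in the spirit of the `-evaluator=` option. Everything in it is an assumption made for illustration: the registry, the `register_evaluator` decorator, the evaluator names, and the placeholder cost tables are hypothetical and do not reflect llvm-exegesis models or any actual llvm-cm interface.

```python
from typing import Callable, Dict, List

# Hypothetical interface: an evaluator maps the instructions of one
# basic block (as text) to a latency estimate in cycles.
Evaluator = Callable[[List[str]], float]

# Hypothetical registry keyed by the name given on the command line.
EVALUATORS: Dict[str, Evaluator] = {}


def register_evaluator(name: str):
    """Register an evaluator under a name so it can be picked at run time."""
    def decorator(fn: Evaluator) -> Evaluator:
        EVALUATORS[name] = fn
        return fn
    return decorator


@register_evaluator("unit-cost")
def unit_cost(instructions: List[str]) -> float:
    # Placeholder model: every instruction costs one cycle.
    return float(len(instructions))


@register_evaluator("table")
def table_cost(instructions: List[str]) -> float:
    # Placeholder model: made-up per-opcode costs, one cycle by default.
    costs = {"imul": 3.0, "div": 20.0}
    return sum(costs.get(ins.split()[0], 1.0) for ins in instructions)


def evaluate_block(evaluator_name: str, instructions: List[str]) -> float:
    """Dispatch to whichever evaluator was requested by name."""
    return EVALUATORS[evaluator_name](instructions)


block = ["add r1, r2", "imul r3, r4"]
unit_estimate = evaluate_block("unit-cost", block)
table_estimate = evaluate_block("table", block)
```

The point of the sketch is only that the driver stays independent of the model: adding another evaluator means registering one more function, without touching the code that walks the object file.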
For example, build an application with the flag found here (BasicBlockFileDump), which yields profile information. Then, running llvm-cm with the input file and the profile information (e.g., a .o file and a .csv file) gives us the latency estimate and performance aggregates at the individual function level. As the tool's functionality grows, it could gain additional flags (e.g., a flag that outputs data tuned to function call graphs, or call stack depth analysis).
clang -cc1 … -o foo.o -mllvm -mbb-profile-dump=foo.csv
llvm-cm foo.o --profile=foo.csv -evaluator=exegesis
function1: Latency Estimate: 123; dynamic # of calls: 5
function2: Latency Estimate: 12; dynamic # of calls: 0
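As a rough illustration of how block-level profile data could be turned into a per-function number, the sketch below weights a per-block latency by how often the block ran and sums per function. This is an assumption, not a description of llvm-cm: the post does not specify the aggregation rule (or whether weights are absolute counts or relative frequencies), and the CSV layout `function,block_id,execution_count`, the function names, and the latency values are all hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical profile layout, one row per machine basic block:
#   function,block_id,execution_count
# The real dump format may differ.
PROFILE_CSV = """\
foo,0,5
foo,1,20
bar,0,2
"""

# Hypothetical per-block latency estimates in cycles, standing in for
# whatever an evaluator would return for each block.
BLOCK_LATENCY = {
    ("foo", 0): 4.0,
    ("foo", 1): 2.5,
    ("bar", 0): 12.0,
}


def estimate_function_latency(profile_text, block_latency):
    """Sum execution_count * block latency over each function's blocks."""
    totals = defaultdict(float)
    for function, block_id, count in csv.reader(io.StringIO(profile_text)):
        totals[function] += int(count) * block_latency[(function, int(block_id))]
    return dict(totals)


estimates = estimate_function_latency(PROFILE_CSV, BLOCK_LATENCY)
for name, latency in estimates.items():
    print(f"{name}: Latency Estimate: {latency:g}")
```

With these made-up inputs, `foo` comes out as 5 * 4.0 + 20 * 2.5 = 70 and `bar` as 2 * 12.0 = 24.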
Eventually we could pass linked binaries and/or directories with pre-link objects (details TBD):
llvm-cm my/dir --profile=foo.csv -evaluator=exegesis -start_from=_Z6foobarii
Latency Estimate: 123; dynamic # of calls: 5