[GSoC 2024] Better performance models for MLGO training

Description:
Reinforcement learning trained machine learning guided optimization (MLGO) models for register allocation-eviction and inlining for size are being used in real-life deployments. However, the reward signals produced by existing cost models leave much to be desired: because of the assumptions these models make about the execution environment and the runtime behavior of code, they fail to account for many dynamic effects of execution. Unaccounted-for cache misses and branch mispredictions can throw throughput estimates off by orders of magnitude, which makes modeling them a high priority.

Expanding to more performance areas is impeded by the reduced prediction quality of our performance estimation models in these scenarios. Improving those is critical to enabling the application of MLGO systematically to more optimizations.

Expected outcomes:
Improved modeling of the execution environment by including additional runtime/profiling information, such as additional PMU data, LLC miss probabilities, or branch mispredictions. This involves (1) building a data collection pipeline that covers the additional runtime information, (2) modifying the ML models to allow processing this data, and (3) modifying the training and inference process for the models to make use of this data; possibly building upon the work of the GSoC 2023 project of the same name.
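As a rough, purely illustrative sketch of what step (1) could look like (not the project's actual pipeline), one could wrap `perf stat` to attach a few extra counters to each benchmark run; the event list, CSV parsing, and benchmark command below are placeholder assumptions:

```python
# Hypothetical sketch: gather extra PMU counters for a benchmark run with
# `perf stat` and return them as additional features for a cost model.
import csv
import subprocess

PMU_EVENTS = ["LLC-load-misses", "branch-misses", "instructions"]  # placeholders

def collect_pmu_features(benchmark_cmd):
    """Runs the benchmark under `perf stat -x,` and returns {event: count}."""
    result = subprocess.run(
        ["perf", "stat", "-x", ",", "-e", ",".join(PMU_EVENTS)] + benchmark_cmd,
        capture_output=True, text=True, check=True)
    features = {}
    # perf writes machine-readable counters to stderr as: value,unit,event,...
    for row in csv.reader(result.stderr.splitlines()):
        if len(row) >= 3 and row[2] in PMU_EVENTS and row[0] not in ("", "<not counted>"):
            features[row[2]] = float(row[0])
    return features

if __name__ == "__main__":
    print(collect_pmu_features(["./my_benchmark"]))  # placeholder binary
```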

Skills:
C++ and Python, along with some ML experience and some compiler experience.

Contacts:
@ondrasej @mtrofin @boomanaiden154-1 @vshah

@vshah Is it possible that PGO training can cause overfitting?

What exactly are you referring to? Using PGO data to help train cost models? I don’t believe any work has been done in that area so far; most work has focused on using PGO data to aggregate data from cost models rather than working at a finer granularity like the basic-block level. Training MLGO models using PGO data? The models are generally pretty small relative to the amount of data they’re trained on, so they don’t seem to overfit, but specifics probably matter a lot. Generally, the selected model/model architecture, the amount of training data, and the application of techniques like dropout and regularization will probably matter a lot more for whether or not something overfits than whether or not PGO data is used.
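For illustration only, these are the kinds of knobs I mean; a toy Keras regressor with dropout and L2 weight regularization, where all sizes and rates are arbitrary and nothing is specific to MLGO or PGO data:

```python
import tensorflow as tf

# Toy sketch of the regularization techniques mentioned above.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32,)),  # 32 input features, arbitrary for the sketch
    tf.keras.layers.Dense(
        64, activation="relu",
        kernel_regularizer=tf.keras.regularizers.l2(1e-4)),  # L2 weight penalty
    tf.keras.layers.Dropout(0.2),  # randomly drop activations during training
    tf.keras.layers.Dense(1),      # scalar reward/cost output
])
model.compile(optimizer="adam", loss="mse")
```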

Can you share some sort of documentation for the work done previously? I’ll go through it; I’m interested in contributing to this project.

The results of Viraj’s work from last summer’s GSoC are available here. Viraj is still working on this main PR here, but nothing has been written up on that yet; it’s still in the early phases, before we can get good results.

I have also made some progress on better basic-block-level cost models, but not enough to report on yet.

We’re both giving talks on this topic at EuroLLVM, which would probably be the soonest that things will get published/documented in some format.

Yeah, I just caught some PRs here: Pull requests · google/gematria · GitHub. Going through them, I’m able to get a sense of what’s going on. Here’s what I understand so far: using PMU counters, the objective is to drive ML/RL models that help reduce the longest cache-miss latency. If you don’t mind, may I pitch in and help with this project? I’ll go through some previous work and get up to speed on most things by myself. On the odd occasion, I might consult you. What do you think?

The models in google/gematria are just cost models. They don’t actually change any compiler decisions to generate better code; they just tell you which code is fast and which is slow. Eventually the idea is that these models can be used to train ML models to make better decisions, and also to enable better static code analysis (as a side goal). Viraj is working on making these models take LLC-miss information into account (and setting up a framework so that the models can take arbitrary PMU data and beyond into account). I’m working on massive datasets to train BB-level cost models that conform to the classical assumptions (no cache misses, optimal frontend conditions, etc.) and that are hopefully hyper-accurate due to the increased scale of the data.
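To sketch the general idea (this is not gematria’s actual architecture, and all names and layer sizes here are made up): a BB-level cost model could take auxiliary PMU features, such as an LLC-miss rate, as a second input alongside a learned representation of the block:

```python
import tensorflow as tf

def build_cost_model(bb_embedding_dim=64, num_pmu_features=2):
    """Toy two-input throughput regressor: block embedding + PMU features."""
    bb_embedding = tf.keras.Input(shape=(bb_embedding_dim,), name="bb_embedding")
    pmu_features = tf.keras.Input(shape=(num_pmu_features,), name="pmu_features")

    # Concatenate the block representation with the runtime-derived features.
    x = tf.keras.layers.Concatenate()([bb_embedding, pmu_features])
    x = tf.keras.layers.Dense(128, activation="relu")(x)
    throughput = tf.keras.layers.Dense(1, name="predicted_throughput")(x)

    model = tf.keras.Model(inputs=[bb_embedding, pmu_features], outputs=throughput)
    model.compile(optimizer="adam", loss="mae")
    return model
```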

Having more people work on the project would be great. If you’re planning on doing work, though, please reach out somehow (Discourse/here works, GitHub issues work too) so that we’re not duplicating effort and whatnot.

I recall our conversation here: How to compare IR/object files statically for predicting better performance for machine learning based cost models - IR & Optimizations - LLVM Discussion Forums. This is exactly what I wanted at the time, and even now. Okay, so this project will enable us to train on larger datasets. That will open up a lot of opportunities; off the top of my head, the easiest one is BranchProbabilityInfo. I’d like to contribute for sure. I’m assuming there must be a Google group; if there is, could you please share the link?

Mostly just the llvm-ml Slack. There’s a cost-modeling channel there that would be good for discussion (although it’s been inactive for a while).

great, I’ll catch up there. Thanks!