We are using machine-guided compiler optimizations (“MLGO”) for register allocation eviction and inlining for size, in real-life deployments. The ML models have been trained with reinforcement learning algorithms. Expanding to more performance areas is currently impeded by the poor prediction quality of our performance estimation models. Improving those is critical to the effectiveness of reinforcement learning training algorithms, and therefore to applying MLGO systematically to more optimizations.
Expected outcomes: Better modeling of the execution environment by including additional runtime/profiling information, such as additional PMU data, LLC miss probabilities, or branch mispredictions. This involves (1) building a data collection pipeline that covers additional runtime information, (2) modifying the ML models to allow processing this data, and (3) modifying the training and inference process for the models to make use of this data.
Today, the models are almost pure static analysis; they see the instructions, but they make one-size-fits-all assumptions about the execution environment and the runtime behavior of the code. The goal of this project is to move from static analysis towards more dynamic models that better represent code the way it actually executes.
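To make the idea concrete, here is a minimal sketch of what step (2) above might look like: extending a static feature vector with normalized runtime counters before it reaches a cost model. All field names and values here are illustrative assumptions, not the actual MLGO feature schema.

```python
# Hypothetical sketch: augmenting static instruction features with
# runtime profile counters. The field names ("llc_misses", etc.) are
# assumptions for illustration, not MLGO's real feature names.

def build_feature_vector(static_features, profile):
    """Concatenate static features with per-instruction runtime rates."""
    # Guard against empty profiles to avoid division by zero.
    instructions = max(profile.get("instructions", 1), 1)
    runtime_features = [
        # Normalize raw PMU counts to per-instruction rates so the
        # model sees comparable magnitudes across code regions.
        profile.get("llc_misses", 0) / instructions,
        profile.get("branch_mispredicts", 0) / instructions,
    ]
    return list(static_features) + runtime_features

static = [12.0, 3.0, 0.5]  # e.g. opcode counts, loop depth, ...
pmu = {"instructions": 1000, "llc_misses": 40, "branch_mispredicts": 15}
print(build_feature_vector(static, pmu))
```

The normalization choice (per-instruction rates rather than raw counts) is itself a design decision the project would need to evaluate.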
Contacts: @ondrasej @mtrofin @boomanaiden154-1
My name is Andreas Botzner. I am currently an undergraduate at the University of Technology Graz, Austria, and I’m interested in working on this as part of Google Summer of Code.
I have experience in C, C++, and Python from both work and open source contributions (Ansible Community Collection), as well as experience in ML from multiple university courses.
I have some experience with compilers and am using the current semester break to get up to speed with them, as well as with the LLVM project itself and its various components.
If you have any suggestions on which topics and/or components I should read up on in preparation for possibly working on this project, please let me know.
Also, I’ve read that the preferred medium of contact is Slack. Here’s my email: email@example.com
Sent you an invite to the Slack channel. @ondrasej can detail more; probably a good start is looking at tools like llvm-mca, the Ithemal paper, and the GRANITE paper. Also, @boomanaiden154-1 recently came across this.
I came across this project and was instantly drawn to it. I would like to participate. Could you please add me to the Slack channel?
I am working at an AV startup as a research engineer, which has given me extensive ML experience, especially with model optimization, deployment, and benchmarking in C++. I have worked a lot with NVIDIA Jetson SoCs and am aiming to go deeper into the field of optimization by learning about compilers.
I am sharing my email - firstname.lastname@example.org
I’m an undergraduate student with a strong foundational understanding of Deep RL, and have also spent the past few weeks familiarizing myself with compilers, LLVM and the MLGO project.
Some resources I have gone through are the MLGO paper, this Google blog post, previous years’ work, and the documentation on the git repository. I am currently working through the training phase of the inlining demo, and would like to know how I could get closer to contributing.
Could I be added to the Slack channel? My email address is email@example.com
Thanks and regards.
Not a lot more to add. The three papers that @mtrofin shared, plus looking at and experimenting with llvm-mca, are a very good start on the topic.
Hi @mtrofin and all,
I’m a 3rd-year undergrad in computer science and math (mostly interested in computational math and probability, not that this is especially relevant). I’m looking at GSoC for this summer and have been hoping to learn more about compilers. Having messed with LLVM out of curiosity and for some personal projects after my compilers course, previously done ML work in other application areas, and being a fan of several projects built on LLVM, I felt this would be a great outlet and a way to learn more about optimizations. I noticed you’ve linked some readings; I will check those out over the next few days as I have time.
I should probably note programming experience, following the others. My top three languages are, maybe predictably, Python, C++, and Julia, in that order, though I’ve spent time messing with several others over the years (e.g. at one point I had a decent project with the now seemingly abandoned Haskell bindings llvm-haskell, but I haven’t been able to get stack to build it again without some unportable black magic since I stopped working on it at the end of last summer. I have since gotten it halfway working with the official C++ API). I’m decently familiar with PyTorch, TF, and XGBoost, also in descending order.
It’s definitely best to reach me by my school email, which I also already have a Slack account with (the primary email on my GitHub is an old personal one). I would love to learn more about the project regardless of outcomes – firstname.lastname@example.org
Done. The best thing is to come up with a specific proposal, which is necessary anyway as part of the contributor submission process for GSoC. We can help on Slack with fleshing it out before you submit it.
Prospective GSoC 2023 contributors: if you have proposals related to this topic or MLGO in general, and want to discuss/get feedback on the proposal before submitting it, the monthly MLGO meeting tomorrow will be dedicated to that (“open hours” kind of a thing). Details here.
I’m a new employee at ByteDance working on their LLVM compilers. I have done some LLVM ports in the past, but not for a while.
They are very interested in MLGO, and I would like to demo something to them.
Is this something I could do on a Mac M1, for example, or using Google Colab?
I am at a senior level and could form a whole team to explore this and contribute to the open source effort. At this point, I’m exploring the possibilities.
I have some experience with deep reinforcement learning.
There’s an end-to-end demo that uses Fuchsia’s codebase for training. It should be reasonably up to date, and if you find discrepancies going through it, happy to accept patches.
Note that pretty much everything assumes Linux (Debian).
In LLVM, a good starting point would be lib/CodeGen/MLRegalloc*. Of those, the “priority” one is WIP.