[RFC] MLGO: Open evaluation and new onboarding doc on LLVM

mtrofin · December 18, 2025, 3:22pm

Copying here the summary of the discussion we had at the November MLGO LLVM meeting, so we don’t have to reference 2 sources of info:

We shall establish an easy-to-access benchmark so that any new developments on MLGO can be evaluated more fairly and openly. @DataCorrupted
- LLVM based benchmarks.
  - how to set up training
    - ideally not just TF, and we’d like a pytorch alternative, but let’s stick with what we have right now
    - first preference: something that works (i.e. the benchmark). Helps others pitch
  - how long (steps)
  - both perf and size
  - idea: start with size, and have the google/ml-compiler-opt releases ship a Docker image, composable over whatever Docker image we use for the llvm side bot, with everything fixed (gin files, command line, number of iterations, etc). Still using LLVM as corpus.
    - kind of orthogonal: the evaluation. If a model reduces the size of clang by “x”, how was that clang built? (i.e. we need an exact cmake configuration)
    - we have the chrome stuff, but (a) it’s unfriendly to LLVM devs (another build system… not the most trivial either), and (b) the lack of performance regressions when optimizing for size is not (easily) verifiable by non-Googlers
    - phosek: make sure we pin all deps in the Docker container (python, cmake… host toolchain…)
      - also, not the Docker image Fuchsia has in google/ml-compiler-opt (because that one is for Fuchsia’s training)
      - can we make sure the Docker image is evolvable (e.g. one can switch the LLVM hash easily, understanding there may be risks)? Basically, we should make it easy to switch the LLVM stack in the future (without trying to impose back-compat requirements).
An easy-to-set-up LLVM-based training configuration, with training details also precisely spelled out (like which algo, how many steps, hyperparams, etc.). (Maybe Docker?)
- GitHub build bot → be able to proactively maintain the demo
- some addressed above
- concern: do new LLVM releases break google/ml-compiler-opt? E.g. a periodic CI on that repo could pull the latest LLVM and make sure a few training steps can happen without surprises: train for 5 steps, then deploy the model both as TFLite and as AOT.
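
To make the "everything fixed" idea a bit more concrete, here is a minimal Python sketch of what a pinned benchmark description could look like. None of this is an existing google/ml-compiler-opt API: `BenchmarkSpec`, its fields, `fingerprint`, `with_llvm_commit` and `size_reduction_percent` are all hypothetical names, and the values in the usage example are placeholders. The point is only that (a) two reported results are comparable when they come from the same pinned setup, (b) switching the LLVM hash is a one-field change, and (c) a size win is reported relative to a baseline built with the same cmake configuration.

```python
import dataclasses
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BenchmarkSpec:
    """Hypothetical record of everything a size-benchmark run would pin."""

    llvm_commit: str      # LLVM hash (LLVM is both the compiler and the corpus)
    cmake_args: tuple     # exact cmake configuration of the evaluated clang build
    gin_files: tuple      # training configuration files
    training_steps: int   # number of training iterations
    docker_image: str     # image pinning python, cmake, host toolchain, ...

    def fingerprint(self) -> str:
        """Short stable hash; equal fingerprints mean identical pinned setups."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]

    def with_llvm_commit(self, commit: str) -> "BenchmarkSpec":
        """Return a copy that differs only in the LLVM hash."""
        return dataclasses.replace(self, llvm_commit=commit)


def size_reduction_percent(baseline_bytes: int, model_bytes: int) -> float:
    """Relative size win of the model build over the baseline build, in percent."""
    if baseline_bytes <= 0:
        raise ValueError("baseline size must be positive")
    return 100.0 * (baseline_bytes - model_bytes) / baseline_bytes


# Placeholder values only; a real spec would carry the actual hash, flags and paths.
spec = BenchmarkSpec(
    llvm_commit="<llvm-hash>",
    cmake_args=("-DCMAKE_BUILD_TYPE=MinSizeRel",),
    gin_files=("<path/to/config.gin>",),
    training_steps=5,
    docker_image="<image>@<digest>",
)
bumped = spec.with_llvm_commit("<newer-llvm-hash>")
print(spec.fingerprint(), bumped.fingerprint())
```

A result would then be quoted together with the fingerprint, so anyone can tell whether two numbers came from the same setup or from a bumped LLVM stack.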
