[RFC][premerge] Moving premerge CI scripts to https://github.com/llvm

Currently, there are multiple components located in https://github.com/google/llvm-premerge-checks, including:

a) Source of the proxy server that connects Phabricator to Buildkite

b) Docker images of Linux/Windows Buildkite bots

c) Cluster configuration: Kubernetes and Terraform files

d) Scripts for Buildkite builds

Additionally, the scripts for libcxx builds are located in https://github.com/llvm/llvm-project/tree/main/libcxx/utils/ci.

The most interesting part is the scripts (d) so we definitely want to move them. It would also be beneficial to have the libcxx and “generic” parts in close proximity.

There are two alternatives I see:

  1. Move the scripts under llvm-project to a directory like “utils/ci”.

    Pros:

    • Code and CI configuration can be updated together (e.g., cmake flags).
    • Testing CI script changes can be done similarly to normal changes, especially for trivial ones.

    Cons:

    • More “non-functional” commits to the repo, although a constant stream is not expected.
    • Some “trivial” build steps that currently only check out the scripts would need to fetch the whole repo. This is not a significant concern.
    • Testing CI for non-premerge pipelines might become challenging. For example, to test a change in the logic of periodic mainline builds, one would have to direct the build to a branch or a different repo. However, as we will be running in a fork, this might not be a major issue.
  2. Move the scripts to the “llvm-zorg” repo (or a completely new repo).

    Pros:

    • Less clutter in the main repo.
    • Easier enabling of PRs to test CI changes before submitting them.

    Cons:

    • Building logic becomes more complex, as we would still need to work with two repos simultaneously.
    • Uncertainty regarding commit policies to zorg. Not everyone may have commit access, adding complexity for developers to contribute.
    • Additional tooling required to test changes to CI.

Overall, option (1) seems feasible.

libcxx, are you comfortable with moving to a directory like “utils/ci/libcxx”?

Regarding the actual move, I currently don’t have the time to completely rewrite and refine everything according to high standards. Therefore, my plan is to initially move things as they are and then refactor to the new setup. If anyone is willing to help and learn, please let me know.

Regarding (a), (b), and (c) above, they will likely remain in the Google-owned repo. (a) doesn’t require any changes, and (c) involves controlling computing resources, so I don’t feel comfortable opening it wide. It’s preferable for the Docker images (b) to stay in the same repo for automation purposes. If a developer needs to reproduce something in the exact same environment, it should be fine to build the image from anywhere.

P.S. I’ve heard that contributing to https://github.com/google/llvm-premerge-checks has been challenging for some folks due to license or other issues. I would appreciate it if you could let me know the specifics. Feel free to reach out to me personally on Discord.

Ideally, periodic mainline builds would run something very similar to other pre-merge checks so this would not be a huge issue.

IMO, this is 100% the way to go. The benefit of being able to modify the CI scripts easily in the monorepo and test them in a PR is a huge one. For example, if you’re adding some new significant configuration option in the code, you can test it immediately by just adding that new CI configuration in the same PR where you introduce the change, and everything works. That’s a really huge benefit and we wouldn’t want to go back to something else in libc++ now that we do things this way.

Yes, not a problem. We’ll still keep our own test scripts (at least for now), but I am super happy to move them to a location that makes sense in the monorepo.

I assume you are talking about (d) here. I think it should be fine to move things as-is and then I’m happy to take a look and give a hand to try to simplify things.

I don’t quite get what is fundamentally different between zorg and these scripts. If there is no fundamental difference and no good reason to diverge, I would push for unifying the approaches, in one direction or the other. Having “build-and-test” wrapper scripts in the repo seems OK, although you are just moving the point of coupling from cmake flags to another bespoke invocation: when it changes, it breaks the contract the same way.