[RFC] AI-assisted Bazel Fixer Bot

tl;dr: Proposal to automatically create AI-generated PRs to fix broken bazel builds. It doesn’t impact anyone who doesn’t care about bazelbot.

Background

One of the build systems LLVM supports is Bazel. Contributors to LLVM are not required to keep the Bazel build in a working state, as it is part of the peripheral tier and caters to a few specific subcommunities. That includes us at Google, where our downstream consumption of LLVM relies on keeping the upstream Bazel build working. Because of this, we manually fix Bazel builds multiple times a day.

Proposal

We would like to deploy an experimental AI-assisted bot that automatically creates pull requests to fix Bazel builds, verified by running local Bazel builds, which are then manually reviewed and merged by Bazel maintainers.

Initial Design & Deployment

The bot will be a simple Python script that listens for breakages in our upstream Bazel Buildkite pipeline, watching for new Bazel breakages (a passed-to-failed build state transition). When it identifies a failure, it first attempts a resolution by running the dwyu command from bant.build, which fixes missing Bazel deps for us. If that is unsuccessful, it consults an AI agent for the code changes required to resolve the Bazel error, using the SHA of the breaking commit as a reference. Suggestions are verified with a local build; if that build fails, the AI agent is invoked again in a loop until local build verification succeeds or we hit a maximum threshold, at which point the bot gives up. Once local build verification succeeds, the changes are pushed to a branch in llvm/llvm-project named in the format users/google/bazel-fix-, and a pull request is created from that branch. Code changes by dwyu and the AI agent will exclusively be to Bazel files in utils/bazel.
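To make the flow concrete, here is a minimal sketch of the loop described above. The helper names, retry threshold, and exact Bazel invocation are placeholders for illustration, not the actual implementation:

import subprocess

MAX_AGENT_ATTEMPTS = 5  # hypothetical threshold; the real limit is an implementation detail


def local_build_passes() -> bool:
    # Verify a candidate fix by running a local Bazel build.
    # The working directory and target pattern here are placeholders.
    result = subprocess.run(["bazel", "build", "//..."], cwd="utils/bazel")
    return result.returncode == 0


def try_dwyu_fix() -> None:
    # Placeholder for the bant dwyu step that repairs missing deps.
    ...


def ask_agent_for_fix(breaking_sha: str) -> None:
    # Placeholder for the Gemini-backed agent that edits files under utils/bazel only.
    ...


def push_branch_and_open_pr(breaking_sha: str) -> None:
    # Placeholder for pushing a users/google/bazel-fix-* branch and opening a PR.
    ...


def handle_breakage(breaking_sha: str) -> bool:
    # dwyu first; fall back to the AI agent in a verify loop; give up at the threshold.
    try_dwyu_fix()
    if local_build_passes():
        push_branch_and_open_pr(breaking_sha)
        return True
    for _ in range(MAX_AGENT_ATTEMPTS):
        ask_agent_for_fix(breaking_sha)
        if local_build_passes():
            push_branch_and_open_pr(breaking_sha)
            return True
    return False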

The bot will run on Google’s cloud infrastructure together with the other machines that currently support our upstream Bazel builds, which enables Bazel remote cache sharing. The AI agent will use Gemini APIs to make calls to LLMs.
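For illustration only, a call to the Gemini API through the google-generativeai Python package might look roughly like the following. The model name, prompt, and API key handling here are assumptions, not the deployed configuration:

import google.generativeai as genai

# API key handling, model name, and prompt are assumptions for illustration only.
genai.configure(api_key="<api-key>")
model = genai.GenerativeModel("gemini-1.5-pro")


def suggest_bazel_fix(build_log: str, breaking_diff: str) -> str:
    # Ask the model for edits restricted to Bazel files under utils/bazel.
    prompt = (
        "The LLVM Bazel build broke with the following error:\n"
        f"{build_log}\n\n"
        "The commit that broke it made these changes:\n"
        f"{breaking_diff}\n\n"
        "Propose edits limited to Bazel files under utils/bazel that fix the build."
    )
    return model.generate_content(prompt).text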

Testing

Running the bot on all Bazel breakages from Nov 1, 2025 to Dec 15, 2025 led it to generate fixes for ~85% of the total Bazel breakages. Here are the sample fixes by the AI bot for these breakages. More than 80% of them were completely identical to the fixes proposed manually, while some of the rest included harmless unnecessary additions. Refining the prompt over time should improve both the reliability and the overall quality of the fixes. Moreover, all of these fixes were purely AI-generated and did not run the dwyu tool, which in the final deployed version should reduce unnecessary additions even further.

Of the total breakages analyzed above, around 40% were simple dependency fixes that could be handled by dwyu, while the rest needed AI assistance.

Alternatives considered

A non-AI-powered Bazel bot that handles all Bazel breakages could be another solution. It would be similar to llvmgnsyncbot, which syncs GN build files from CMake and pushes any changes to the main LLVM branch. Replicating such a solution to sync Bazel BUILD files from CMake is not possible due to Bazel’s strict deps and layering checks: there isn’t enough information in the CMake files alone to fix Bazel builds, so a proper solution would have to look at other files to handle all corner cases. It quickly becomes complicated and hard to maintain. An AI agent, on the other hand, has shown potential to handle a variety of corner cases for us and has proven sufficiently accurate.

Asks from the LLVM community

We need LLVM org admins on GitHub to install a Google-LLVM GitHub app, which allows us to do these kinds of integrations with enhanced security and higher API rate limits while not being tied to any specific user account. This is GitHub’s recommended way to build long-lived integrations. It will also allow us to create PRs under the bot’s identity. The app will request the following two repository permissions on llvm/llvm-project. They are broader than they need to be because GitHub doesn’t allow granularity finer than this:

  1. Contents (read/write): Required to create branches. Allows creating and writing to any branch but the bot would only be creating branches under users/google/

  2. Pull request (read/write): Required to create pull requests. Allows creating any pull request but we would only be sending pull requests to fix Bazel breakages.

There is precedent for granting such permissions to bots like llvmgnsyncbot (which operates as a regular GitHub user with commit access to push such changes). The part of the Bazel bot described in this RFC that interacts with GitHub is not AI-powered, so there is no risk that the AI will hallucinate and perform random branch writes or pull request creation. It will only create branches under users/google/ and create pull requests changing only Bazel files. A GitHub app is also more secure, as it requires short-lived tokens that expire within hours, versus personal access tokens, which have longer expirations and can be misused if leaked.
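For reference, a hedged sketch of how an app-authenticated bot could create such a branch and pull request through GitHub’s REST API, assuming an installation access token has already been minted; the helper name and PR body are illustrative:

import requests

API = "https://api.github.com"
REPO = "llvm/llvm-project"


def open_fix_pr(installation_token: str, fix_branch: str, commit_sha: str, title: str) -> None:
    headers = {
        "Authorization": f"Bearer {installation_token}",  # short-lived app-issued token
        "Accept": "application/vnd.github+json",
    }
    # Create the users/google/bazel-fix-* branch pointing at the prepared fix commit.
    requests.post(
        f"{API}/repos/{REPO}/git/refs",
        headers=headers,
        json={"ref": f"refs/heads/{fix_branch}", "sha": commit_sha},
    ).raise_for_status()
    # Open the pull request from that branch against main.
    requests.post(
        f"{API}/repos/{REPO}/pulls",
        headers=headers,
        json={"title": title, "head": fix_branch, "base": "main",
              "body": "Automated Bazel fix; please review."},
    ).raise_for_status()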

Because we have to manually fix every Bazel breakage upstream today, the frequency of new automated pull requests will not be higher than it is today. Pull requests will continue to be marked with the bazel tag, just as they are today, to allow filtering them out. Since we need these fixes urgently to keep our internal processes working, the review load for such pull requests should mostly be handled by Google.

Creating user branches in the LLVM repository and creating PRs from there makes it possible to manage everything with GitHub app authentication, whereas maintaining a separate fork for such branches requires setting up a different bot account. To keep things simple, we would like to use users/google/* branches for this integration; they will be pruned by the bot after the pull request is merged.

Let me know if you have any feedback or objections.

:white_check_mark: This RFC was accepted in this message.

3 Likes

As I understand it, this RFC proposes to replace the “manual fix, manual submission” workflow with an “automated fix, automated submission” one. It would be nice if the “Alternatives considered” section discussed an “automated fix, manual submission” workflow, which sounds like it would alleviate most of the pain while adhering to the latest iteration of the AI policy draft (“human in the loop”, referring to PR authors).


I hope this RFC doesn’t derail AI policy discussions too much, because we really need to make progress on that.

3 Likes

[RFC] LLVM AI tool policy: human in the loop for reference.

In the specific case of what you are proposing, I think asking the community to review AI-generated PRs would put an undue burden on the community - at least until such time as you gain a confident estimate of the rate of hallucinations / non-mergeable PRs.

It’s also unclear what the process for an actual review would be. Who owns that code?

Additionally, why can’t these changes be made in a fork and then submitted through the regular PR process?

I personally think we should not have allowed the proliferation of /users/ branches to begin with, and letting a bot create an unbounded number of branches upstream seems like it would worsen the user experience for humans.

3 Likes

As @cor3ntin pointed out, the onus is on people who push text prediction based PRs to verify that the PR is actually correct rather than nonsense, and take responsibility for it. I don’t think any automated system approaches that barrier.

I would also be concerned about these text predictors introducing incompatibly licensed code into the repository - it’s well documented that these systems are trained on open-source-licensed code and reproduce or launder that code when generating new code. I think any remotely automated system would need to ensure that the corpus used for training consists only of compatibly licensed code, and can reference the authors of that code when regurgitating it.

While the AI policy referenced only refers to the author’s responsibility to verify the code itself, I really feel that it is also the author’s responsibility to confirm that the license of that code is compatible. For instance, a model trained on GPL-licensed code is at risk of introducing automatically transformed versions of that code - again, there are documented cases where this is clearly what is happening.

Automating the introduction of such code seems extremely problematic.

1 Like

That is the plan. The bot would create PRs that would then have to be reviewed and merged by a human.

I wouldn’t really say the broader community is going to be the one reviewing the PRs. While there are some community members who work on the Bazel build, the vast majority of these sorts of commits come from within Google, and from our organization within Google. We’re also involved in the vast majority of code review for this and see this as a net process improvement.

It’s possible, but it requires some extra work when this already does not tangibly impact user-branch proliferation. Given this is an automated system, it’s easy to ensure that closed/merged PRs have their branches deleted, and that is the plan. From what I’ve seen, user-branch proliferation is due either to old PRs, which isn’t a problem here, or to bad tooling (spr in particular doesn’t handle deleting base branches well for PRs not directly against main in some cases), which also isn’t a problem here.

I’m also in the process of getting a GitHub Actions cron job set up to automatically delete old user branches that aren’t attached to any PRs, to help alleviate this problem.
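For illustration, such a cleanup job might look roughly like the following Python sketch against the GitHub REST API; the age cutoff, the lack of pagination, and the decision to skip branches attached to any PR (rather than only open ones) are assumptions:

import datetime
import requests

API = "https://api.github.com"
REPO = "llvm/llvm-project"
MAX_AGE_DAYS = 30  # hypothetical cutoff


def prune_stale_user_branches(token: str) -> None:
    headers = {"Authorization": f"Bearer {token}",
               "Accept": "application/vnd.github+json"}
    # Only the first page of branches is fetched here; a real job would paginate.
    branches = requests.get(f"{API}/repos/{REPO}/branches",
                            headers=headers, params={"per_page": 100}).json()
    now = datetime.datetime.now(datetime.timezone.utc)
    for branch in branches:
        name = branch["name"]
        if not name.startswith("users/"):
            continue
        # Skip branches that back any pull request.
        prs = requests.get(f"{API}/repos/{REPO}/pulls", headers=headers,
                           params={"head": f"llvm:{name}", "state": "all"}).json()
        if prs:
            continue
        # Only delete branches whose tip commit is older than the cutoff.
        commit = requests.get(f"{API}/repos/{REPO}/commits/{branch['commit']['sha']}",
                              headers=headers).json()
        tip_date = datetime.datetime.fromisoformat(
            commit["commit"]["committer"]["date"].replace("Z", "+00:00"))
        if (now - tip_date).days > MAX_AGE_DAYS:
            requests.delete(f"{API}/repos/{REPO}/git/refs/heads/{name}",
                            headers=headers).raise_for_status()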

I’m not familiar with the specific details of the LLM used for this, but in general I would not be concerned about licensing. Bazel changes are generally trivial (adding/removing a couple of dependencies in the right places) and thus unlikely to consist of large chunks of code in the first place. On top of that, most LLMs these days have some sort of search against their training corpora after inference to ensure they are not regurgitating training data. I am not a lawyer, so this doesn’t necessarily mean anything, but to my knowledge non-regurgitated output from LLMs has so far been ruled uncopyrightable in US courts, so it probably would not pose an issue here.

I’m not sure you understood my point, because “bot would create PRs” is not the “manual submission” I’m talking about.

Tooling to make it easier to create PRs from the command line, which was asked for in the mandatory-PRs RFC, should help here, too.

Those checks are “are we literally replicating existing code”, not “are we merely transforming existing code” - e.g., would they detect simply renaming the variables? Trivial reorders? etc. As far as I can make out, most of these protections amount to “make this laundering non-obvious”, not “don’t reuse other people’s code”. Even if we take the view that existing law means this laundering is not an IP violation, I feel it is morally wrong to produce code that is simply a machine-generated obfuscation of existing code. If someone does want to use these glorified text predictors to generate code, the obvious requirement should be that they are trained solely on code for which the model creator owns the IP; at minimum, any training should track the authors and licenses of code, and more honestly, the entirety of the training should be on a corpus of code for which the model creator is the author/owner.

Basically: this proposal is to automate the inclusion of code that is produced by performing what is frequently a glorified renaming and reordering of existing code.

I took automatic submission in your previous post to mean automatic submission of PRs into main. I don’t see how not automatically creating a PR actually fixes anything, though. There is a human in the loop either way. The bot also already validates, using conventional tools (manually invoking bazel), that the fix makes the build pass. Having an extra step where someone needs to copy and paste the diff into a PR (or some fancier downstream review mechanism before creating a PR) seems to me like it just adds extra overhead without meaningfully improving the amount of review that happens or the number of people getting notified.

It will not help here. The hard part is getting a GitHub app/bot account set up with a forked repo and all the permissions associated with it. Writing the code to actually create PRs is relatively simple.

Could you use GitHub app authentication to push to a llvm-project fork in the google GitHub organization? That would make the proposal have pretty much no impact on the upstream repository.

2 Likes

I agree that we should not just do what might be legally acceptable and should try to be good citizens of the open source community. I don’t think the changes produced by the agent in this proposal are of the form you’re thinking of, though. They’re almost always trivial dependency issues with files that are specific to LLVM. I guess adding something like a new cc_library to the Bazel build can be viewed as just taking existing code and transforming it to fit LLVM, but it’s boilerplate that exists in every project. For more substantive changes, I think these questions absolutely need to be considered. For this, though, I’m not convinced.

I think this is also an unfair characterization. Looking at the last ten commits to utils/bazel for examples of commits that this bot would be producing, we see the following:

  1. [libc][math] Refactor expm1f implementation to header-only in src/__s… · llvm/llvm-project@60e7c47 · GitHub - refactors some code into a header and moves some targets around in the build. Trivial/boilerplate
  2. Fix Xtensa Bazel build (#173073) · llvm/llvm-project@1a87e39 · GitHub - Adds a new target to the bazel build in the same style as existing targets with machinery specific to LLVM.
  3. [bazel] Port 6c51c17eecd8a19813d28b293590fc7197137594 (#173082) · llvm/llvm-project@3f3a57c · GitHub - Adds header files to a library’s sources. Trivial/boilerplate.
  4. https://github.com/llvm/llvm-project/commit/e88f3d8d8022b265670565d72e32b8680093b5e2 - Ports the addition of a new table generated library. Trivial/boilerplate
  5. https://github.com/llvm/llvm-project/commit/e88f3d8d8022b265670565d72e32b8680093b5e2 - Same as above
  6. [bazel] fix #170267 (#172697) · llvm/llvm-project@a341180 · GitHub - Same as above, but a normal library instead of something table generated.
  7. [bazel] fix PR172479 for bazel (#172676) · llvm/llvm-project@a452be5 · GitHub - Adds a new dependency. Trivial/boilerplate
  8. [bazel] Port 3c97829d971d133c8984987271a31b90da64da84 · llvm/llvm-project@ea9adda · GitHub - Adds support for some new libc functions. Trivial/boilerplate
  9. [bazel] Fix for 908a5a8292ea1 (#172385) · llvm/llvm-project@bd81f41 · GitHub - Adds a new dependency. Trivial/boilerplate
  10. [bazel] One more fix for f785ca0d72cc37ac951afe81cba37c292b0027eb · llvm/llvm-project@ecfdf8c · GitHub - Same as above.

You could view all of this as renaming and reordering existing Starlark, but it’s boilerplate and exists everywhere. That’s why we want to use AI in the first place: it’s a well-scoped problem with easy validation (just run the build/tests), and it removes some toil from our team, freeing us up to fix actual compiler bugs and make actual improvements.

While our general AI policy should forbid this, I think it’s fine to allow this as an exception, because:

  • It only affects a peripheral tier component.
  • All changes still get reviewed by a human.
  • Reviewers for that component are (presumably) the same people who want to introduce this tooling, so there should be no additional burden on unrelated community members.
  • Constrained problem space where LLMs can likely do well.

No comment on the specific integration approach of using a Github App.

6 Likes

I generally agree with @nikic that this seems like a scenario where a policy exception could be warranted by the particulars of the proposal.

I would, however, like to propose two additional guard rails for consideration:

  • The scripts and prompts for generating these changes should be open source and visible to the community for review, and maintained in public so that future updates are reviewable as well. We need to have confidence that the community’s requirements for allowing this process are being met.
  • The process should enforce that the change only touches files under utils/bazel. This ensures that the LLM can never introduce behavioral changes that impact non-Bazel users. I believe that most of the examples shared by @boomanaiden154-1 would be fine, except for this one.

I should have been a bit more clear about that one. The bot would not be generating the entire diff in that patch, just the changes in utils/bazel to match the CMake changes. In this case the author did both directly (so the bot would not have to do anything).

Adding @keith @aaronmondal and @rupprecht as Bazel code owners.

Thanks, Aiden, for addressing the concerns. @nikic’s comment above summarized it well and aligns with our thinking. This is a well-scoped problem where we can leverage LLMs to free up our time to work on more important things. Backtesting has shown a good success rate so far, and it is likely to get better as LLMs improve.

I also want to point to BazelFixer data - Google Sheets, which contains the fixes generated by this bot over the past month and a half and gives a glimpse of the kind of fixes this bot would generate. The tooling doesn’t allow the AI agent to modify anything outside of utils/bazel.

Regarding review load, running:

gh pr list --state all --limit 500 --label bazel --json labels,reviews --jq '. | select(.labels | length == 1) | {approvers: [.reviews | select(.state == "APPROVED") | .author.login] | unique }' | grep "approvers" | sort | uniq -c | sort -nr

 53 {"approvers":"rupprecht"}
 12 {"approvers":"akuegel"}
 10 {"approvers":"keith"}
  8 {"approvers":"slackito"}
  5 {"approvers":"Sterling-Augustine"}
  5 {"approvers":"jyknight"}
  4 {"approvers":"d0k"}
  3 {"approvers":"googlewalt"}
  3 {"approvers":"boomanaiden154"}
  3 {"approvers":"aaronmondal"}
  2 {"approvers":"rnk"}
  2 {"approvers":"pranavk"}
  2 {"approvers":"ingomueller-net"}
  1 {"approvers":"WillFroom"}
  1 {"approvers":"weiweichen"}
  1 {"approvers":"wecing"}
  1 {"approvers":"vonosmas"}
  1 {"approvers":"ScottTodd"}
  1 {"approvers":"makslevental"}
  1 {"approvers":"lntue"}
  1 {"approvers":"jpienaar"}
  1 {"approvers":"hokein"}
  1 {"approvers":"eddyz87"}
  1 {"approvers":"chapuni"}
  1 {"approvers":"amirBish"}
  1 {"approvers":"alinas"}

For the last 500 closed pull requests under the bazel GitHub tag, over ~85% of reviews were done by a few people at Google working on integrating LLVM downstream. Note that the above data also includes general Bazel infrastructure improvements that this bot doesn’t target; I think for just active breakages, the number quoted above is much higher.

We would still need a GitHub app in the LLVM org to create pull requests, and then another GitHub app in the fork to push to branches. It would be a lot simpler if we could use user branches that are promptly deleted as soon as the pull request is closed or merged. I think that as a GitHub org admin there’s also an option to automatically delete branches associated with a pull request when it gets merged, without the user clicking the ‘Delete this branch’ button. Perhaps we could do that to be sure such branches never stay around?

We are open to upstreaming these scripts once everything is set up.

I’m supportive of this. I agree that the vast majority of the changes are trivial dependency fixes and should be well within the scope of what LLMs can do today. My only concern is the over-inclusion of unnecessary changes, as you mentioned. I think if we were fully utilizing dwyu we could also prune unused dependencies (which would be a net win outside of this change), which I think would alleviate that concern.

The backtesting I did didn’t include using dwyu, so you might see some unnecessary dependencies introduced in the sample data in the link, but the initial version we are going to deploy uses dwyu first to see if the breakage can be fixed without AI. It only uses AI if dwyu is not able to fix the problem.

As mentioned in the proposal, around half of the breakages could be solved by dwyu alone. The other half, however, required AI assistance.

I guess I’m suggesting rerunning dwyu after the verified fix to prune any unnecessary changes. But that’s an implementation thing that could be tuned afterwards.

1 Like

The concern about opening AI-generated PRs automatically versus letting the user review them before opening the PR would be valid if this were to put additional review burden on other community members, but that’s not the case here. So the difference between the two approaches diminishes and doesn’t matter in practice.