[RFC] LLVM AI tool policy: human in the loop

Hey folks, I got a lot of feedback from various meetings on the proposed LLVM AI contribution policy, and I made some significant changes based on that feedback. The current draft proposal focuses on the idea of requiring a human in the loop who understands their contribution well enough to answer questions about it during review. The idea here is that contributors are not allowed to offload the work of validating LLM tool output to maintainers. I’ve mostly removed the Fedora policy in an effort to move from the vague notion of “owning the contribution” to a more explicit “contributors have to review their contributions and be prepared to answer questions about them”. Contributors should never find themselves in the position of saying “I don’t know, an LLM did it”. I felt the change here was significant, and deserved a new thread.

From an informal show of hands at the round table at the US LLVM developer meeting, most contributors (or at least the subset with the resources and interest in attending this round table in person) are interested in using LLM assistance to increase their productivity, and I really do want to enable them to do so, while also making sure we give maintainers a useful policy tool for pushing back against unwanted contributions.

I’ve updated the PR, and I’ve pasted the markdown below as well, but you can also view it on GitHub.


LLVM AI Tool Use Policy

Policy

LLVM’s policy is that contributors can use whatever tools they would like to
craft their contributions, but there must be a human in the loop.
Contributors must read and review all LLM-generated code or text before they
ask other project members to review it.
The contributor is always the author
and is fully accountable for their contributions. Contributors should be
sufficiently confident that the contribution is high enough quality that asking
for a review is a good use of scarce maintainer time, and they should be able
to answer questions about their work
during review.

We expect that new contributors will be less confident in their contributions,
and our guidance to them is to start with small contributions that they can
fully understand to build confidence. We aspire to be a welcoming community
that helps new contributors grow their expertise, but learning involves taking
small steps, getting feedback, and iterating. Passing maintainer feedback to an
LLM doesn’t help anyone grow, and does not sustain our community.

Contributors are expected to be transparent and label contributions that
contain substantial amounts of tool-generated content. Our policy on
labelling is intended to facilitate reviews, and not to track which parts of
LLVM are generated. Contributors should note tool usage in their pull request
description, commit message, or wherever authorship is normally indicated for
the work. For instance, use an Assisted-by: commit message trailer. This
transparency helps the community develop best practices and understand the
role of these new tools.
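
For illustration, a commit message using such a trailer might look like the
following (the subject, body, and placeholder tool name are hypothetical, not
a required format):

    [Support] Fix off-by-one in path normalization

    The trailing separator was counted twice when normalizing absolute
    paths; add a regression test. I drafted the patch with an LLM coding
    assistant and reviewed and tested the result myself.

    Assisted-by: <name of the tool used>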

An important implication of this policy is that it bans agents that take action
in our digital spaces without human approval, such as the GitHub @claude agent.
Similarly, automated review tools that publish comments without human review
are not allowed. However, an opt-in review tool that keeps a human in the loop
is acceptable under this policy.
As another example, using an LLM to generate documentation, which a contributor
manually reviews for correctness, edits, and then posts as a PR, is an approved
use of tools under this policy.

This policy includes, but is not limited to, the following kinds of
contributions:

  • Code, usually in the form of a pull request
  • RFCs or design proposals
  • Issues or security vulnerabilities
  • Comments and feedback on pull requests

Extractive Contributions

The reason for our “human-in-the-loop” contribution policy is that processing
patches, PRs, RFCs, and comments to LLVM is not free – it takes a lot of
maintainer time and energy to review those contributions! Sending the
unreviewed output of an LLM to open source project maintainers extracts work
from them in the form of design and code review, so we call this kind of
contribution an “extractive contribution”.

Our golden rule is that a contribution should be worth more to the project
than the time it takes to review it. These ideas are captured by this quote
from the book Working in Public by Nadia Eghbal:

"When attention is being appropriated, producers need to weigh the costs and
benefits of the transaction. To assess whether the appropriation of attention
is net-positive, it’s useful to distinguish between extractive and
non-extractive contributions. Extractive contributions are those where the
marginal cost of reviewing and merging that contribution is greater than the
marginal benefit to the project’s producers. In the case of a code
contribution, it might be a pull request that’s too complex or unwieldy to
review, given the potential upside." -- Nadia Eghbal

Prior to the advent of LLMs, open source project maintainers would often review
any and all changes sent to the project simply because posting a change for
review was a sign of interest from a potential long-term contributor. While new
tools enable more development, they shift effort from the implementor to the
reviewer, and our policy exists to ensure that we value and do not squander
maintainer time.

Reviewing changes from new contributors is part of growing the next generation
of contributors and sustaining the project. We want the LLVM project to be
welcoming and open to aspiring compiler engineers who are willing to invest
time and effort to learn and grow, because growing our contributor base and
recruiting new maintainers helps sustain the project over the long term. Being
open to contributions and liberally granting commit access
is a big part of how LLVM has grown and successfully been adopted all across
the industry. We therefore automatically post a greeting comment to pull
requests from new contributors and encourage maintainers to spend their time to
help new contributors learn.

Handling Violations

If a maintainer judges that a contribution is extractive (i.e. it doesn’t
comply with this policy), they should copy-paste the following response to
request changes, add the extractive label if applicable, and refrain from
further engagement:

This PR appears to be extractive, and requires additional justification for
why it is valuable enough to the project for us to review it. Please see
our developer policy on AI-generated contributions:
http://llvm.org/docs/AIToolPolicy.html

Other reviewers should use the label to prioritize their review time.

The best ways to make a change less extractive and more valuable are to reduce
its size or complexity or to increase its usefulness to the community. These
factors are impossible to weigh objectively, and our project policy leaves this
determination up to the maintainers of the project, i.e. those who are doing
the work of sustaining the project.

If a contributor responds but doesn’t make their change meaningfully less
extractive, maintainers should escalate to the relevant moderation or admin
team for the space (GitHub, Discourse, Discord, etc) to lock the conversation.

Copyright

Artificial intelligence systems raise many questions around copyright that have
yet to be answered. Our policy on AI tools is similar to our copyright policy:
Contributors are responsible for ensuring that they have the right to
contribute code under the terms of our license, typically meaning that either
they, their employer, or their collaborators hold the copyright. Using AI tools
to regenerate copyrighted material does not remove the copyright, and
contributors are responsible for ensuring that such material does not appear in
their contributions. Contributions found to violate this policy will be removed
just like any other offending contribution.

Examples

Here are some examples of contributions that demonstrate how to apply
the principles of this policy:

  • This PR contains a proof from Alive2, which is a strong signal of
    value and correctness.
  • This generated documentation was reviewed for correctness by a
    human before being posted.

References

Our policy was informed by experiences in other communities:


Hi Reid,

I understand and generally agree with the sentiment that prompted this proposal, but I think maybe your current policy is too restrictive and you are throwing out the baby with the bathwater. In particular I can imagine cases where we might want to make exceptions to the general policy, e.g. for AI tools that are designed to handle a small, restricted, and easily automatable set of maintenance-type changes. I think this policy should include a well-defined path for obtaining exceptions to the general rule (that a human must be in the loop before a PR can be posted).

An example of such a path might be:

  • Post an RFC detailing what problem the proposed AI agent will solve and how it will solve it.

  • Get approval for the RFC

  • Have a short testing period, during which humans must check the agent’s proposed changes before allowing them to be posted upstream, and must note in the PR both that the original content came from AI and whether the human needed to update it.

  • Final review by a small committee (possibly one of the area leads teams) on whether the AI is generating PRs of acceptable quality; the committee then grants the exception (or not).

Note that’s just a rough outline, and would probably need refinement. Just my 2 cents.


Thank you, this feels like it takes a lot of the feedback into account. One aspect that I don’t see covered here, but that I have had trouble with, is that results from LLMs tend to be very, very verbose. Often they feel like giant walls of text. There is important and useful information in there, but I have to put a lot more work into getting at it. It is not that anything is necessarily incorrect or wrong, but a human would have said it in a few sentences and still conveyed the important information.

It feels very hard to push back on this because it is to some degree subjective (but I know it when I see it), but if a serious number of reviews became that much more verbose it would be a large cost in reviewer time. I would feel highly unmotivated to review PRs/issues that are walls of text, especially when I am (as I often am) making hard trade-offs on my time.

There is some degree of irony here, in that I am often pushing folks on PRs to provide more and more detailed information, but as the memes often say, “not like that”.


Not sure if it’s explicit from the current wording, but I assume the intention of “human in the loop” is that the human also won’t just forward reviewer questions to an LLM and post its answers as if they were their own, instead of going “I don’t know, an LLM did it” (because it’s essentially the same as the latter, but wastes far more reviewer time).

I don’t agree with this characterization. The problem we have today is not that we have a lot of LLM-based tools we desperately need to integrate into our automation. Instead, I think the problem is that reviewers are struggling with a wave of LLM output arriving as contributions, where contributors don’t have enough understanding of their own PRs. The policy that Reid drafted in this thread is a great step towards addressing that latter problem.

We are months into AI policy discussions, and I don’t think that hypothetical future LLM automation is worth delaying much-needed changes any longer.

Yeah, agreed. If we eventually do find that we want to carve out some sort of exception for some tool, we can just update the policy at that point.

I don’t see why this is incompatible with that. A policy introduced by this RFC can be overridden by a future RFC, including one for a specific case that would like an exception.


While I am fully convinced that we will end up needing a well-defined path for exceptions to this policy, which is why I brought it up, if everyone else would prefer to skip that for now, I can live with the policy going in as is, with the full expectation that we will need to update it in the future to define a principled way to obtain exceptions. :slight_smile:

I thought some more about why crafting an AI policy for an open source project is so hard, and why so many projects are struggling with it. For a large open source project like LLVM or Linux, a large share of the contributors are working at corporations – and a large majority of corporations have adopted AI coding assistants org-wide, many of them seeing material benefits from the adoption. It would be natural to question whether the same productivity gain can be replicated in an open source project like ours. The interest in drafting a policy may also come from the perspective of being welcoming to new contributors – a lot of young people today are playing with coding assistants prior to joining industry. My earlier position was that it wouldn’t be useful in LLVM, from personal experience, but I’m probably old-fashioned and biased.

The core issue is that the people writing code in corporations are very different from the general public – rogue entities misusing AI in corporations can be let go, but the general public is the wild west. Several open source projects are burdened with a flood of AI-generated bug reports and huge AI-generated PRs. I think that, due to the inherent nature of this technology, it will always be a corporate thing, and is perhaps a poor fit for an open source project – I know this is somewhat defeatist, but I really don’t know what kind of safeguards will protect us and our valuable time.
