We’ve gotten some inquiries about whether LLVM should be considered to contain AI-generated content. I find the notion vaguely offensive that we would label LLVM as “containing AI generated content” because it is the product of thousands of human-years of work, but we cannot say affirmatively that contributors are not using AI tools such as copilot in the course of their work. To settle the question and provide contributors some guidance, I thought it would be good to update the developer policy.
I propose that we officially allow contributors to use AI tools to make contributions to LLVM. I posted a PR for reference, and I’ve included the text below.
AI generated contributions
Artificial intelligence systems raise many questions around copyright that have yet to be answered. Our policy on AI tools is guided by our copyright policy: contributors are responsible for ensuring that they have the right to contribute code under the terms of our license, typically meaning that either they, their employer, or their collaborators hold the copyright. Using AI tools to regenerate copyrighted material does not remove the copyright, and contributors are responsible for ensuring that such material does not appear in their contributions.
As such, the LLVM policy is that contributors are permitted to use artificial intelligence tools to produce contributions, provided that they have the right to license that code under the project license. Contributions found to violate this policy will be removed just like any other offending contribution.
If you have thoughts on the proposed policy, this is a Request For Comments, so please respond here on Discourse and we’ll figure out the next steps.
I think we should also take a stance against turning AI hallucinations into bug reports, and possibly even Discourse/Discord posts. Someone reporting a bug should be able to explain in their own words what the bug is, and not rely on AI tools to do the thinking for them, as we saw here: Critical Security bug report · Issue #86299 · llvm/llvm-project · GitHub
I support this and the general content of the message. But I don’t think it really answers the question most beginners to LLVM might have: “Can I use GitHub Copilot to write LLVM patches?”. Maybe this message could be posted with a FAQ section that answers some basic questions, making it easier for people who want to start contributing to refer to. I think the people most likely to rely heavily on AI tools are people who are just starting out with programming, so aiming the FAQ at them is probably a good idea.
The danger is not just bad/insecure code: malicious actors could target specific types of hallucinations, and it may not be obvious to the reviewer or to the person submitting the work. So these contributions should be subject to extra scrutiny.
Yeah, I think we should be quick to ban folks who use AI to generate bug reports. We had a few so far (not long after ChatGPT started becoming popular) that were giant wastes of time.
This seems difficult to enforce. I have heard of projects suggesting the use of header comments dictating which code was machine generated. However, I would find it fairly insulting as a reviewer to learn that I’m the first human to attempt reading someone’s code. At the end of the day I think it’s important to stress that contributors are responsible for the quality and content of code they submit, AI generated or not.
+1, I think this is very in line with moderating spam and banning accordingly.
To be clear, I’m essentially proposing the opposite, a liberal policy on accepting such contributions, as long as they are not clear reproductions of copyrighted material.
As I understand it, it is easy to get AI tools to reproduce copyrighted material from their training sets. That doesn’t “rinse” the copyright from the material. As a project, our normal approach is to remove barriers to contribution and liberally accept patches, subject to review, while growing the community as much as possible. This policy aims to uphold our project license while supporting that goal.
As for the attack vector from package hallucinations, this seems more like a software supply chain security attack, which I think needs to be handled in a more general way during review.
Regarding the code quality, whether it’s bad or good doesn’t feel like a sufficiently strong motivation to ban such tools outright, unless it becomes a spam concern (typos come to mind).
I’ll update the PR to include a direct FAQ answer for this question. Our current FAQ is pretty unfocused, but I expect people to find their way to the answer via a search anyway.
Just as a general observation, in my experience so far, “containing AI generated content” is an extremely broad spectrum.
Some folks attempt to have large blocks of code generated from LLMs with no or minimal human touch-ups. On the other end of the spectrum, GitHub Copilot functions more like a glorified autocomplete most of the time. Using it this way results in code that “contains AI generated content” in a strict sense, but it’s really human-authored code where sequences of tokens were taken from an LLM in a manner that is not so different from using more traditional autocompletion mechanisms, e.g. those based on language servers.
With my code reviewer hat on, I dread the first kind of “AI generated content”, but the second kind is basically fine. And nobody has ever suggested we label code as “containing Intellisense generated content”.
I am not asking for a ban; I am asking for disclosure.
If we feel that merely disclosing is a barrier, that is a bit concerning.
I think asking folks to make the determination that LLM-generated code is clear of all copyright issues is a pretty high bar. OTOH, disclosing that one used an LLM to generate code (partially or fully) feels like a pretty low bar.
I am aware that some folks may be using LLM-based tools as a glorified “Intellisense”, but the more worrying pieces I have read clearly indicate that folks are regularly using them to generate significant portions of code, and those are the scenarios we should be guided by.
I think this is implicitly the case due to the number of LLVM Foundation board members who have supported this, but it feels like the Foundation’s support for (or at least lack of overriding concerns about) any policy in this area, presumably based on legal advice of some sort, should be made explicit.
It is actually not a low bar.
Unless each line or block of AI-generated code is annotated in the source code, we cannot maintain the disclosure header in a fast-moving code base like ours. Without the annotation, how will the author or reviewer of a subsequent patch, who by chance is removing all the AI-generated lines from a file, know that they are also responsible for deleting the disclosure header?
And annotating (something like “special comment tags”) would be pretty difficult and annoying.
Really appreciate @rnk for taking up this issue. I was waiting for this clarification for quite a while.
Apologies, I guess I am using the wrong term. I meant disclose in the summary of the PR merely for reviewers to take into consideration. I did not mean actually annotating the code itself.
I’m a bit concerned about one aspect of this. Tools like Copilot allow people to “write” lots of code without necessarily understanding what the code does in detail. When people write code themselves, it usually follows some logical structure, and often there are comments clarifying some parts, etc.
When tracking down a bug in unfamiliar code, understanding the intent behind the implementation is very important for figuring out where the bug originates. Recovering the intent from the source code is often not easy, but in human-written code, at least in LLVM, it’s not an impossible task.
My concern is that with AI-generated code it may get to the point where understanding the code, specifically why it’s written the way it is, becomes a lot harder. If there is some misbehavior in it, it may be very difficult to differentiate between hidden assumptions and bugs.
I’m guessing that people experienced with LLVM will be less likely to use AI tools than people who are new to it. What are we going to do if we get a PR with 2000 lines of AI-generated code from someone who can answer some questions, but does not have an in-depth understanding of all the details? Is anyone going to volunteer to review it in detail? Will it get approved because it appears to work, or will it stay open forever?
Personally, I want to support the AI-as-glorified-intellisense use case. As everyone has pointed out, nobody wants to review and find bugs in 2000 lines of “plausible” looking code that no human ever had to write or understand. The chance that it contains lingering bugs that will live on for years at the bottom of the compiler tech stack makes it not worth it. And AIs are good at generating code that looks like normal code, passes limited test cases, but contains edge-case bugs that are hard to spot.
My hope is that we treat these kind of contributions the same way that we would any other large, low-quality contribution: decline to review it or merge it and ask for smaller patches. People already have large existing forks of LLVM of dubious code quality that they would like to upstream, but we try to maintain a high quality bar to avoid growing the project maintenance burden.
I think the LLVM Foundation’s main interest here is to avoid extra work and bureaucracy. I don’t like the idea of labelling LLVM as containing AI generated content, but our license already says that LLVM is not fit for any particular purpose as a way to disclaim any implied warranty. This is kind of an insult to LLVM, which is deeply embedded in the foundation of the global industrial compute technology stack, but we say it anyway to avoid legal liability and facilitate contribution. We’ve all gotten used to this contradiction over the years, and I think AI content labelling will, unfortunately, end up creating a similar situation, where we disclaim things as potentially containing AI content because they happened to use Photoshop or copilot.
There may be another route here where we create some fine lines and ask contributors to attest that they did not use AI tools in certain ways, but I think nailing down those details will take a fair amount of work. It might be worth it. I think what I’ve currently proposed is an intentional step down the path of least resistance, and I just wanted to take that step knowingly, rather than defaulting into it as AI tool usage proliferates.
As someone who does a lot of code reviews, I am not worried if someone uses AI to help them word a commit message or comment because of language barrier issues, or to help them make a two-line (or similarly easy-to-review) change in a PR. However, I care deeply if someone uses an AI to generate large portions of a sizeable PR, because that really impacts how I have to approach the review (and, frankly, how I prioritize that review in relation to other reviews).
I don’t know how to formulate that as a policy, though. What’s sizeable? What percentage of the changes are AI-generated? Where is the AI being used? The crux of it for me is: if the use of AI doesn’t negatively impact the review process for a PR, then it isn’t really a problem. But if it wastes our limited reviewer resources, then it is a significant problem (worthy of a ban) in my opinion.
It puts the onus on reviewers to determine whether code has copyright issues or not. e.g., if we are in a review and suspect there’s AI-generated code involved, the reviewer is put in the awkward position of having to ask “did you write this yourself?” without a realistic way to verify or refute whatever answer the author gives. This seems ripe for unintentional bias in making those determinations and increases the risk of copyright issues for the community.
It would make me more comfortable to ban AI-generated code while allowing AI-generated non-code (comments in a PR, commit messages, etc), and once we have a better handle on how well that works out, we can consider relaxing the restrictions to allow for AI-generated code.
I think we agree that we don’t want to waste reviewer resources on low-quality, raw, AI generated code. Maintainer attention is our most precious, scarce resource, at the end of the day.
What do folks think about revising the policy to strongly discourage contributions containing significant quantities of unreviewed, AI-generated code? We want to communicate that reviewers are not going to do your homework for you. They aren’t going to exhaustively review large volumes of code for style and correctness. Our expectation is that contributions meet a certain quality bar, and we expect reviewers to ignore patches from contributors who historically have failed to meet that quality bar.
The idea is that I should be able to ask a tool to “write a depth first traversal over the CXXRecordDecl subobjects” to generate some starter code that recursively visits all bases and fields in a class, but as a contributor, I need to take responsibility for finding defects in that code. If contributors don’t meet that quality bar, it harms their reputation, and future contributions will be ignored.
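For concreteness, here is a rough sketch of what that kind of starter code might look like, using Clang’s real `CXXRecordDecl::bases()` and `RecordDecl::fields()` APIs; the function name and the placeholder field handler are illustrative only, and a contributor would still own verifying the details before sending it for review:

```cpp
// Sketch only: recursively visit the base-class subobjects and direct fields
// of a CXXRecordDecl. The traversal shape is the point; the field handler is
// a placeholder the contributor would need to fill in and test.
#include "clang/AST/DeclCXX.h"

static void visitSubobjects(const clang::CXXRecordDecl *RD) {
  if (!RD || !RD->hasDefinition())
    return;
  // Recurse into each base class subobject first.
  for (const clang::CXXBaseSpecifier &Base : RD->bases())
    if (const auto *BaseDecl = Base.getType()->getAsCXXRecordDecl())
      visitSubobjects(BaseDecl);
  // Then visit the fields declared directly in this class.
  for (const clang::FieldDecl *Field : RD->fields()) {
    (void)Field; // placeholder: inspect Field->getType(), etc.
  }
}
```

Code like this is cheap to generate, but the edge cases it glosses over (virtual bases visited more than once, incomplete types, and so on) are exactly the defects the contributor, not the reviewer, is responsible for catching.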
In other words, we will have a permissive policy on tool use, but we will encourage contributors to exercise good judgement.
Why, though? I would approach all reviews the same way, really: is the code readable and maintainable, does it make sense for the component, is there test coverage, etc.?
Everything I look at for a code contribution is independent of how the code was produced!
I don’t quite understand why, or what is special about AI here. In any review, for as long as the project has existed, you could have had submissions from folks who copied large portions of code from other projects. Did you feel it was your responsibility to handle this when reviewing?
Is this practical, though? AI is really part of most people’s development workflow now, and this has more chance of just being ignored (and unnoticed) than anything!