Our AI policy vs. code of conduct vs. reality

Speaking of communication overload, it took me over a week to read @maskray’s post here. I think this is a very thoughtful and accurate description of the way things are today, but I want to see if I can change some of this.

I can only speak for myself, my organizations, and the people I can influence, but I will continue to advocate with my employer for the view that upstream code review is a core job responsibility for LLVM contributors. I can point to numerous examples where a feature we needed (the medium code model, more powerful DSE through initializes) shipped because somebody took the time to provide thoughtful design review and balance Google business needs with the needs of the project. If Google internally considers code review an important tool in the software excellence toolbox, and the LLVM project does too, then we should recognize upstream LLVM code review as fulfilling job responsibilities, in exactly the same way that first-party (1p) review is just as much a job responsibility as writing code.

This deserves a longer post, but I take this really seriously, and I’ve been exploring standing up an instance of the CHAOSS GrimoireLab tool to track PR response time (there is an example for Eclipse). The idea is to create a community initiative to drive that number down together, as a group project and a way to bring on more maintainers, rather than burdening the existing roster with more work.

Clearly, not all reviews are equal. Some of us have expertise, and others don’t. But I think we need some kind of coordinated plan to share and pass down that expertise if we’re going to:

  1. keep the project running and
  2. keep maintainer workloads manageable

LLVM has many excellent maintainers, but I don’t think it’s at all clear to potential new maintainers how to walk that path. It’s super opaque: a function of who you met and when, and what you learned from various mentors. It seems to me that there is a gap here, where nobody is taking ownership of documenting that path and creating that ladder of engagement, so people can graduate from making contributions to feature X in the narrow interests of their employer to sustaining the project through design feedback on feature Y. I need to think more about how to make this happen, but the solution isn’t for existing maintainers to do more and burn out; it’s to grow more maintainers.

4 Likes

There’s a lot to think about in this thread and it’s been on my mind quite a lot since it started. I have the following thoughts:

  • The subset of the linked Fedora policy covering contributions seems like a good starting point. For Reid’s PR, I have a concern about the level of indirection: defining “extractive” contributions and a general policy on expected quality, and then trying to have the AI contributor policy fall out of that, is a noble effort, but it feels like it’s taking on a much bigger task when a more targeted policy on tool/LLM-assisted contributions could be easier to agree on. Maybe it ends up being shorter, maybe not, but I hope it would be easier for contributors to relate to their own situation, and easier for us to agree on as a community, than something broader in scope.

  • Not everyone agrees, but the policy we previously adopted, and the variants most of the discussion has focused on, allow for the use of LLMs as an aid in preparing a patch. The issue, as I see it, comes if that usage breaks our human-driven review system. I would be very unhappy and uncomfortable with a policy that controls the tools contributors can use, except as some kind of last resort. I believe attempting to refine our current policy is the sensible next step: ensuring, for example, that people take personal responsibility for the work they submit and commit to engaging directly in a bidirectional review process. If someone has taken the time to study the code, stands by it as if it were their own, and will personally engage in the back-and-forth of review, then I don’t mind if they wrote it from scratch, used sed, or had a room of monkeys with typewriters write a billion drafts until one worked.

  • Some ideas of behaviours we could either explicitly call out or ensure are covered by some more general wording:

    • Submitting a PR where you don’t understand and stand by the changes, or at least don’t flag which parts you are personally unsure of. This wastes reviewer time, and because LLMs can be confidently wrong, reviewers can end up on a wild goose chase trying to understand an assertion in a submitted PR that no human has actually thought through.
    • It is never appropriate to engage in review discussion by taking questions from a reviewer, pasting them into an LLM, and pasting the response back. I don’t know if we’ve seen this happen in our community, but a colleague mentioned going through this loop in another project, and I can only imagine the frustration.
    • Based on comments in this thread, it sounds like we want guidance that fishing for potential patches through LLM “audits” is unlikely to result in high-quality submissions and is strongly discouraged, especially for new contributors. (Established contributors are better placed to make a judgement call on whether something is worth submitting, so I’m not sure I’d rule it out entirely for someone who knows what they’re doing.)

  • When it comes to “disclosure”, I think the Fedora policy strikes a good balance here with “significant assistance”. Given the wide range of ways of using AI tools, I think requiring disclosure for any usage is overkill (do I disclose if my search engine gave me a summary answer for my Python docs query and I used that rather than clicking through?). I think something like the following would suffice: “If your contribution has largely been generated by a tool (e.g. an LLM, a custom script, clang-tidy, etc.), you should flag this in your pull request as an aid to reviewers.”

  • One thing that’s been bouncing around my head is how to make more concrete the idea that you understand and “take ownership” of what you submit. The PR description / commit message is a really important part of this, and writing it yourself seems like a potentially useful litmus test. If you find it useful, perhaps you use an LLM to aid in translation from your native language, to help with phrasing, or to check that it all seems fine. But I think you can still do those things and have a message that is predominantly your own work, and this would be a totally reasonable thing to require as the “cost of admission” for as long as our reviews are primarily performed by human contributors.

Hey folks, I pushed an update to the draft policy on GitHub. Wording suggestions are best tracked in the PR if you want to suggest edits.

I made two significant changes:

  1. The Fedora project policy stuff was so good, I just straight up copied it and led with that. If you can’t beat 'em, join 'em. Stay humble. Thanks @jyknight for the reference. Note that the license for the blog is CCA, so the link reference is load-bearing.
  2. To help maintainers push back more easily with more objective standards, I incorporated a guideline that new contributors should start with small changes. This is new, but it matches my experience of how one goes about climbing the ladder to becoming a project contributor.

Encouraging new contributors to start with small changes is a time-honored recommendation for getting started in a new codebase. Personally, when I do code reviews, I’m much more willing to review large changes from contributors I trust to produce high-quality code. Even if I don’t agree with the change, I’m willing to invest my personal time in understanding their work because I assume it will pay off. Reviewing a large change is mentally taxing, and we should be up front that we don’t expect project maintainers to offer large-scale code review for free just to support new project contributors. To me, this feels like a better way to address maintainer workload concerns without spending time splitting hairs over developer process.

I retained the concept of “extractive” changes because I like the adjective, the label, and the book. Sorry if you don’t like the philosophizing, but I’m attached to it. I think it’s really helpful to have a word for this concept of nuisance contributions that is dry and technical, and it works well as a GitHub label.

I think the next step for me is to start a new formal RFC thread for this topic since this thread is long, and as we have seen in past discussions, folks tend to tune out after some number of words have been posted.

To recap the feedback and make sure I’ve addressed it:

  • Concise: I hope I’ve addressed this by keeping the actionable ideas at the top. Given the amount we’ve written on this so far, I think it deserves a long, standalone policy doc with references.
  • Full ban: I hope the size guideline for new contributors addresses this concern by giving us objective criteria.
  • Quality bar: I’ve removed the duplication and left cross-references.
1 Like

I’ll go against the grain and say that I do not agree with the referenced Fedora policy and would not support its adoption for LLVM (though I acknowledge that Reid’s rephrasing of it is better than the original). I strongly oppose any policy that frames machine-generated contributions as an encouraged practice, as opposed to something that is tolerated under certain circumstances. The wording of that policy prioritizes welcoming AI-assisted contributions over our duty of care to the users of our software, which to me seems unacceptable.

More broadly, it is not clear to me that there is actually a consensus in this thread on the right direction to go. I see strong opinions expressed on both sides, as well as a pretty significant rate of messages being hearted by non-participants.

Without a clear consensus on the desired policy, I don’t think we’re ready for word-smithing.

3 Likes

Three years ago, the idea that a machine could write plausible LLVM patches would have seemed fantastical. Yet here we are today, having to discuss, for good reason, our position on this technology. The rate of progress has been incredible, even if improvements do seem to be slowing down. In light of what we know is possible with these models today, and where things might be in another (short) three years, I don’t think a policy of disallowing any AI-generated contributions is going to be tenable long term. As these models get better and better, the gap between the quality of submissions from the average developer and from a machine is going to narrow. They may not reach the long-term planning capabilities of experienced engineers, but I think it is only a matter of time before a stance of outright denial breaks down under economic forces.

That’s OK - these policies don’t need to last forever. We can change them as the technology changes.

1 Like

I’m not so sure I agree. That an AI tool found a thing that no one else had found at the time doesn’t make it a win for the LLVM community: if the developer time could’ve been spent on something else, then it’s a question of opportunity cost. Was this better than the other thing they could’ve been working on? And if not better, how much worse?

Yeah, working with newcomers is going to be, in the immediate sense, net-negative, but the hope is that some number of these newcomers become more involved in the project and may eventually become maintainers, or at least valuable, net-positive contributors. This is vital for the long-term health of the project.

But if folks spend their limited bandwidth fostering AI contributions that are less likely to result in net-positive contributors, we’re harming the long-term sustainability of the project. Those folks burn out, and their efforts will not have been as effective at fostering an ongoing life for the project.

[my take: I think a “no AI slop” rule seems like a good start, if a bit of a judgmental term for it (some of the people creating it, I understand, do care about their contributions but aren’t familiar with the concepts, etc.). If the contributor is interested in learning, there’s some hope there. Though even then, even with no AI involved, sometimes we don’t have the bandwidth to teach someone enough to be an effective contributor; there’s a lot to learn in this space and our time is limited.

Honesty seems fine: ask people to explain if they’re unsure about things and how things were generated (my first contributions to LLVM were tool-assisted: sed over the codebase for the de-constification of llvm::Type, but we explain how we make those changes, include the commands in the review, etc.). And if someone doesn’t say it but we get the impression it might’ve been tool-assisted, we can ask, and we should expect honest answers. If people are lying about everything… there aren’t really any rules that are going to help us there anyway, other than using our best judgment to ascertain and respond to that.]
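
For what it’s worth, that kind of disclosure can be as simple as pasting the commands into the PR description. The snippet below is purely illustrative (a hypothetical mechanical cleanup, not the actual commands used for the llvm::Type de-constification), just to show the level of detail that lets a reviewer reproduce and audit the change:

```sh
# Hypothetical example of a mechanical, tool-assisted change, disclosed by
# including the exact commands in the PR description. Not the real commands
# from the llvm::Type de-constification; assumes GNU sed.
git grep -l 'const Type \*' -- '*.cpp' '*.h' \
  | xargs sed -i 's/const Type \*/Type */g'
```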

6 Likes