Our AI policy vs. code of conduct vs. reality

Since this thread started, I filled out the LLVM dev meeting round table form and proposed a round table on this topic. I know not everyone participating in this thread is going, but it will be another opportunity in a month to discuss this further, take notes, and post the results.

To summarize feedback I’m getting so far:

  • Multiple folks have asked me to be more concise. I don’t want to drop the rationale, but I want to lead with concise, practical process steps that help maintainers stop slop, because we have problems right now.
  • Several folks want to ban these tools full stop, but there is a sizeable contingent that wants something more moderate and practically defensible.
  • Several asked that the new policy be tied back to the project's general quality bar.

Here I’m using “slop” to mean “unwanted AI-generated content” as defined by Simon Willison. I’m starting to think it would be more practical to phrase this as an anti-slop policy. The word “unwanted” here does the work of capturing our subjective judgements on the value of a contribution.

LLMs are very good at corp-speak, so yes, I have heard of engineers using them to write internal emails for management. :slight_smile:

These are tough questions. We do live in a new world.

This is exactly the kind of behavior the policy draft is meant to prevent. If someone uploads slop, gets a warning, and responds to the warning with more slop, we should lock the PR.

Scott’s point is broader than what I quoted above, but I agree: I would like to encourage folks to share the prompts they used for large changes, along with the kinds of concerns they were trying to address. There’s tons of prior art here: pasting the sed one-liners used to rewrite LLVM IR test cases into commit messages, or committing the Python scripts that performed mass edits to the repo itself. Our existing developer policy on commit messages already asks the contributor to emphasize the “why” of the change.
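As a sketch of what that prior art looks like in practice, something like the following script could be committed alongside a mass edit, with the commit message pointing at it. The intrinsic names and test path here are entirely hypothetical, made up for illustration:

```python
#!/usr/bin/env python3
# Hypothetical mass-edit script recorded alongside a commit so reviewers
# can see exactly how the mechanical changes were produced.
# Renames a made-up intrinsic in every .ll test file under a directory.
import pathlib
import sys

def main(root: str) -> None:
    for path in pathlib.Path(root).rglob("*.ll"):
        text = path.read_text()
        updated = text.replace("llvm.old.intrinsic", "llvm.new.intrinsic")
        if updated != text:
            path.write_text(updated)
            print(f"updated {path}")

if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "llvm/test")
```

The point is less the script itself than the audit trail: a reviewer can rerun it and diff the result against the patch, which is the same transparency we’d want from a shared prompt.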

I think the copyright aspect of the policy is somewhat obsolete at this point. I wrote it in an earlier era when LLMs would splat out comments from the original Doom source code (or at least that’s what I had in mind; I was probably already behind the state of the art). Most LLMs run recitation checkers these days to avoid obvious instances of reproducing content from their training sets, so the point is largely moot. There are deeper questions here about copyright and IP, but you’ll have to talk to legal counsel about those.

The most important part of the policy is what the project will do when someone identifies copyright infringement: we’ll remove the infringing content. That’s a pretty unsurprising outcome.