Usage of CODEOWNERS file

GitHub supports a CODEOWNERS file (About code owners - GitHub Docs), which can be used to automatically request reviews from certain people/teams when certain files are modified.

There was an aborted attempt to make use of this functionality as a subscription system (see Changes to Pull Request Subscription System), which has been replaced by a different mechanism. I’d like to discuss the usage of this file for automatically requesting reviews.

In particular, I would like to propose making this file self-service: Anyone should feel free to use CODEOWNERS to add themselves as a reviewer for certain paths/files, without requiring additional approval, similar to how it worked with Herald rules on Phabricator. The only expectation is that if you add yourself to CODEOWNERS, you will actually review a large fraction of pull requests affecting those paths in a timely manner.

Being part of CODEOWNERS is only used to automatically assign reviewers, but does not confer any additional authority beyond that (e.g. there is no requirement that a code owner approves a change).

I think the more people we have in CODEOWNERS, the better, as this ensures that PRs submitted by newcomers get assigned an appropriate reviewer and don’t fall through the cracks, with nobody feeling responsible for looking at them.

It is worth noting that we also have existing CODE_OWNERS.txt files in some subprojects – I think we should make the GitHub CODEOWNERS files independent of those (and longer term retire them entirely), as at least the llvm/ CODE_OWERNERS.txt file is completely outdated and has practically no basis in reality at this point. It is my hope that the GitHub CODEOWNERS will remain more accurate over time, because people are incentivized to remove themselves from that file if they stop working on a code area (to prevent getting auto-assigned on reviews).

9 Likes

There is a maximum size of 3MB for the CODEOWNER file, have you tried to see how much this would potentially grow if we make it self service instead of using teams?

Is it possible to use other configure file and write a script to assign reviewers? One concern is that the meaning of the proposed usage for CODEOWNERS is much different from what general developers feels. So I worry there may be some misunderstandings.

There are currently 3000 people in the LLVM org. If every single one wanted to add themselves to CODEOWNERS, they would have 1KB of space each, on average. To give an example, in Add myself to CODEOWNERS by nikic · Pull Request #66363 · llvm/llvm-project · GitHub my own additions are <0.5KB (though I would likely expand this if it works well).

In practice, I expect that only a tiny fraction of LLVM org members would actually want to be part of CODEOWNERS, so I don’t think we’re going to run into issues with the size limit any time soon.

I do think we should prefer using reviewer teams in cases where this is possible – e.g. I imagine that this could work for some backends. But for many parts of the LLVM middle-end at least, I don’t think teams would work well (e.g. having a reviewer team for all of llvm/lib/Analysis or all of llvm/lib/Transforms doesn’t make sense – this needs to be at the level of individual files, and I don’t think adding a reviewer team for hundreds of LLVM passes improves things over listing individuals in CODEOWNERS).

3 Likes

+1 from me. I am responsible/have state for various bits of MLIR machinery and it would help me to prioritize reviews/feedback/etc to be able to opt-in to such a thing.

Github’s UI for this is not really aligned with this expectation. In particular, when a “Code owner” approves a PR that touches files they “own” (according to CODEOWNERS at least), the UI will put a big green check mark saying “Joe X. Developer approved on behalf of <codeownersgroup>”.

I’m not saying it can’t work BTW, but I think it would be easier to follow the paths that the UI steers people towards.

As an alternative: Anyone is able to request membership in a given pr-subscribers-* group, and start reviewing based on the notifications they receive from that. Once that contributor is a trusted member of the community, they can of course be added to the codeowners file.

I’m all for up-to-date codeowners files, as well as fresh blood where necessary, but I think the bar for that file should be closer to “knowledgeable project member” rather than “any motivated newcomer”.

1 Like

Here is how a normal approval looks like:

image

Here is how an approval on behalf of a reviewer group looks like:

You get the big green check mark either way. And I don’t think code owners change anything about this formatting (the “on behalf” wording comes from the fact that a team was assigned as reviewer, not the fact that the team was assigned via the code owners mechanism.)

So I don’t think the UI causes any undue confusion here?

It depends whether the respective user is already a committer – I interpreted

as also including non-committers (perhaps wrongly?). I don’t know what Github would do with a “Code owner” that’s not a committer - probably a warning that the CODEOWNERS file is invalid…?

Yes, code owners need to have write access to the repository. From GitHub docs:

The people you choose as code owners must have write permissions for the repository. When the code owner is a team, that team must be visible and it must have write permissions, even if all the individual members of the team already have write permissions directly, through organization membership, or through another team membership.

Yes, code owners need to have write access to the repository.

Tangentially, it is unfortunate about GitHub PRs that a lot of things are not possible without write access. In Phabricator, I would manually add reviewers that seemed appropriate, knowledgeable, etc. While in GitHub “To assign a reviewer to a pull request, you will need write access to the repository.” (Of course, all the more reason why it is important to automatically get appropriate reviewers when touching specific files.)

1 Like

At least clang and LLDB recently updated their code owners. For LLDB, updating this was partially motivated by the GitHub transition.

I’m personally in favor of the self-add mechanism suggested here, as it mimics functionality that existed with Phabricator. I’m not sure if it should replace the existing code ownership model. LLVM’s code ownership model only requires that “a commit to their area of the code is appropriately reviewed, either by themself or by someone else”. I think that could be a reasonable expectation for adding yourself to the file. On the other hand, while there is no official policy for electing code owners, there usually was a good level of community consensus, which we would lose with self-adding.

Rust uses a bot to workaround review assignment restrictions (as well as issue labeling restrictions), see Triagebot - Rust Forge

You just put “r? UserName” in a comment, and the bot will add the person to the review list on your behalf.

Having a bot that picks up comments to assign reviewers is a cool suggestion, I gave it a try but there does not seem to be such a bot for llvm.

I’m tagging @tstellar to pass on the suggestion.

I think I mostly agree with the other comments on this thread: The file should be treated as a volunteer list for reviewers, so if you add yourself you are essentially volunteering to review code for a particular area.

(hopefully whatever conclusion’s reached from this thread can be written up in the “how to github” or similar LLVM docs - I’m having trouble tracking all the variations on how to do things, and evolving situation - I’d love to be able to check one place especially after some of this dust has settled)

3 Likes

I think that in LLVM world, there is a difference between reviewer and code owner. Anyone can be a reviewer, but code owner has specific meaning. By changing this meaning and using the file, we need to rethink the definitions of these words or come up with a way to work within the constraints of the GitHub code owner file system.

2 Likes

I know it’s potentially confusing to people if the file is called CODEOWNERS, but it doesn’t actually list the real code owners for the project, However, it is providing a really useful functionality, and I don’t want to miss out on that or delay using it because of the name confusion.

I wonder if we could make the CODEOWNERS file symlink to a another file called REVIEWERS, that might help with some of the potential confusion. But if not, I think adding documentation to the CODEOWNERS file and to our developer docs explaining exactly what the file is used for will be good enough for us to start using it.

Yes, I agree that documenting these files is the bridge step here until we can really sort this out better.

That doesn’t really work on Windows… and the file name would still be in use.

This is the main objection I see to this proposal. Should we require any additions to the CODEOWNER file be approved by the actual code owner for that area?