Changes to Pull Request Subscription System

Hi,

We’ve been using the CODEOWNERS file to allow community members to ‘subscribe’ to pull requests based on the file paths that are being modified. This has been working for the most part, but we have uncovered some issues with our system, and I think we should re-consider what we are doing.

The three main issues I see so far are:

  1. There is no distinction between a patch subscriber and a patch reviewer. Some people just want to be notified about pull requests and don’t want to or don’t have the authority to approve pull requests. Some sub-projects want to have a definitive list of reviewers who are allowed to approve a patch. Right now there is only one ‘pr-subscriber’ team, so both these groups of people are treated the same.

  2. We are limited to using file paths for subscriptions. Using something else, like labels, to drive subscriptions gives us a lot more flexibility to add subscribers. We can subscribe people based on keywords or even just by manually adding a label.

  3. We are not using the CODEOWNERS/Reviewers field the way they were intended. This could potentially be risky for us if changes are made to this feature that don’t consider our uncommon workflow. Using something like labels to drive subscriptions is going to be much safer, given that GitHub already natively supports subscribing to issue labels and it’s likely that something like this would be implemented for Pull Requests (especially since according the REST API, Pull Requests are Issues).

Why did we use the CODEOWNERS in the first place?

The main reason for using the CODEOWNERS file is that it allows you to be added as a reviewer right when the pull request is first created. This generates a notification for reviewers that shows the full patch diff of the pull request. Any other notification system, like one based on labels for example, would still notify subscribers almost as quickly but the first notification would not contain the full patch diff, only an html link to the Pull Request on GitHub.
The other reason to use CODEOWNERS is it doesn’t require us to write our own custom GitHub action to manage the subscriptions everything is automatically handled by the GitHub backend.

Proposed Changes

I think we should delete the CODEOWNERS file, start over, and use it only for people who are actually code owners, or who have the authority to approve commits for a specific area of the project. At the same time, we would start using labels to subscribe the existing ‘pr-subscriber’ teams to new pull requests. The main downside of doing this would be, as mentioned above, that members of the pr-subscriber teams will miss out on the full patch diff in the initial notification email. However, I think this change will sufficiently address the three issues mentioned above.

8 Likes

Do you have an example of the email notification we would get?
To me it is just critical to have enough context in the subscription email to not have to open the webpage for every single email notification to figure out if it was relevant or not.

I think the notification would look like this:

@llvm/issue-subscribers-infrastructure

Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you are on a team that was mentioned.

I definitely think the CODEOWNERS approach isn’t quite doing what we want, so I’m open to this idea but I have a few questions.

There are a few things that I think are important in terms of both subscribing to and requesting reviews:

  1. Notifications should have enough context to be actionable
  2. It should be easy to find the list of things that need acting upon
  3. People should be able to configure their own preferences
  4. It should usually be straightforward to find appropriate reviewers
  5. Newcomers shouldn’t have to read all of our docs to have a hope of getting something reviewed

So if I understand correctly, this is a slight downgrade on (1), in that the emails won’t necessarily have the full diff. This is definitely a concern.

For (2), the current system seems to put things in https://github.com/pulls/review-requested. Based on our conversation on discord it sounds like we’d get a similar list in https://github.com/pulls/mentioned with the new idea, so it shouldn’t really make a difference.

This seems somewhat neutral on (3) I think, but I don’t really know how the labeling works. With the CODEOWNERS approach we can create a PR to update what’s relevant to a “team” - how does this work with labels? It sounds like we might, in the future, have a github feature where folks could individually subscribe to labels, but for now it would still be something we manage through the various teams, is that right?

As (4) goes, I think CODEOWNERS does okay, but as you mention it’s really only path based. Is it safe to assume that our abilities to automatically label things are more powerful here?

For (5), the CODEOWNERS approach is a mixed bag. As far as I can tell from the questions people have been asking, it really isn’t obvious whether or not anyone’s assigned to review a PR if you don’t have commit access. Someone on the discord was saying all they could see was that the PR was “assigned to a team”. Will the label based approach make this better / the same / worse? It’s maybe a stretch goal, but is there a path here to being able to remove the bit in our docs that basically says “Look at the git log of the file and hope that those people are good reviewers!”?

As always, thanks for working on all of this

2 Likes

If we moved to a label based system, then you would update this file instead of CODEOWNERS.

Yes.

I think the problem you are referring to about not being able to see reviewers is a bug in how we set up the teams. There is also an option to assign individuals from the team, rather than team itself, so we could do that instead. I think overall, CODEOWNERS, has the potential to be really helpful for finding reviewers, but probably not if we use it as the subscription mechanism like we do now.

2 Likes

Another downside of our current approach with CODEOWNERS is that every who subscribes to a PR must have commit access to the repo. I also think we may be able to support self-subscription if we did notifications based on labels, but I need to test out my hypothesis to confirm.

GitHub doesn’t even give you that, anyway. It gives you the per-file diff stats when the PR is opened and links to the diff, and it gives you the PR’s description, but it doesn’t give you the diff itself like Phabricator does. When new commits are added it doesn’t even email you the diff stats, just the --oneline log with links to the commits. So the only real thing of value being lost is the PR description, which one could rectify by quoting when tagging the various subscriber teams (this might also be handy for issues), though if tagging multiple teams that could get quite noisy.

Thanks for correcting this. You are right the initial notification just shows the PR summary with the files changed. And I really like your idea about adding in additional information when tagging the subscriber teams. We could even potentially add in a diff of the PR when tagging the team too. Although, I think that would only be practical if there was some way to hide the diff from the GitHub PR UI, but still have it show up in the notification.

Another issue I’ve observed with the current approach is that when someone in the group approves the PR, it is approved “on behalf of ” and the group is removed from the reviewers and replaced with the person approving it [1]. I don’t know if the group still gets notifications afterwards, but it made it really confusing to see the PR disappear after being approved.

[1] I observed this here: [lldb][Docs] Additions to debuging LLDB page by DavidSpickett · Pull Request #65635 · llvm/llvm-project · GitHub

1 Like

Create the full comment then immediately edit it to remove everything but the @? Yes it’s exceptionally stupid, but…

I’ve discovered that it’s possible to hide comments, so we could encode extra information when we tag a team as @jrtc27 has suggested.

With this, I think we can mitigate the biggest downside of the label based system which is the lack of information in the initial notification. I’m going to try to implement my proposal and submit a pull request. Please let me know if you see any other potential issues with this.

3 Likes

The <details> tag should work in comments: Organizing information with collapsed sections - GitHub Docs

3 Likes

Oh that’s even better, provided one can relatively easily escape <details> and <summary> tags that appear in the PR’s description when quoting them.

Thank you for creating workflows: Add a simple pull request subscription workflow by tstellar · Pull Request #64913 · llvm/llvm-project · GitHub

I agree that it will be much better than the current subscription-via-CODEOWNERS mechanism.

1 Like

Status update:

The label based notification system has been implemented now and is up and running. The last thing left to do is to make sure the team names match the label names. This has been done for some teams, but not all of them, so you may notice that new labels are generating mentions to some teams and not others. Once this last step is complete we will have something that can replace the CODEOWNERS file.

If everything goes well, we should be ready drop the CODEOWNERS file. I know some people may still want to use this file to help assign reviewers to PRs, but I think we should delete the existing file and start over with a discussion about how best to use it.

There are still a few known issues with the subscription system that we need to address:

  • Sometimes people mistakenly create pull requests with hundreds of commits that include changes already in main. This leads to a lot of notification spam as many teams end up being subscribed to these PRs.

  • Our bot user has been getting rate limited by GitHub. I haven’t seen this affect our subscription system yet, but it has affected the emails going to the commits list.

Thanks to everyone that has been helping with patches and suggestions, we made good progress this week.

7 Likes

This is definitely annoying! One thing I noticed though is that every team was mentioned in a separate email. Is this really necessary? Whilst obviously the “oops I subscribed all the teams because I forgot to rebase” is hopefully not a long term issue, it isn’t ideal that every team who SHOULD be mentioned will get so in a separate email too, as it’s still quite noisy.

@jh7370 Yes, I think combining the mentions into a single comment would be better. I’m going to add this to the TODO list.

1 Like
1 Like

I don’t mind the proposal and agree with most comments in the thread already, I’m just worried what we mean by “who have the authority to approve commits for a specific area of the project”.

I’m all for the idea that some people are more suited than others to approve the patch in some areas of the code, but we have so far struggled with the ideas to create such a separation. The current review policy does not distinguish either.

So far, our loose definition of approval has been “working” for the most part and people are generally civil about not approving code that is not in their area of expertise. So if we move into something more well defined, we’ll definitely need to update the review document.

In Phab, the “definition” of “who can approve a PR” was: “anyone who has a user in Phab”, which doesn’t even need to be a committer, but it had the gate keeping that a committer needed to push the code.

In Github, with codeowners files and teams we have basically the same setup, since someone with write access still has to merge the PR after being approved, so I don’t see it as a new problem. If anything, it’s an old problem that we haven’t fixed.

The proposal to separate people in three groups per directory (code owners, approvers, reviewers) will not only add maintenance cost (keeping those lists up to date) but friction (who decides in which list you can be, per directory?).

If this is the road we’re taking, then this has to be crystal clear on the policy before we make any change to the mechanics.

To be clear, I don’t want to derail the rest of this discussion, I just want to make sure we separate the mechanical discussions in this thread from this policy discussion, that should be taken in a separate RFC with the appropriate title and description, so that people can collaborate on the topic directly.

1 Like

Status Update:

Everything is in place for the label based notifications, and it looks like it is working. If you see any issues, please let me know. I’m going to plan to remove the CODEOWNERS file later today if things are still looking good.

3 Likes