RFC: New criteria for commit access

Hi,

I would like to propose we add some new criteria for obtaining and retaining commit access:

This proposal has been updated: v2 v3 v4.

* A user must have 10 commits merged to main before they are granted commit access.
* A user must have at least 10 commits in the past 12 months to retain commit access (This would be enforced monthly).

I prefer having 10 be the threshold, but would not necessarily be opposed to another number.

The rationale for proposing this change is that commit access on GitHub comes with a lot of other privileges, including but not limited to:

  • Uploading releases to the release page.
  • Viewing repository secrets.
  • Running and modifying GitHub Actions workflows.
  • Publishing packages, including container images.

It’s important that we can trust people with these kinds of privileges to ensure that our project’s software supply chain is secure. I think limiting commit access to only active contributors will significantly cut down on the risk of a bad actor obtaining commit access and compromising the project in some way.

Currently, there are 1789 users with commit access to the llvm-project repo. If we were to implement this policy we would revoke commit access from 1234 people leaving us with 555 committers.

Along with this proposal, I would like to also change the way users request commit access so we can automate checking their total number of commits. Instead of having users email @clattner directly, they would file a GitHub issue with the commit-access label and we would have an GitHub Action job to verify that they were eligible. Chris would then review the ticket and give them commit access if they qualified. I think this process would be preferable to having Chris manually look up the number of commits for each request.

It’s important to balance project security with being welcoming to newcomers, but I think that commit access for new or less frequent contributors is much less necessary now that we are using GitHub pull requests. It’s very easy (just one click on the web UI) to push a commit on someone else’s behalf, which was not the case when we were using Phabricator. So, I don’t think this change will impose any significant burden on new or infrequent contributors.

(edit: Had to slightly decrease the number of committers who qualify under these rules from 574 to 555 due to a typo in my query: git shortlog -sn --branches=origin/main --after="15 Jan 2023")

12 Likes

I don’t know if 10 is the right number, or if the number should be the same for gaining access vs retaining access, but I’m totally supportive here. In particular now that with GitHub pull-request you can contribute and have authorship more directly maintained than when it used to be SVN driven.

This makes a lot of sense. A lot of discussion in #infrastructure on Discord about attack vectors has ended boiling down to attack vectors that are only enabled by a compromised account/malicious actor with commit access. Especially given the long tail of people who have not committed in quite a while, I think someone’s token/account getting compromised is definitely a real possibility and it would not be difficult to do a large amount of damage with that access, completely ignoring the risk of a malicious actor having commit access.

Having issues with a tag for commit access makes a lot of sense. Having links in the issue would be good to as there will still be some manual discretion. Small fixes like spelling errors or minor refactorings could probably be easily gamed to obtain commit access.

There are also probably a couple other measures that could be taken to increase security here. I don’t think LLVM requires committers to have 2FA by default, but that might be something to look into at some point. That’s besides the point of this proposal though.

In the distant future, if/when we have proper CI and Code Owners, we could even layer the commit access further, only granting full commit access to a small group, and instead letting everyone else (perhaps through a bot) merge only after CI is passing and after obtaining approval from a code owner.

+1. We can start from something smaller for retaining access. E.g. requiring at least a single commit over the last year or so. And then could find a sweet spot with gradually increasing.

5 Likes

+1 to the general principle, but -1 to the number for retaining access (possibly even the principle behing needing to retain access, though I’m not sure about it overall). I’m one of those infrequent committers, because I no longer work in a team that actively contributes to LLVM - I think in the past 2-3 years, I’ve made single figure numbers of commits, mostly to update things like coding standards. However, I do regularly contribute in other ways (reviewing and on Discourse). As the areas I review semi-frequently attract new contributors and I’m one of the very few contributors to actually review this code who otherwise doesn’t have a direct stake in the progress of the PR (i.e. I’m not in the same team/trying to bring up the same file format etc), it’s important for me to be able to merge other people’s changes. If I lose committ access, I’ll not be able to do that, stalling progress of these new contributors until I shout loud enough that somebody able to comes along to press the button.

As of this past week, GitHub requires 2FA for anything non-trivial, so I expect (but haven’t tried) you won’t be able to push/merge new commits without 2FA enabled. I haven’t looked into this in any more detail though. so it’s possible there are lots of caveats/different roll-out times/etc.

5 Likes

I wonder if we should incorporate comment / reviewing PRs in the retain access criterion. You can be very active in the community without landing commits in your own name.

Except for that, I agree that we should reduce the number of commiters to the repo. Just that we need to figure out a very permissive start state and then tighten it.

10 Likes

Was just thinking about this. There are definitely a couple other people who mostly review code and commit fairly infrequently. It should be possible to query other forms of activity that someone has and take those into account when removing commit access (seems like Tobias Hieta also just suggested this). Any Github event is probably pretty easy to track. Contributions on other platforms would be more effort on the technical side, but might be doable.

There are also some cases where people would benefit from being part of the repository so they can be added as reviewers and added to teams (maybe triage access?) without getting commit access, but that case is kind of besides the point here.

As of this past week, GitHub requires 2FA for anything non-trivial, so I expect (but haven’t tried) you won’t be able to push/merge new commits without 2FA enabled. I haven’t looked into this in any more detail though. so it’s possible there are lots of caveats/different roll-out times/etc.

I did not realize this. Thanks for pointing out that this isn’t a gap anymore.

I’d prefer to have a multi person verification system here, for example having Chris notified after a number of existing committers acknowledged the inclusion.

I the past, I always recommended people to copy me on the email, so that Chris would know where this is coming from.

Agree, but this just changes the threat model, not fixes it. Don’t get me wrong, I totally agree with the move (with @mehdi_amini and @akorobeynikov comments), but we need to be clear on what the new threat model is.

That is why I suggest getting support from existing committers before being granted access.

On that note, this isn’t the only topic in recent times where we’ve come up with a need to define an active contributor status. For example, the recent governance proposal had elections where only active contributors would vote. It would seem sensible for all such cases to use the same policy, whatever it ends up being.

3 Likes

This kind of “all or nothing” access does seem a bit unfortunate, but I don’t suppose there’s anything we can do about it?

For obtaining commit access, I’m not sure putting a strict limit and having a bot check it is the right way. I assume 10 typo fixes wouldn’t qualify, for instance.

Having hard thresholds like that could incentivize gaming the rules, which is not what we want.

As for retaining commit access, how about we start with periodically (after 12 months of no commits for example?) emailing inactive users and ask if they want to retain their access? That would have the benefit of checking that we also have current contact info.

4 Likes

I think this sounds reasonable but there is one big awkward issue. Currently you have to have commit rights in order to add reviewers. I don’t care much about loosing commit rights but not being able to add reviewers is very annoying, mostly for the person with commit rights that has to do this for me.

Maybe these two can be separated?

7 Likes

I think in many organisations people will move into more organisational roles that means they do more review work, with the occasional commit when they have some spare time.

I personally would lose my commit access on the yearly commit rate, which while not the end of the world, would push me further away from contributing directly if I had to go through a process to get it back.

Some suggestions for mitigations:

  • Provide a file or some other means to “register interest for maintaining commit access”. This would mean that people that have genuinely left the community can be removed. Those that are actively interested in staying can register their interest.
  • Lower the number of commits needed per year, perhaps to 3.
  • Give people a years notice from starting the policy so that they can rearrange their work to do some active work.
  • As has been mentioned before, count other community activity such as reviews and discourse posts in the 10 items.
4 Likes

I support the intent of this RFC, but would like to highlight another use case: triaging old bugs. At the moment we enjoy several newcomers digging through old bugs. Without LLVM organizations membership (which is currently tied to commit access), they can’t edit labels and existing comments (which often contain broken markup and are hard to read). At the same time, they can’t or don’t want to meet criteria for commit access. It would be nice to have a path forward for them, too. Maybe have additional ways to become a member of LLVM organization without commit access? I’m not sure how much security risk is involved, though.

3 Likes

My impression is that Mastering @rustbot - Rust Compiler Development Guide aims to solve this.

Perhaps someone here has experience using that in the Rust project.

I think this sounds reasonable but there is one big awkward issue. Currently you have to have commit rights in order to add reviewers. I don’t care much about loosing commit rights but not being able to add reviewers is very annoying, mostly for the person with commit rights that has to do this for me.

I support the intent of this RFC, but would like to highlight another use case: triaging old bugs. At the moment we enjoy several newcomers digging through old bugs. Without LLVM organizations membership (which is currently tied to commit access), they can’t edit labels and existing comments (which often contain broken markup and are hard to read). At the same time, they can’t or don’t want to meet criteria for commit access. It would be nice to have a path forward for them, too. Maybe have additional ways to become a member of LLVM organization without commit access? I’m not sure how much security risk is involved, though.

There is supposedly a triage role that allows people to do things like label issues. I believe it also allows users to request reviews on PRs, and it is explicitly enumerated as giving users permission to do things like assign issues. It doesn’t give users the ability to edit comments though. This would also solve a couple other problems like users needing to be a part of the LLVM organization to be added to teams (which has been an issue in places like the vendor teams), and outside experts in specific areas that review code but haven’t contributed much (or any) code themselves. I think this would be a reasonable compromise in a lot of ways, but it still doesn’t cover every use case. Codifying a policy on requesting a more limited access (like triage) is something that should probably be done though.

3 Likes

This probably ties in with other thoughts about contributors to the project and what counts as contributions. I wonder if contributions to the buildbot setup / maintenance are also considered?
My understanding is that with the move towards GitHub actions, that question probably becomes obsolete at some point, but in the meantime it could be another contribution to be considered.

At least the triaging/labelling part can be solved. For example, in SciPy, we have a group of members of the organisation that have rights for bug triage, but cannot yet push the merge button (nor edit comments IIRC). In practice it’s often a stepping stone between acknowledging regular contributors and actually giving them commit rights.

Another point about this: all the people in bz-contributors are org members, but don’t derive commit access from that.

I suggest to take “merge for others” into account. E.g., (Self commit + Merge for others) >= 10/year. It can solve a couple of pain points:

  • People who reviews more than commits and needs to merge for their authors;
  • Patches blocked by authorized reviewers who may not back to the patch in a short time;
  • People who want to maintain their access but don’t have something to commit;
  • Encourage authorized people merge for others;

Of course, we need a rule that the merge must happen after authors explicitly requesting for merge.

1 Like

The information is present in git already, for example:

$ git show --pretty=fuller 3a44c576b146
commit 3a44c576b146521d14f71fa50e071871623e6655
Author:     terrydang <terrydang.cn@gmail.com>
AuthorDate: Wed Apr 19 01:45:05 2023
Commit:     James Henderson <james.henderson@sony.com>
CommitDate: Wed Apr 19 01:46:29 2023

You’re the “commiter” here (the information isn’t displayed by default, see how I used --pretty=fuller here. If you’re active and merging PR, you’ll keep access.

So there does seem to be a group in the llvm organization for issue-triage. This gives access to things like labeling issues, closing issues, and I believe adding reviewers to PRs. And it does not give commit access.

However, this group seems like it is never used, in fact I am the only member, but perhaps this group, or a group with similar permissions could be used to help with some of these issues.