RFC: Commit Access Criteria

Introduction

I would like to restart the discussion from my previous RFC on this topic and propose some new requirements for obtaining commit access.

For those that are not aware, about a month ago, we changed our commit access request process from an email based process to one that goes through GitHub Issues. This has allowed us to better track who is requesting commit access and gather stats about how many commits people generally have before they request access. Here are some of the numbers:

Of the 42 requests for commit access:

  • 28.6 % of requesters had 0 or less commits.
  • 40.5 % of requesters had 1 or less commits.
  • 47.6 % of requesters had 2 or less commits.
  • 66.7 % of requesters had 3 or less commits.
  • 71.4 % of requesters had 4 or less commits.
  • 73.8 % of requesters had 5 or less commits.

21.4 % requested commit access before even submitting a PR.

As one of the people responsible for granting commit access, I’ve become more and more uncomfortable with granting commit access to users that have little to no interaction with the project, and I think we need to make a change.

For further background, please see my talk or slides from the 2024 US Developer’s Conference on the benefits of having more stringent requirements for commit access.

Proposal

I would like to propose that we require contributors have at least 3 commits and acks from 2 current committers in order to be granted commit access for the project.

Based on the stats above more than half of the users in the past month would meet the commit requirements and all of them had PRs committed by at least 2 unique users who would be able to provide the necessary Acks. I think these requirements strike a good balance between being friendly to new contributors and also ensuring that we do not just hand out commit access to anyone.

It’s important also to remember that you don’t need commit access to contribute to the project. Submitting a PR does not require write access to the repo, and it’s trivial for one of the 1623 contributors who already have access to merge it for you.

I understand that no policy we have can ever keep us 100% safe, and whether the commit limit is 3, 10, or 100, a motivated attacker could find away to get access anyway. However, what this policy will do is reduce our risk of compromise and that is still important even if we aren’t ever going to be 100% safe. So, please consider this proposal and let me know what you think.

25 Likes

GitHub makes it very easy to submit changes in a form of PRs and there is no need for commit access for submissions. On the other hand, write access imposes certainly obligations from the security perspective and granting commit access to people with small or zero interaction with the project does not make any sense.

So, I would definitely certainly put some project interaction pre-requirements (any exceptions could be considered on case-by-cases basis). Maybe even being more explicit, e.g. requiring that these commits be non-trivial and / or requiring acks from “unrelated” committers (e.g. not co-workers).

1 Like

I agree that creating a PR and having someone else merge it seems straightforward enough, so in general I’m in favour of this.

My only thought on the topic beyond that is whether there would be benefit in a “fast track” option for people contributing from a company/organisation which has a proven track record within the LLVM project that bypasses (or reduces) the number of required commits. Such individuals would need an email address belonging to the organisation, or an ack from somebody who does to indicate they are indeed a member of that organisation, in good standing. In such cases, the security concerns are significantly diminished, compared to private individuals.

This seems reasonable, but one thing I’d advocate is that we’re clear that the two current committers don’t need to be the ones that reviewed/committed the previous patches (commonly they might be, but explicitly blocking on those particular reviewers might add unnecessary friction if they’re unavailable). Unless there’s some more complex implied criteria I’m missing I’d think any committer with time to eyeball the previous PRs could provide the needed thumbs up.

I’m in favor of this direction and think that the PR system is easy enough to contribute to the project. I agree that it should be very clear from the documentation that commit access is not required to open PRs.

I wonder if we can / should have some sort of labeler action that marks a PR to be submitted by the reviewer as this was not always obvious to me in the past and not every new contributor may let the reviewer know that they do not have commit access.

I tried to do this by using the review approved event, long story short, that event has read only permissions so it can’t change anything (RFC / FYI: Pull Request Greetings For New Contributors - #18 by DavidSpickett, Revert "[GitHub][workflows] Ask reviewers to merge PRs when author cannot" by DavidSpickett · Pull Request #81722 · llvm/llvm-project · GitHub).

Adding a label or comment when the PR is opened is an alternative. We already do this for other purposes, and some of the code in that reverted PR could be re-used.

I lean toward a label, since it’s always visible in the UI and you could search label:reviewer-merge and state:approved to find anything pending.

Cool to see there has already been work around this topic. I think it would be quite helpful.

This sounds totally reasonable, in fact I thought that was sort of how it worked already. As Anton mentioned, stating that the commits should be non-trivial seems reasonable, and I think the number could be higher too.

For comparison, WebKit’s process says:

Normally a potential Committer would be nominated once they have submitted around 10-20 good patches, shown good judgment and understanding of project policies, and demonstrated good collaboration skills.

and Chromium’s:

To become a committer, you must get at least ten non-trivial patches merged to the Chromium src Git repository, and get an existing committer to nominate you. You will need at least two other committers to support the nomination, so getting at least three different people to review your patches is a good idea.

I don’t see why LLVM would want to be less strict, really?

People in such organizations already benefit from colleagues who can help ramp them up, review patches, etc. to meet the regular requirements. I don’t think there should be a fast track.

6 Likes

I am also in favor of at least 5 but more inline with chromium/webkit of 10 seems more appropriate to me.

5 Likes

I agree. This forms an “internal fast track” already.

Of course it’s not very equitable, for anyone not in a company. Perhaps we can help with that by stating in the policy whether it is allowed to, and how to, reach out to users who might consider your commit access request.

Otherwise it’s a bit like reviewers, you submit a PR and…do you add people, do you tag people or do you just wait for someone to show up? We have improved that guidance over time too.

FWIW, I’ve always felt a bit odd reviewing PRs from team-mates, as it feels a bit like marking your own homework assignment. It’s hard to remove your team’s motivation from influencing what is good for the LLVM community as a whole. Indeed, in the past my immediate colleagues have actively discouraged reviewing each others patches upstream, though I acknowledge that doesn’t seem to be the case for others. Anyway, my point is that being part of a company does not necessarily equate to faster reviewers, though hopefully existing colleagues can help with the other aspects of ramping up at least.

4 Likes

Thanks for the proposal! I think this is a positive change.

The proposed bar is quite low (and I’d be okay with something slightly higher as well) and I think it’s reasonable to expect that people who request commit access have made a handful of contributions. Explicitly documenting a threshold also helps newcomers by reducing the guesswork on when it is appropriate to request commit access.

I think it would also be good to explicitly document that it’s possible to request triage access as well, in which case we’d presumably waive the requirement for the number of commits (but not for the acks).

Not really on-topic for the thread, but I do notice this problem occasionally. I think ultimately it shouldn’t matter whether the PR author and approver are from the same company – what matters is that the approver is qualified to review code in that area. For example, for PRs to the X backend, chances are good the author and approver both work at XCorp – and that’s totally fine, they’re the domain experts! But if someone from XCorp submits a PR to InstCombine and someone else from XCorp approves it five minutes later, even though both of them normally only work on the backend and clearly aren’t even aware of the documented requirements for contributions to the component – that’s pretty annoying.

2 Likes

This sounds like a very reasonable approach to me, I support it. Thank you for putting together the data!

+1 overall, thanks!

+1 I also suggest that we require the qualifying patches to be bit more involved than e.g. fixing typos.

What about reviews? Folks could also demonstrate their good understanding of LLVM through comprehensive reviews.

-Andrzej

What constitutes “non-trivial”? It’s clear that typo fixes don’t qualify, but I think it would be good to get more agreement about what is and isn’t trivial before making this a requirement.

1 Like

This sounds like a totally reasonable course of action for the project, and one I think we should absolutely take. Thanks for making the proposal @tstellar.

I agree with @hansw2000 and others that perhaps more than 3 commits is a better metric for commit access.

I also don’t think we should have a “fast track”, given that the nature of your work at your company will probably have you meeting the minimums quite soon.

I’m going to guess we’ll have a hard time defining precisely what a non-trivial commit is, given even a simple one line fix could have required a great deal of work (e.g., bisecting llvm revisions, bisecting the pass pipeline, understanding the transform well enough to spot a mistake in its implementation/correctness/etc). So perhaps its better to be a bit vague in this case, and only specifically call out some things (like typos) that are easy to agree wouldn’t qualify?

Do we really need to try to codify these things? My thought is that any definition here could contain loop holes. Can we switch to self-judgement? E.g. in the commit request one could state that:

  • Read and agree to developer policy
  • Contributed to at least (5-10-15) non-trivial patches
1 Like

I’m in favour of this change. I would also be fine with bumping the minimum up to 5 or 10 commits, but 3 is fine too. I don’t think we need to worry about “trivial” commits being counted so long as we specify that the two current contributors are saying “I believe this person should have commit access” and not just “I checked that this person has X commits”.

I think that’s basically what I was saying. Sorry if that wasn’t clear.

I like that this is data driven, it is helpful to know what the numbers look like.

I think I would even be in favor of making the bar five but I think what is proposed is reasonable.