Code Review Process Update

Thank you everyone for providing feedback on the Code Review RFC. We received 190 comments on the GitHub issue, which are summarized in this document.

The LLVM Foundation Board of Directors reviewed the feedback from the community and while we agree with many concerns regarding missing features and adjustments to make, we believe that the LLVM Project should move to GitHub Pull Requests for code review. We have set the transition date to October 1, 2023 in order to provide adequate time to address some of the concerns and provide the community with enough time to transition.

Before we get into some of the rationale for that decision we would like to acknowledge and say thank you to Google and all the Google engineers that have worked on maintaining Phabricator for the past 10 years. Phabricator is a critical part of the LLVM project and has really allowed the project to grow in ways that weren’t possible with the original email based code review process.

Unfortunately, Phabricator is reaching “end of life”, and we need a new solution. Phabricator is no longer maintained upstream, and it is breaking down as the software ecosystem continues to evolve (e.g. with new versions of PHP). Furthermore, as the LLVM Project continues to grow, we need to continue to evaluate the tools we are using to ensure that they are able to support us for the next decade, as well as serve the next generation of LLVM developers who will join the project in the future.

After evaluating many options, GitHub Pull Requests appears to be the best code review tool for the future of the LLVM project. There are many positive qualities to GitHub Pull Requests: it is a ubiquitous tool that is familiar for new and existing contributors, it has third party hosting, it is easy to integrate with CI and other forms of automation, and it allows us to easily enforce developer policies that allow us to scale as the number of contributors to the project continues to grow.

That said, we have heard your concerns and acknowledge that GitHub Pull Requests are not an exact replacement, and does not provide all the features that Phabricator has. For example, the Phabricator user interface seems to be preferred by the community, it offers “Stack Reviews”, allows you to comment on lines outside of the changed code, and has better automatic subscriptions.

Despite the challenges, we need to get off of Phabricator, and the “must have” developer workflows are already supported by GitHub. We don’t feel that the remaining deficiencies with GitHub Pull Requests overcome benefits of moving to Pull Requests have - but we do think it is important to try to improve the workflows and know it is important to give time for auxiliary workflows to adapt. To do this, we set the target transition date to October 1, 2023. Allowing over a year should provide enough time to adapt, and time to find workarounds/solutions for at least some of the deficiencies that remain.

The next steps now for the community are to tackle some of the Recommended Action Items in the Infrastructure Working Group’s summary of the Code Review RFC. Tom Stellard will be creating a project in GitHub and some issues to track these initial action items. If you have other ideas or things that we should do before the transition, please file an issue so we can track it in the project. If you see an issue that you can help with, please get involved. The more people that help, the smoother the transition will go.

LLVM Foundation Board of Directors

22 Likes

I’d be interested in contributing to this (e.g. creating the guide to stacked pull requests), but that strongly depends on our policy on branches in the main repo. How can we get that discussion started?

  1. Check the Pull Request Migration Project to see if an issue already exists, if not create one and add it to the project. To get write access to the project you will need to join this team.
  2. Initiate a discussion. I’m not sure if it’s better to do this on the issue or in Discourse. Thoughts?
  3. Once consensus is reached document the policy and submit a patch.

After evaluating many options, GitHub Pull Requests appears to be the best code review tool for the future of the LLVM project.

In the interests of transparency, can you provide the list of other options that were considered, and the rationale to not move forward on those options? From this post and the linked documents, it seems the only alternatives considered were “continue using Phabricator” and “move to github PRs,” and I have the personal impression that one of those alternatives was already disfavored.

Despite the challenges, we need to get off of Phabricator, and the “must have” developer workflows are already supported by GitHub. We don’t feel that the remaining deficiencies with GitHub Pull Requests overcome benefits of moving to Pull Requests have - but we do think it is important to try to improve the workflows and know it is important to give time for auxiliary workflows to adapt. To do this, we set the target transition date to October 1, 2023.

Would you be willing to push the transition date back if some of the more severe deficiencies are impossible to work around by that point in time? I am strongly concerned about the impact this change will have on the future quality of the LLVM codebase, especially as the burdens of this change will be almost exclusively borne by reviewers with little to no positive features for them in the merge. Already, there are people commenting that this will cause them to stop reviewing patches, and I feel that more weight needs to given to these concerns to avoid having a deleterious effect on LLVM code quality.

6 Likes

As one of the people who will effectively be forced to review significantly less code when this switch happens, I share this concern quite strongly. I do not have any way to continue with the review load I have given that I already know from experience that performing code reviews in GitHub is significantly harder. As a code owner for multiple LLVM projects, I worry about how this will impact not only the quality of our code base, but the experience for new contributors whose reviews tend to require the most time from me, and the experience as a reviewer whose already significant review backlog will only get worse.

14 Likes

From the stated reasons, this seems like the real showstopper issue with continuing to use Phabricator. My question is, has the Board considered asking whether there would be any LLVM developers interested in maintaining our Phabricator fork at all, going forwards, including updating for things like PHP upgrades etc? From what I can tell, the primary Phabricator repo was maintained by only about 1 person for the last period of its life, but if it’s possible to do that, it seems like it wouldn’t be impossible for LLVM contributors to handle all the potential issues. Heck, potentially it could become the Phabricator source in the future for other open source communities to use, which could in turn invite further contributions, both to our Phabricator instance and potentially LLVM itself. If for whatever reason we fail to find and develop such a group of maintainers, then I guess we do have to jump ship, but I really hope that’s not the case, at least for now, since I’m one of the developers whose review quality and throughput would go significantly down long-term by moving to GitHub PRs.

7 Likes

Same here: while I agree that we need to commit to a decision, and it is nice that we have one (thank you LLVM foundation board!), I also would have preferred that the date was announced as a tentative one / a soft commitment, with a pre-condition to the move that the proposed future workflow would be documented and got through an RFC in front of the community first. A “move at this date in whatever state” is a bit unfortunate.

Also, tangentially, shouldn’t something as technical as the contribution workflow should go through a community driven RFC instead of being handled directly by the board?

10 Likes

This was of course expected, and using a proper forge does have its advantages, but I’d like to reiterate as others have how limiting GitHub’s lack of support for series of commits (“stacks”) is.

There’s also the legal issue around the Apache license. I’ve created an issue for it: Create legal clarity around Apache license clause 4b · Issue #56648 · llvm/llvm-project · GitHub. I know it’s unpleasant, but I’d really appreciate it if the Foundation took this seriously. Steering the project in a direction where contributors have to violate its own license or at least operate in a grey area of it is a bad idea.

I keep hearing this argument despite GitHub clearly supporting chaining of pull requests. Could you explain what is missing in what GH does?

Speaking for myself, not for the board: Phabricator is continuously breaking down, and while duct tape is being applied, we are on the wrong side of truth here and this is only getting worse over time. This is an infrastructure issue, which is why the foundation cares, and the point of a long transition is so we can figure this out together.

I’m not aware of credible alternative solutions that are worth considering here, but if there were a specific proposal and it met our general goals then of course it would be considered. The broad goals are something that is hosted infrastructure, that is widely used by the software dev community, that integrates with github based source control and supports the necessary features we as a community are looking for, well documented / easy enough to learn, etc.

I’ve heard and seen surprise from a number of folk along the lines of “LLVM has tons of big companies using LLVM, why can’t we afford to build our own thing (e.g. like Mozilla did)”. The former is true: massive companies use and contribute to LLVM. That said, in practice, no one has stepped up to help with things like this. Perhaps because building something /good/ (not just a demo) is a huge amount of effort.

If you think this is a project /your/ organization should be driving, then talk to your management about it: act instead of whining about other people not acting :-). Similarly, if you or your organization has connections into Github, then please use them to get improvements. This isn’t my area of expertise, but I can’t imagine that “adding comments on lines that aren’t touched by a patch” is missing due to technical reasons, it is probably more about prioritization.

One other thing I’ll add is that this thread is another indication that people who are grumpy with things often speak up, but folks that are happy with things often stay silent. I can speculate why, but I would still love people in favor of the transition to actually say so :slight_smile:

-Chris

7 Likes

This may be true now, but that doesn’t mean it will still be true in a year. We have a few options to improve the reviewer experience: One is to submit feature requests to GitHub (Pull Request feature requests for GitHub · Issue #56635 · llvm/llvm-project · GitHub) and another is to test out some of the alternative interfaces and provide feedback (Investigate alternative interfaces for GitHub Pull Requests · Issue #56641 · llvm/llvm-project · GitHub).

I’m optimistic that we’ll be able to address many of the concerns that were raised in the RFC, but we need the community to step up and do some of that work. The project for tracking Pull Request action items is live now, so if anyone sees an issue they can help with, please get involved.

1 Like

I am hugely in favour of this. I think the road for people to get into contributing is going to be way shorter. The ecosystem and help articles are much bigger and more expansive and not spending volunteer time on maintaining Phab seems like a win to me.

I have worked in numerous big orgs that have been using GitHub PRs as review tool and I don’t really see the big pain points as many here point out (not saying they don’t exist, but I haven’t seen them where I have worked). So I think we’ll be fine.

Thanks for all the work on this topic! I think this is fantastic for the community in the long run.

8 Likes

I keep hearing this argument despite GitHub clearly supporting chaining of pull requests. Could you explain what is missing in what GH does?

This is my personal experience with what GitHub provides.

First off, it is not obvious from documentation how you go about turning a sequence of local commits into a series of stacked pull requests for GitHub. The process requires coming up with several unique names, one for each commit in the series, as you need to create a branch for every commit. Now you go off and create several PRs, each of which is requesting to be merged into the branch created by the predecessor commit.

From a reviewer’s perspective, the next major hurdle is that it is not really clear that a commit is a stacked PR. You have to be eagle-eyed to notice that it is being requested to be merged into another branch that is also requesting for merge, as opposed to being merged into main. The UI that tells you what branch it’s being merged to is not prominent at all, and if you’re mostly used to looking at patches that only go into one branch, it’s easy to neglect to see that it is merging into a different branch.

Of course should you accidentally merge one of those commits, you now end up in what seems from experience to be an unrecoverable state where the diffs you are trying to look at can no longer be reliably correlated with what the changes that were actually intended in that merge commit. It’s conflating the changes in the patch in question with changes that have occurred elsewhere that get thrown in there because GitHub is diffing… something. (I personally have similar diff confusion issues when you want to do something exotic like update the patch to a newer version of main.) At this point, the only way you can really recover it is to throw out the entire PR and start over from scratch, with all of the concomitant loss of discussions.

I haven’t checked recently, but I believe it is impossible with the current system to do such advanced things as see that the first commit is part 1 of an N part series, and find the other patches in the series, or to see a complete rollup of all of the patches.

In summary, my experience is that such a feature might work well should you manage to stay on the happy path, but the happy path seems to be about as easy to traverse as tightrope in a typhoon, and the cleanup should you fall off the happy path seems to be about as easy as cleaning up after a typhoon, too.

6 Likes

FWIW, when I see someone who has made the same point I want to make, I hit the heart button or stay silent and assume that’s sufficient because the point was already raised. The more prominent the person making the comment, the less compelled I feel to reiterate the point. So I think it’s dangerous to assume that silence is acceptance.

18 Likes

40+ years of using a whole lotta different code review mechanisms tells me I will cope with whatever happens. I can definitely see managerial advantages to using the tool provided by the source hosting org. Sony has been using GHE internally for a few years, so I’m aware of its features and quirks, and I’m not opposed to LLVM moving there. (Others from Sony disagree with me, particularly @jh7370, so this is not a “corporate” vote.) In fact I find GHE a bit more natural and convenient, most of the time; so I must disagree with @jcranmer about “little to no positive features” in this move.

It is definitely a workflow change. In particular, Phab makes me upload a whole new patch each time I change something, which means (a) I’m amending the commit in my local workspace instead of doing incremental commits, or (b) I’m forgetting to diff against the right base commit. With GH I do a new commit and I’m done. I find that much more natural. Yes, I have to remember to squash at the end, but half the time I have to do that with Phab anyway. GH gives me a button on the review to do exactly that, and lets me edit the commit message before I push. It all feels much smoother to me.

GH reviews let me easily pick a single commit to look at. This is something I do a lot within Sony, mainly because I’m the primary reviewer for merge-conflict resolutions, but I could see wanting to do it for upstream reviews too. Presumably Phab would let me do something similar, although it seems like an unnatural thing to do with Phab, and I’m not sure what it would tell me; I’d have to diff the diffs to see what changed from one revision to the next. With GH it’s right there.

I’ve never put up a stacked review, and if Phab has an easy way to navigate from one element of a stack to another, I’ve never used it. I treat them all as independent. So, it’s not a feature I’m concerned about. I know others do care about it, but I suspect it’s comparatively rare for people to do big projects all at once that really benefit from a stacked review.

I don’t think GH is perfect by any means, and there are things I don’t like about its presentation, but overall it’s way more user-friendly than Phab. I’m fine with this move.

12 Likes

These are fair points. I also think that a lot of the peculiarities of the GH workflows become more natural once one realizes that GH operates on a pre-existing git repo, and that everything in GH is related to how git deals with branches. When someone comes from the view of dealing with individual patches, GH and PRs may look unusual or unnecessarily complicated, but for someone familiar with GH, it may all look normal.

I believe a lot of this is a matter of getting used to how GH operates.

Also, unless someone merges a wrong thing into the main branch, almost everything can be undone.

I’m one of those people. I really like Phabricator, so much so that back in the day, I convinced my previous employer to adopt it and helped maintain it. Yet I’m strongly in favor of transitioning to GitHub PRs. I wouldn’t want to discount anyone else’s concerns, but at least for me personally, the benefits of GitHub PRs to the projects and the community vastly outweighs the downsides for me as an individual.

1 Like

It depends on how you organize your work: if you want these tasks to be reviewed and committed independently, then I’m not aware of any way to do it in GH.
If you want to do incremental development and reviews within the same PR, you can create a task list in the PR description, where you can mark tasks as completed. It’s not the same as what you’re describing, but may still be helpful.

Github has it (publicly) roadmapped, but the ETA has been slipping, so connections would certainly help.

I normally wouldn’t speak up here as I’m not a contributor yet (though lurking for a while), but in this discussion that may actually be pertinent – as soon as LLVM moves to GH PR’s, I’d be happy to contribute little bits & pieces. Not that phabricator would be insurmountable of course, but effectively it was a large enough hurdle for me not to contribute so far (on balance with all my other priorities/tasks, I could never muster the energy to learn an EOL’d tool I’d never use elsewhere).

I also agree a lot with @pogo59’s points. To me multiple commits per PR are more natural (which IMO obviates the need of stacked PRs in the first place - if a group of changes should go together but be reviewable separately, then a single PR has a more coherent interface). Of course that ties into details about merge policies (e.g. are non-squash merges allowed), but even with only linear commits, explicitly chaining PRs and rebasing the upper ones as the lower ones get merged is not difficult.[1]


  1. It’s even possible to do that from one branch if one uses git to just push intermediate commits (rather than the branch head, i.e. git push origin <hash>:<branch-name-distinction-on-github-only>) ↩︎

1 Like

Please read the feedback, this was covered extensively.
While chaining can work well, it assumes 2 things AFAIK: 1) branches in the main repo and 2) rebase/force-push flow. This is doable but these come with tradeoffs, and when you look at it holistically it may be problematic as the advices given as answer to prevent other GitHub UI limits from many people in the feedback thread was to avoid force-pushes: so you can’t have both (AFAIK, again).
(see also the people in the current thread here saying that we should use multiple commits per PRs to incrementally address reviews)

Even considering them independent and not navigating through them, this is still problematic on GitHub (unless the 2 conditions above are fulfilled, AFAIK).

No, it really isn’t: many people who have significant day-to-day experience on both systems provided detailed feedback about why this is just not the case.

This contradicts the fact that the same points you agree with are suggesting to use incremental commits to address reviews: if you have dependent changes in separate commits then addressing the feedback of each commits with new commits in the same PR seems totally unmanageable.
Also, a lot of the stacked PRs I’ve been working with involve different components and different reviewers: using a single PR to review both is highly unergonomic for the reviewers who care only about a subset: it’s hard to keep track of the updates, changes, and the discussions of the “other commits” is irrelevant to you (example: an MLIR change dependent on a change to LLVM, where the LLVM reviewers does not know anything about MLIR).

Note: I’m personally in favor of moving to GitHub PR, please don’t misinterpret my push back on the people who are dismissive of the issues with GitHub. I see it more as matter of when/how to achieve the move and critically to design the workflow we’ll be using: it very likely possible, but tricky, to achieve a very decent workflow if we spend enough time on it (but that’ll require huge involvement from the folks who actually understand the pain points of GH…).

7 Likes