TL;DR: When moving to pull requests should we change our developer policy to allow non-linear history and multiple related commits in a single PR?
Problem
In moving to GitHub pull requests one of the big features we are losing from Phabricator is the ability to stack related changes. I’ve written a whole bunch of stacked reviews lately, and they have come generally in two forms:
- I wrote some complex change, broke it into pieces and posted n reviews at once.
- I wrote some small change, posted a review, before it got approved I had another dependent change.
This proposal seeks to address case 1, not case 2, which I realize is a deficiency, but bear with me for a minute…
Proposed Solution
For complex changes we could allow PRs to contain multiple commits. GitHub’s UI allows for commits that are part of a PR to be viewed individually. This puts some work on the author to be able to structure their history into a cohesive “story” that a reviewer can follow, but it does largely match current workflows with one big change:
When the change is approved it is merged in rather than rebased.
Benefit
One advantage of a merge is that the child commits don’t need to be fully functional, but the history on main
remains functional.
For example: a change that alters an LLVM API could be introduced in a PR where one commit is the change just in LLVM with any expected test, documentation, etc. Subsequent commits on the PR could update the usage of the API across the repo covering the sub-projects.
Today when we make changes like this we often introduce the new API separately, then go off and clean up the uses, then finally remove the old one. Being able to do all of this in one PR would simplify the workflow.
Git Linear Histories (Revisited)
Before I go run off and become a pariah, I think a lot has changed since we last had this discussion, and I’m curious if in light of moving to GitHub PRs and improvements to git tools we should have this conversation again.
Back in 2019 when we were planning the move to GitHub we had a thread on whether or not we should maintain a linear history or allow merge commits ([Github] RFC: linear history vs merge commits).
Re-reading that thread the main technical points were:
- More reliable
git bisect
because there are no intermediate revisions - Cleaner
git log
output (also because there are no intermediate revisions)
Since our transition to GitHub, git has added the --first-parent
flag to both git bisect
and git log
, and many git GUI tools have been updated to support first-parent filtering.
For those who are unaware, git’s first-parent option ignores commits that are attached to a merged in branch and not part of the branch you’re running against. This has the effect of allowing bisect and log to only operate on the merge commits or individual commits on a branch, and not walk down the trees.
I believe this feature in git addresses the technical concerns, but let me know if I’ve missed something.
LLVM Developer Policy
LLVM project policy for incoming code contributions is a preference to small isolated changes. There are advantages to small commits, like increased contextual information in blame output, easier regression tracking, etc.
If we allowed merges of related changes, we keep those benefits. We keep more robust blame information, we keep the ability to bisect main to a single commit that introduces issues. We have a single commit that can be reverted.
Altering our developer policy to allow features to be merged in would enable a review workflow where large changes could be reviewed both as their individual contributing parts and as a whole.