[RFC] Revisiting linear history vs merge commits

TL;DR: When moving to pull requests, should we change our developer policy to allow non-linear history and multiple related commits in a single PR?

Problem

In moving to GitHub pull requests, one of the big features we are losing from Phabricator is the ability to stack related changes. I’ve written a whole bunch of stacked reviews lately, and they have generally come in two forms:

  1. I wrote some complex change, broke it into pieces, and posted n reviews at once.
  2. I wrote some small change and posted a review; before it was approved I had another dependent change.

This proposal seeks to address case 1, not case 2, which I realize is a deficiency, but bear with me for a minute…

Proposed Solution

For complex changes we could allow PRs to contain multiple commits. GitHub’s UI allows commits that are part of a PR to be viewed individually. This puts some work on the author to structure their history into a cohesive “story” that a reviewer can follow, but it largely matches current workflows with one big change:

When the change is approved, it is merged in rather than rebased.
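Concretely, the merge itself is just a plain non-fast-forward merge of the reviewed branch; GitHub’s “Create a merge commit” button does roughly the equivalent of this sketch (the branch name is a placeholder):

$ git checkout main
# Always record a merge commit, even if a fast-forward would be possible
$ git merge --no-ff my-reviewed-pr-branch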

Benefit

One advantage of a merge is that the child commits don’t need to be fully functional, while the history on main remains functional.

For example: a change that alters an LLVM API could be introduced in a PR where one commit is the change just in LLVM, with the expected tests, documentation, etc. Subsequent commits in the PR could update usage of the API across the repo, covering the sub-projects.

Today when we make changes like this we often introduce the new API separately, then go off and clean up the uses, then finally remove the old one. Being able to do all of this in one PR would simplify the workflow.

Git Linear Histories (Revisited)

Before I run off and become a pariah: I think a lot has changed since we last had this discussion, and I’m curious whether, in light of the move to GitHub PRs and improvements to git tooling, we should have this conversation again.

Back in 2019, when we were planning the move to GitHub, we had a thread on whether we should maintain a linear history or allow merge commits ([Github] RFC: linear history vs merge commits).

Re-reading that thread the main technical points were:

  • More reliable git bisect because there are no intermediate revisions
  • Cleaner git log output (also because there are no intermediate revisions)

Since our transition to GitHub, git has gained --first-parent support in git bisect (git log has had the flag for some time), and many git GUI tools have been updated to support first-parent filtering.

For those who are unaware, git’s first-parent option ignores commits that came in on a merged-in branch and are not part of the branch you’re running against. This lets bisect and log operate only on the merge commits or individual commits on the branch itself, rather than walking down into the merged branches.
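As a rough sketch of the usage (the refs below are placeholders, not real LLVM refs):

# Walk only the main-line commits; commits inside merged branches are hidden
$ git log --first-parent --oneline main

# Bisect over the main-line commits only (requires git 2.29 or newer)
$ git bisect start --first-parent
$ git bisect bad main
$ git bisect good <known-good-ref>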

I believe this feature in git addresses the technical concerns, but let me know if I’ve missed something.

LLVM Developer Policy

LLVM project policy for incoming code contributions expresses a preference for small, isolated changes. There are advantages to small commits, like increased contextual information in blame output, easier regression tracking, etc.

If we allowed merges of related changes, we would keep those benefits: we keep more robust blame information, we keep the ability to bisect main to a single commit that introduces issues, and we have a single commit that can be reverted.
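For example, backing out an entire merged PR is a single operation against the merge’s first parent (the SHA below is a placeholder):

# Revert everything the merge brought in, as one new commit on main
$ git revert -m 1 <merge-commit-sha>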

Altering our developer policy to allow features to be merged in would enable a review workflow where large changes could be reviewed both as their individual contributing parts and as a whole.

This would require amending/rebasing/force-pushing through the iterations of the review, right? In that case, how does one follow the review threads across these force-pushes when the review spans different reviewers / areas of the project / revisions? The GitHub UI does not have a good reputation for this at the moment (admittedly it may be improving).

Why is it an advantage? My impression is that it is a deficiency, but I may be missing some arguments.

In these rare cases, I’ve been sending the two Phabricator revisions, but squashing them before pushing. This isn’t the common case I’ve seen for stacked revisions, though, where each change stands on its own.

--first-parent is great, but with the current status, commits in a stack are approved and landed individually: they also get tested, bisected, and reverted individually. We’d lose all of this.
Also, the --first-parent log becomes less readable, pointing to the merge commit instead of the commit that touches the file.
For example in TensorFlow right now:

$ git log -n 3 tensorflow/core/kernels/resource_variable_ops.cc 
commit 0409fd1cffa6342561c497756d276499f71a9df7
Author: Jeff <jefferycole100@gmail.com>
Date:   Thu Jul 28 00:10:38 2022

    Fixed spacing

commit 29c264087a94834da2758210a3b2bd058c825227
Author: Jeff <jefferycole100@gmail.com>
Date:   Thu Jul 28 00:00:50 2022

    Register more types for AssignVariableOp

commit 769eddaf479c8debead9a59a72617d6ed6f0fe10
Author: Jean-Baptiste Lespiau <jblespiau@google.com>
Date:   Wed Jun 1 03:13:29 2022

    Replace `tensorflow::Status::OK()` with tensorflow::OkStatus()`.
    
    PiperOrigin-RevId: 452199902

But with first-parent:

$ git log --first-parent -n 2 tensorflow/core/kernels/resource_variable_ops.cc 
commit 9c204384df41bf354e76aecfb8ddb58f637b2b5e
Merge: fd04e10f93ba 0409fd1cffa6
Author: TensorFlower Gardener <gardener@tensorflow.org>
Date:   Mon Aug 22 19:26:28 2022

    Merge pull request #56936 from jswag180:master
    
    PiperOrigin-RevId: 469253884

commit 769eddaf479c8debead9a59a72617d6ed6f0fe10
Author: Jean-Baptiste Lespiau <jblespiau@google.com>
Date:   Wed Jun 1 03:13:29 2022

    Replace `tensorflow::Status::OK()` with tensorflow::OkStatus()`.
    
    PiperOrigin-RevId: 452199902

The first two commits belonged to what would be, for us, a “stack” in a pull request, and we just see the merge instead.

One annoyance of merges (that I don’t think you mentioned?) is also that we don’t get easy versioning: the mental model isn’t as simple as counting revisions.

I agree that including several commits in one PR might make sense, but I think I’m missing the motivation for merge commits / non-linear history. Does it boil down to “we can revert a PR in a single commit”?

Thank you for opening this more focused discussion. For reference, there’s this related issue that was opened with the recent discussion about moving to GitHub PRs.

I believe both are important, but it may be helpful to focus on one of them first. I agree that case 1 is important; I have encountered it myself as well.

How true is that, really? I know that the Changes tab allows you to select individual commits, but you can’t really review them individually, in the sense that you can’t say “Approve/Disapprove” on individual commits.

I’ve found this to be important, both in Phabricator and in the traditional Linux kernel-style email-based workflow, where it is not uncommon for people to say things like “Reviewed-by for patches 1, 3, and 6-8”.

Honestly, I consider that a good thing. Code is read more often than it is written, and ideally code is reviewed more often than it is written, so shifting some burden from the reviewers to the authors is a good thing.

That sounds like a gigantic disadvantage to me.

As stated, authors should structure their history into a “cohesive story”. A story that is broken somewhere in the middle is not what I think of when I hear “cohesive story”.

We can do merges, and we can do rebase without squash, but either way, let’s please maintain the rule that we strive for every commit being correct.

2 Likes

Or we just add subsequent commits addressing feedback. GitHub’s UI handles that mode of operating just fine.

In more than a few of my recent stacked commits I can make part of the compiler functional, but not all of it. So I add some new logic in clang Parse/Sema in one patch, and the next change adds codegen support. Generally, running the new source through clang with -fsyntax-only is fine after the first commit, but codegen might do something really odd. So we already have partially broken states. Today those end up on main. If we allowed multiple separate commits in a merge, they would be in a separate history.

With --first-parent traversals, they also have no downside. They don’t break main.

I don’t know that I’ve seen people squash changes before pushing with Phabricator, and I feel like that defeats the point of having small changes. I definitely don’t see how this is better than a merge commit.

In terms of testing, I think it is obvious we don’t test changes individually. Our existing CI largely can’t keep up with the inflow of changes.

For bisecting and reverting, is that what we want? More often than not, a failure tracked back to a change results in a revert. Most developers don’t spend time trying to figure out why someone else’s change broke them; they just revert and make it that person’s problem to re-land. That situation gets worse with stacked revisions, because stacked revisions frequently get landed as a block.

If I push 5 stacked commits at the same time and the 2nd causes a failure, someone has to revert 4 of my commits instead of one.

I think this is in part preference, but merge commits can also carry any message you put in them. So this seems solvable.

Counting revisions in git is kind of using git in a way it wasn’t designed to be used… That said, all the revision counting works with --first-parent.
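For instance, a first-parent revision count would look something like this, counting only main-line commits and ignoring commits inside merged branches:

$ git rev-list --count --first-parent main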

I am very very strongly opposed to allowing merge commits.

We should not change a fundamental part of our development policy because of a claimed deficiency of a review tool. I say “claimed” because a quick Google search seems to indicate that GitHub does support requiring rebases of PRs. See Configuring commit rebasing for pull requests - GitHub Docs.

One thing I want to explicitly comment on is the assumption that we should be using non-trivial PR branches. I want to explicitly state that a) this is bad practice, b) reviewers should not be approving them, and c) this is not and should not be a motivation for changing an existing developer flow. We review individual changes, NOT branches. It is the author’s responsibility to make sure each change stands independently.

8 Likes

Maybe we need to agree on what “broken state” means.

To me, having partially implemented features is fine as long as there’s no breakage in the check-* targets.

As a corollary, if you implement a feature in multiple steps, the full test coverage would be added last.

4 Likes

You’re right, this is definitely a deficiency. The existing UI allows you to view and walk through the history, but doesn’t have status indicators for approving or rejecting individual commits.

Wholeheartedly agree.

I think we have this in practice whether we admit it or not. It is super common for implementations of complex features to start at one layer and filter down. You can write tests that only exercise the bit that is implemented, but if a user came in and tried to use the feature while it was partially implemented, things would go awry very quickly.

This may be an area where my use of the word “functional” is not precise enough. In most normal cases I wouldn’t expect an intermediate change to fail to compile or to fail the present test cases, but it seems reasonable to me that an intermediate change could be as partially functional as some of our existing stacked Phabricator changes.

I think there are limited cases where it might be valuable to review a small change (like an API change) without the noise of all the updated call sites or cascading changes. Maybe an intermediate change builds and passes tests for LLVM but not Clang, or Clang but not clang-tools-extra. Maybe an intermediate change has the Clang side of a new language feature, but the libc++ side comes in a subsequent commit in the same PR.

Uh, what do you end up merging then?
I think I would have a strong concern about merging such a history (with added commits like “address reviews from Chris”, “fix typo from earlier commit”, …) other than as a squash commit: it would otherwise lead to a worse git log / git blame / git bisect history. It seems to contradict what I understood from your statement: “This puts some work on the author to be able to structure their history into a cohesive “story” that a reviewer can follow, but it does largely match current workflows”.

They don’t have this particular downside, but that does not mean they don’t have others. As I see it: the history is iffy, bisection is as well (without --first-parent), you can’t git blame / git log and checkout/revert this commit, …

Actually, some of us have bots that don’t take multiple commits together: we build/test each commit individually. Moreover, this point seems moot to me: we have the expectation that we can do so, even if in practice not all the bots are able to verify it.

I don’t understand the point, but I have stacked revisions that add a feature to LLVM support and later start using it elsewhere. It’s not clear to me why these should be bundled in a merge, or reverted together. Also, I tend to land commits from my stack as soon as they are approved.

I don’t think I understand your point: the principle of the stack is that you can attribute a different commit message to describe each individual change. If the merge message “solves” it just as well, then why can’t we just squash-commit the PR in the first place?

Not exactly: you don’t get a unique number for each commit in the repo. But you’re right: admittedly it may be enough as a first approximation for this (assuming no long-lasting feature branches).

3 Likes

I don’t really know how to respond to this. My post makes no claim that GitHub doesn’t support requiring rebases. In fact this seems pretty orthogonal to my post.

I also don’t really know how to respond to this.

Whether we have a formalized process of allowing larger PR branches or not, people are developing on larger branches, then breaking up the changes and uploading them to Phabricator. If people weren’t doing that we wouldn’t have stacked reviews on Phabricator, and this wouldn’t be a topic that we’re discussing at all.

I really don’t want this thread to turn into a bunch of people pointing at each other saying “you’re doing it wrong”. We’ve had other discussions about workflow concerns moving to GitHub PRs.

My intent here was to propose an alteration to LLVM’s workflow to adjust to differences in how GitHub is designed to be used.

2 Likes

This is very much something worth exploring. I agree you don’t want a history that has a bunch of “addressing feedback” commits. We could have an expectation that changes be rebased after approval, before merging. That puts some burden on authors to keep the history clean, but enables us to work within the limitations of the tools.

Adding a flag to git bisect seems like a solution here… There are also really simple tools that can handle the blame & log scenarios (GitHub - jianli/git-get-merge: A git command to find the merge that introduced a given commit.), so I think this is solvable with tooling.
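For example, one common way to recover the containing merge for a given commit without extra tooling is roughly what git-get-merge automates (the SHA is a placeholder, and the heuristic can be fooled by unusual histories):

# Merges on main that are descendants of the commit; the oldest one listed
# is normally the merge that brought the commit into main
$ git log --oneline --merges --ancestry-path <commit-sha>..main | tail -1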

Some bots testing some configurations on each individual commit doesn’t mean we have per-commit test coverage. I don’t even know that it is fair to say we have the expectation that every change individually builds and tests. We rely heavily on a post-commit build-and-test flow, with an expectation of best-effort pre-commit testing by the author, which by definition means we will have broken commits in the history.

More than a few of my stacked changes have gotten approved out of order, so landing as soon as they are approved isn’t always viable.

There are probably cases where a squash makes sense too. If I have a single small isolated change, I could add additional changes addressing feedback and squash on the final merge. Maybe a one-size-fits-all requirement for PR merge policy isn’t appropriate for LLVM.

A single point to revert is one benefit; also, when used with git bisect --first-parent, a merge commit is a single stop among the bisect steps.

This allows commits that need to touch multiple parts of the project to merge atomically, while having separate history.
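As a sketch (the SHA is a placeholder): once a --first-parent bisect stops at a bad merge, the individual commits that came in with it are still there to dig through, or even bisect over, separately:

# The commits the merge brought in from the PR branch
$ git log --oneline <merge-sha>^1..<merge-sha>^2

# If needed, bisect within just those commits to find the exact culprit
$ git bisect start <merge-sha>^2 <merge-sha>^1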

I don’t work directly on LLVM but am certainly a downstream consumer of LLVM and have to deal with a rebranch every 6 months to a year. I think this is the third rebranch in a row where we’ve ingested a partial patch set that has resulted in either a miscompile from LLVM/Clang or some other breakage, which is hard and time-consuming to hunt down. With the current model it’s very hard to determine a working cut point you can branch from, since you cannot tell where one set of patches ends (at least if you’re not intimately involved with what’s going on). If everything required for a patch landed in a single merge, so that the release branch/main branch is always working (not just passing), that would be immensely helpful.

It can help with reviewing changes with wider effects. One can break the commits into the actual change, a new test (if necessary), and then the updates to the existing tests. At least I’m personally more interested in seeing what the issue was and what the fix was before digging through the rest of the test cases.

1 Like

You’re likely looking for something fairly orthogonal to this discussion: most changes aren’t part of a stack, I believe; at least looking at a big downstream consumer (Google), where we rebranch twice per day, it does not seem to be the source of issues. Also, looking at the cherry-picking process that goes on for over a month during the release every 6 months can be instructive.
That said, if you have data showing that a significant source of main being broken comes from stacks, I’d be happy to learn about it!
There was a discussion (I can’t find it right now) about bots being able to push a tag/branch in the repo to indicate their last-known-good commit on main. That would allow people like you to filter on a particular set of platforms / configs to build and pass a particular test-suite before attempting to rebranch.

1 Like

That is not the impression I’ve personally gotten from the many, many discussions about Phab vs GitHub vs others.

A few very vocal users have made it sound like stacks are critical functionality that they use all the time. If most changes are not part of a stack, then I think (over in the other thread on moving to GitHub) we should say so, and move faster to GitHub than the current timeline.

I think it is critical functionality, but on the scale of all commits in the repo I’m not sure what proportion it represents.
(also still not clear that these stacks are a large source of miscompiles, for example)

3 Likes

I don’t want to derail this thread, or to suggest that your workflow isn’t useful, but this does make me wonder if we’re optimizing for the wrong workflow. If the majority of the commits to the repo don’t involve stacks, and @preames pointed out that:

then why not just ban stacks and non-trivial branches? Or at least not make them a first-class workflow. If they aren’t supported by GitHub then no problem, as most commits don’t use them anyway.

That would also mean that the main discussion in this thread is moot, as we just wouldn’t support branches anyway, so no merge commits. Sorry @beanz!

Full disclosure: I use branches all the time in my job, and we exclusively use squashed merges. It’s not perfect, but it’s not too bad. So I’m not against branches, but I’m also not convinced that LLVM should support anything other than simple non-stacked PRs.

1 Like

I think this is likely true in the sense that they aren’t part of Phabricator stacks (although I have no real data). I think there is a difference between the number of changes that are stacks in Phabricator and the number of changes that build off other recent changes (which in effect are stacks).

I find Phabricator’s tooling awkward, but usable. I wonder if people would use stacks more if it was as easy as “push a branch”.

1 Like

I don’t really know how to respond to this. My post makes no claim that GitHub doesn’t support requiring rebases. In fact this seems pretty orthogonal to my post.

Going back and reading your original post again, I agree I misread. Apologies for jumping to conclusions.

My core point about not supporting merge commits still stands though.

You’re proposing a process change to allow a workflow that the community has repeatedly and explicitly rejected. I don’t want to make it easier to review non-trivial branches; I want people to stop trying to use non-trivial branches during review.

To highlight, I use patch stacks on a somewhat regular basis, but it is not because I want review of the “branch”. (Except occasionally as a “here’s where I’m going, does this seem like a decent idea?”) Each change stands on its own, as it should. This is a very core part of LLVM developer culture, and I strongly oppose changing that.

4 Likes

I think it’s premature to allow merge commits in main, and we should stick with linear history. I think our project practices should evolve incrementally over time. The migration from Phabricator to PRs will be disruptive, and so far I’m not aware of anyone working on it yet. Let’s focus on doing that well first.

I think the problems you describe are also problems in our current Phabricator workflows. I think we should try to limit scope and focus on replicating and improving our existing contribution workflow.

I think the highest-priority area for improvement is pre- and post-submit test result information. The Phab premerge test results are useful, but they could be more useful if they were more readable. Buildbot for post-submit leaves a lot to be desired. Never mind that subprojects (libc++) are rolling their own testing infrastructure because our common infrastructure is hard to use. I think we all have the capacity to deal with LLVM development workflow changes, but it’s limited, and we should spend that capacity wisely.

I don’t have a lot of time to continue reading and engaging here, but I wanted to provide input, and I hope it helps.

8 Likes