Force push and rebase

LLVM GitHub User Guide — LLVM 18.0.0git documentation says “In general, you should avoid rebasing a Pull Request and force pushing to the branch that’s the root of the Pull Request during the review.”

I very much understand that

The fidelity of preserving inline comments after a force push has always been a weakness. The comments may be presented as “outdated”. In the past, there was a notorious “lost inline comment” problem.
Nowadays, the situation has improved, but some users still report that inline comments may occasionally become misplaced.

However, the strong words “should avoid” made me a bit concerned. I have some analysis here

In a large repository, avoiding rebases may not be realistic due to other commits frequently modifying nearby lines. Some people use a remote branch to save their work. Having to worry about whether a rebase could cause spam makes the branch more difficult to use. When working with both the latest main branch and the pull request branch, switching between branches results in numerous rebuilds. Rebasing follow-up commits could lead to merge conflicts and more pain. In addition, a popular convention among many LLVM contributors is to commit tests before landing the functional change, which also mandates force-push. Sometimes, only through rebasing can one notice that the patch needs adjustments to more code or tests due to its interaction with another landed patch:

  • Another landed patch added some tests that would be changed by the current patch.
  • Another landed patch added a new use of the function that is renamed by the current patch.

IIUC The CI does something like git fetch -v --prune -- origin refs/pull/xx/head; git checkout -f commit. It doesn’t check the test result on the latest main branch.

I’d like to hear about:

  • The fidelity of preserving inline comments and analysis of “Outdated”. How is it problematic after a rebase or a force push? IIUC, appending a new commit without a rebase/force-push makes nearly comments outdated as well.
  • Whether there are other situations that a rebase is necessary.
  • Whether a rebase followed by appending a fixup commit is recommended, or acceptable, i.e. the original commit set (A) becomes (A’ + B) where A’ is nearly identical to A but with a different base.
  • Any possibility softening the word “should”.
  • How to avoid force push in the presence of stacked patches.
  • If the foundation has a channel to talk to GitHub, help bring awareness to Improve workflow when force-pushing during code reviews · community · Discussion #3478 · GitHub

I will be out of town for about 2 weeks and will not likely follow up with this thread soon.

6 Likes

Thanks for raising this, I think it does deserve a separate thread. Just to keep things connected, I had similar concerns in the how’s it going with GitHub PRs thread. There was also an interesting question there from @mstorsjo that I don’t think was answered:

on the topic of discouraging the use of rebase or not, and whether context of inline commits is lost or not; doesn’t such context disappear after the review anyway, if the PR branch is deleted? (And deleting the branches is kinda recommended anyway, to not litter up the branch namespace in one’s fork - and GitHub provides a button in the PR to allow deleting it after it has been merged.)

1 Like

That would be something useful to improve regardless: should the CI just merge into main?

1 Like

I think that this kind of stuff depends …

For small patches (or big patches where most changes are “automatic”), the discussions tend to be short and even when you rebase it’s relatively easy to keep track of it. And that seems what GitHub is geared towards.

For bigger changes, this becomes much trickier and we may want to propose some guidelines, e.g. “Don’t rebase until all the current threads of discussions are resolved”?

In general, +1 to allow rebasing. Just my 2p.

-Andrzej

I think you could use git merge upstream/main, instead of git rebase, calling it iteratively on each commit/branch in the sequence of stacked patches to avoid needing any force pushes. You might need git rerere turned on to make this work well though?

If interested, this can be fetched from github as refs/pull/xx/merge instead of fetching from ref/pull/xx/head. This ref will not exist though if the commit cannot be merged cleanly.

1 Like

Assuming a local review process, one can use git range-diff llvm-project/main $old_commit_id $new_commit_id to see the diff-of-the-diff, which will be empty if the rebase was clean and may show changed lines (in the context) if the rebase wasn’t clean.

Not sure if this is possible on GitHub Web directly.

Something similar got discussed on HN:

It is similar in the sense that people are trying hard to cope with how comments get confusing on PRs. It’s frankly a very bad user experience. It’s telling how the article starts with:

GitHub pull requests is where I spend almost all of my time while on GitHub, and to me its also unfortunately the most frustrating part of GitHub.

3 Likes

I believe I was one of the ones who pushed against rebases unless absolutely necessary. My main issue with them is that a force push makes it really hard (impossible?) to see what changed since your last review, at least using the option in the files tab of the PR. As someone who tends to review incrementally (i.e. essentially looking only at what has changed in the PR since I last made a review comment, the same as the “DXXXXXX/new” Phabricator mode), a force rebase on a larger commit can make it really hard to spot what has actually changed and therefore actually needs looking at.

I have no experience with using merges during the PR. a) does that result in merge commits in the final thing that lands? b) does that mess up the “changes since last review” option?

If you squash and merge, the merge goes away. GitHub also has a nice diff display for merge commits in the PR commit list which captures just the changes introduced by the merge. One of the stacked PR solutions we were evaluating, ghstack, uses merge commits to good effect for this; see GitHub - ezyang/ghstack: Submit stacked diffs to GitHub on the command line for a discussion of that and Write a guide for doing "Stacked Reviews" with GitHub Pull Requests · Issue #56636 · llvm/llvm-project · GitHub for how it looks in practice.

I assume that you mean:

git pull  # main branch
git switch pr    # timestamps are updated. Most files need to be rebuilt
git merge origin/main   # resolve conflicts here

This works. However, does this make a reviewer’s life easier?

As a reviewer, I check out the branch with gh pr checkout $id. How can I see the code modified in this PR? git diff main..pr output includes all commits in the merge commit and is therefore not ideal.

I do not see how this git merge origin/main flow is better than my alternative:

The best workaround I can come up with is to: perform a rebase and force push, then make a functional change and append a new commit. The new commit represents the part that reviewers have not read.

If I invoke git log, all the updates are in the last few commits. With the git merge origin/main flow, it’s difficult to find all commits (original commit plus fixups) in this pull request.


I’d like comment on this bullet point in my original message:

The fidelity of preserving inline comments and analysis of “Outdated”. How is it problematic after a rebase or a force push? IIUC, appending a new commit without a rebase/force-push makes nearly comments outdated as well.

I feel that GitHub’s comment system is now robust enough. If we merely rebase and append a new commit, the comments are preserved with high fidelity.


Let me elaborate on git switch pr # timestamps are updated. Most files need to be rebuilt.

I often maintain multiple pending reviews. If I invoke the following commands, I need to rebuild very few files.

git rebase origin/main pr1
# edit, possibly append a new commit
git push -f --no-verify

git rebase origin/main pr2
# edit, possibly append a new commit
git push -f --no-verify

If I use the merge workflow

git switch pr1
git merge orign/main
# edit, append a new commit
git push

git switch pr2
git merge orign/main
# edit, append a new commit
git push

It’s almost assuredly CMake will rerun (since some CMake files have been modified by others) and Ninja needs to rebuild most targets.

git diff main...pr (with three dots) should work, as long as your local main branch is recent enough.

I wonder why there does not seem to be an equivalent for git merge. The closest I can think of is something like:
git switch upstream/master
git merge pr1
git checkout -B pr1
git push upstream HEAD

Although the merge commit will have the parents in the wrong order, that will eventually squash out in the final commit on github.

Unrelatedly, I thought if you used ninja then it builds from file hashes not mtime, but I have ccache set up too, so I might be confusing the two. They make a good combo.

Thanks for the tip. Two and three dots with diff — pydagogue 0.2 documentation has an explanation for the curious.

Now the pain is merely:

  • I cannot list author’s fixup commits easily (since merge commits blurred the picture)git log --first-parent lists the fixup commits (thanks to @jdenny-ornl)
  • If I want to check how a patch interacts with my patch touching the same area, it’s a bit tricky as I cannot easily cherry pick the commits.
  • Merge commits don’t work well with stacked patches

Rebasing and appending a fixup commit does not have the aforementioned downsides, with a mere GitHub UI problem that clicking the “compare” button is not super helpful, but the reviewers can check the https://github.com/llvm/llvm-project/pull/XXX/commits page instead.

git log --first-parent has helped me in the past with this part.

You can merge instead of cherry-picking (not that I am a big fan of merged-history)