How to keep out-of-tree target in sync with upstream?

s-barannikov · January 28, 2023, 10:55pm

I basically see three options:

Rebase the fork onto upstream/main. Resolve explicit conflicts during the rebase process. Likely makes it impossible to bisect fork’s commits. Hopefully, the fork’s workflow didn’t allow merge commits.
Cherry-pick commits from upstream/main on top of the fork. Resolve explicit conflicts in the process. Clobbers upstream’s commit hashes and, in case of conflicts, contents.
Merge upstream/main into the fork. Resolve explicit conflicts at once. Conflict resolution becomes a very difficult task.
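
For reference, the three options map roughly onto the git invocations below. This is only a sketch: the local branch name `main`, the remote-tracking name `upstream/main`, and the helper `sync_commands` are placeholders, and the function only builds the command lists without running anything.

```python
def sync_commands(strategy, fork="main", upstream="upstream/main"):
    """Return the git commands (as argv lists) for one sync strategy.

    `fork` is assumed to be a local branch and `upstream` a remote-tracking
    branch; both names are placeholders. Nothing is executed here.
    """
    if strategy == "rebase":
        # Replay the fork's commits on top of upstream; rewrites fork history.
        return [["git", "checkout", fork], ["git", "rebase", upstream]]
    if strategy == "cherry-pick":
        # Copy upstream's new commits onto the fork; the copies get new hashes.
        return [["git", "checkout", fork],
                ["git", "cherry-pick", f"{fork}..{upstream}"]]
    if strategy == "merge":
        # One merge commit; every conflict is resolved in a single step.
        return [["git", "checkout", fork], ["git", "merge", upstream]]
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    for name in ("rebase", "cherry-pick", "merge"):
        print(name, sync_commands(name))
```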

Are there better options? Can you share your experience with keeping your fork up to date?

arichardson · January 28, 2023, 11:14pm

I think what’s best depends a lot on how diverged your fork is. For smaller divergence, rebasing might be the simplest solution - especially if you’re happy with your users having to switch branches or deal with force pushes. If not then I’d suggest merging from upstream.

What we have been doing for CHERI (highly diverged and with many years of history) is merging one commit at a time from upstream. This means you can easily bisect (with some caveats - not every commit is guaranteed to build due to local API changes) and there is a mostly linear history. If you decide to use this strategy I’d recommend @jrtc27’s mergify-rebase script (CTSRD-CHERI/git-mergify-rebase on GitHub), which merges git changes one commit at a time.

In the past we used a single merge commit (or git-imerge), but that made it incredibly difficult to find out which commit was responsible for breakage. It also means you end up with one huge conflict without being able to tell what changed.

Merging one commit at a time means you can easily deal with merge conflicts caused by formatting-only changes, mechanical updates, and the like. The downside is that you end up redoing lots of conflicts due to reverts (rerere helps there but can also sometimes cause problems). Last time I did this update I also ran the testsuite after every merge conflict and fixed failures as soon as they came up. This is rather helpful if you want to bisect.
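
To make the one-commit-at-a-time idea concrete, here is a rough sketch of the loop. It is not the mergify-rebase script and not the CHERI tooling; the function names, the `upstream/main` ref, and the optional `tests_pass` callback are all assumptions made for illustration. It only relies on `git rev-list --reverse` and `git merge --no-edit`.

```python
import subprocess


def git(*args):
    """Run one git command; return the CompletedProcess without raising."""
    return subprocess.run(["git", *args], capture_output=True, text=True)


def merge_one_at_a_time(upstream="upstream/main", run=git, tests_pass=None):
    """Merge each new upstream commit separately, oldest first.

    Returns (merged, stopped_at). `stopped_at` is the commit whose merge
    conflicted or whose tests failed, or None if everything went through.
    `tests_pass` is an optional zero-argument callable (hypothetical hook)
    that returns True when the test suite is green.
    """
    listing = run("rev-list", "--reverse", f"HEAD..{upstream}")
    if listing.returncode != 0:
        raise RuntimeError("git rev-list failed")
    merged = []
    for sha in listing.stdout.split():
        # A non-zero exit usually means a conflict: stop for manual resolution.
        if run("merge", "--no-edit", sha).returncode != 0:
            return merged, sha
        # Optionally check the tests right away so later bisection stays useful.
        if tests_pass is not None and not tests_pass():
            return merged, sha
        merged.append(sha)
    return merged, None
```

After resolving a conflict by hand and committing, calling the function again picks up from the next unmerged commit. Setting `git config rerere.enabled true` beforehand lets git reuse recorded resolutions, with the caveat mentioned above.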

mehdi_amini · January 29, 2023, 12:43am

That’s my go-to solution, but merging continuously shouldn’t yield difficult conflict resolution: on the contrary, this is likely the solution that brings the fewest conflicts! (@arichardson explained it pretty well I think)

Something I’d add to the merging strategy is how you organize your patches: strictly speaking, there can’t be merge conflicts in an upstream file that you haven’t changed. So it is best to be as unintrusive as possible when you have to patch a file, and even better is to work with upstream to develop pluggable/injectable APIs so that you can inject code “from the outside” without patching/diverging.

I have some memories of a discussion about best practices to maintain patches out-of-tree, maybe even a talk from @pogo59 ?

rengolin · January 29, 2023, 1:32pm

Strongly agree.

How often you rebase will depend on how large your delta is and how intrusive your changes are:

I’ve worked with projects that were mostly users of upstream stuff, so every few months was enough.
Now we’re working and contributing constantly, so every time we land a new patch or we need a new feature that we were reviewing, we pull (could be multiple times a week, or weeks without a pull).
I’ve also worked directly upstream, so all my worktrees were HEAD plus my patches to upstream.
Some larger projects pull once every release because they have complex internal validation systems, and the downtime of a complex merge twice a year becomes justifiable.

You’ll have to weigh the pros and cons and find your own solution. Initially it may look unwieldy, but it should converge soon enough onto something that is specific to your project.

pogo59 · January 30, 2023, 2:29pm

All the things mentioned by previous comments on this thread sound very familiar!

Indeed; this was a joint presentation/BoF by myself and @sqlbyme at the Fall 2015 Dev Meeting, see slides and video.

What we do at Sony is merge upstream/main into our fork, one commit at a time. Conflict resolution is pretty easy most of the time. Build failures are also generally easy. Test failures sometimes require more time, in which case we usually XFAIL the problematic test to keep the merge going, and do the real fixing later. Because we merge one commit at a time, bisection works very smoothly.
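
For readers unfamiliar with the XFAIL step: lit treats a test containing an `XFAIL: *` line as expected to fail everywhere. Below is a sketch of a helper that marks a test this way. It is not Sony’s tooling; the helper name, the FIXME wording, and the small extension-to-comment table are assumptions for illustration.

```python
import os

# Comment prefixes by file extension; an illustrative subset, not exhaustive.
COMMENT_PREFIX = {".ll": ";", ".mir": "#", ".c": "//", ".cpp": "//"}


def add_xfail(path, text, note="downstream merge"):
    """Return `text` with a lit `XFAIL: *` line prepended.

    Leaves the text unchanged if it already contains an XFAIL line, and
    raises ValueError for extensions that are not in COMMENT_PREFIX.
    """
    if "XFAIL:" in text:
        return text
    ext = os.path.splitext(path)[1]
    if ext not in COMMENT_PREFIX:
        raise ValueError(f"no comment prefix known for {ext!r}")
    prefix = COMMENT_PREFIX[ext]
    header = (f"{prefix} XFAIL: *\n"
              f"{prefix} FIXME({note}): re-enable after fixing.\n")
    return header + text
```

The FIXME line is there so the temporarily disabled tests are easy to grep for when doing the real fixing later.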

We also make some effort to write our local changes in ways that will tend not to cause unnecessary merge conflicts. Some of those tips are in the slides/video.

I’m not familiar with @jrtc27’s script, but I’m glad there is something in open-source. We wrote our own scripts, driven by Jenkins, which try to stay fully up-to-date. This means someone needs to stay on top of the failures. If you go this route, how often you get failures will depend on the nature and size of your local changes. At the high end, while I was doing research for this talk, Apple people estimated it took roughly one half-time engineer to keep up. I don’t have hard data on Sony’s cost but it’s noticeably less than that. I also talked to someone who maintained some relatively small patches, who did a manual merge (or possibly a rebase) once a week, and IIRC this typically took an hour or two.
