"Living Downstream Without Drowning" BOF @ Dev Meeting

Mike Edwards and I will be hosting a talk/BOF called "Living Downstream
Without Drowning" which is for anyone maintaining a bunch of local changes
to Clang/LLVM/etc. We will present some procedures and tactics we've evolved
at Sony, including patch tactics for reducing merge pain, and how we are
throwing automation at the problem.

But we are really curious what YOU have done and what we can learn from each
other, so we can all manage to keep from drowning under the flood of changes
from upstream!
--paulr

I find the git imerge script extremely useful for this kind of situation.

https://github.com/mhagger/git-imerge

Logically, it does something similar to rebasing your local branch onto EVERY commit in the upstream branch, in turn, until it finds conflicts. There is cleverness to make this efficient and to let you stop and restart the merge, share intermediate state with others, and build and test intermediate results (automatically, if you want). At the end of the process you can choose which intermediate commits to keep: the ones that give the same result as a “git merge”, the ones that give a “git rebase”, or the ones from a new feature the author calls “rebase with history”, which keeps the original branch too and makes each rebased commit have the corresponding commit in the original branch as a parent.
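A toy model may help picture the grid behind this description: local commits along one axis, upstream commits along the other, and, for each local commit, the first upstream commit it conflicts with. This is only a sketch under stated assumptions. `conflicts` is a hypothetical predicate standing in for a real merge attempt, and in the real tool each pairwise merge builds on its neighbouring merges, which this model ignores.

```python
from typing import Callable, List, Optional


def first_conflict_per_row(
    n_local: int,
    n_upstream: int,
    conflicts: Callable[[int, int], bool],
) -> List[Optional[int]]:
    """Toy model of 'rebase onto every upstream commit in turn'.

    Row i is local commit i (1..n_local); column j is upstream commit j
    (1..n_upstream). `conflicts(i, j)` is a hypothetical stand-in for
    actually attempting that pairwise merge. For each row, return the first
    upstream index that conflicts, or None if the local commit applies
    cleanly all the way to the upstream tip.
    """
    frontier: List[Optional[int]] = []
    for i in range(1, n_local + 1):
        first: Optional[int] = None
        for j in range(1, n_upstream + 1):  # linear scan: one trial merge per step
            if conflicts(i, j):
                first = j
                break
        frontier.append(first)
    return frontier


# Hypothetical example: only local commit 2 conflicts, starting at upstream commit 3.
print(first_conflict_per_row(3, 5, lambda i, j: i == 2 and j >= 3))  # [None, 3, None]
```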

Thank you very much for the recommendation! The downside of git-imerge is that it is O(NM), where N is the number of your commits since the last merge and M is the number of upstream commits. I just finished a merge covering about 6 months, with 4000 upstream commits and a bit over 100 local ones; it took about a week of CPU time. The huge upside is that it took a lot less human time than previous merges.
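As a rough sanity check on those numbers, assume about one pairwise merge per (local, upstream) commit pair and round "a bit over 100" down to 100. The CPU time per merge then works out to roughly a second and a half. The inputs are approximations taken from the message above, not measurements.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800 seconds


def seconds_per_pairwise_merge(n_local: int, n_upstream: int, cpu_seconds: float) -> float:
    """Average CPU time per merge, assuming one merge per (local, upstream) pair, i.e. N*M merges."""
    return cpu_seconds / (n_local * n_upstream)


# Assumed inputs: ~100 local commits, 4000 upstream commits, one week of CPU time.
print(seconds_per_pairwise_merge(100, 4000, SECONDS_PER_WEEK))  # 1.512
```

Because the total scales with the product N×M, cutting the number of upstream commits per merge cuts the pairwise work in proportion.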

The only irritation is that LLVM has a strong policy of reverting stuff that’s broken. The Tuples work was committed and reverted three times in quick succession. Each time introduced merge conflicts, though by the third time I’d had to fix them, I was getting pretty good at it. I don’t wish to change the policy with regard to reverts (having a working and stable head is very valuable), but it would be good to have a stronger policy that reverted work goes into Phabricator and is not recommitted until the person who reverted it has signed off (or, at least, until there’s a strong consensus that recommitting is the right thing).

The main lesson for me is that more frequent merges will be less painful. That’s a big change, because previously I’d spend a day or two fixing merge conflicts no matter how frequently or infrequently I merged, so the incentive was to merge every 3-6 months.

David

I'm not sure we have that as a policy, but I assume this is the
consensus. Though, sometimes, it happens. I'd say that if that was the
only big problem you had in 4000 commits, it means that even with the
revert policy, the tree is pretty stable. :)

--renato

It really was surprisingly painless, especially given that the MIPS back end merged a load of my patches, with tweaks and bug fixes, in between the two merges. Finding the cause of each conflict, and throwing away my local version when there was an improved version upstream, was very easy.

We really should document git-imerge somewhere public for the many people that end up maintaining their own downstream forks of LLVM.

David

It seems many causes of the commit->revert->commit->revert->... cycle
are unexpected buildbot failures. The ability to submit changesets
from Phabricator for testing by the buildbots (which, as I understand it,
isn't currently available) would reduce the frequency of this
ping-ponging. That said, it happens fairly rarely and the current
approach seems to work just fine, other than the minor annoyance David
notes.

Alex

> Thank you very much for the recommendation! The down side of git-imerge
> is that it is O(NM), where N is the number of your commits since the last
> merge and M is the number of upstream commits.

Often it should be a lot less than O(NM), due to the bisection strategy,
but if there are a lot of merge conflicts then it can be more.
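To see why bisection can beat O(NM), here is a sketch of the idea for a single local commit. Assuming that once a local commit conflicts with some upstream commit it keeps conflicting with all later ones, a binary search finds the first conflict in about log2(M) trial merges instead of M. That monotonicity assumption is mine, `conflicts_at` is a hypothetical stand-in for a real merge, and this is not git-imerge's code; when conflicts are frequent the assumption holds less well and more merges are needed.

```python
from typing import Callable, Optional, Tuple


def first_conflict_by_bisection(
    n_upstream: int,
    conflicts_at: Callable[[int], bool],
) -> Tuple[Optional[int], int]:
    """Binary-search upstream commits 1..n_upstream for the first one that a
    single local commit conflicts with.

    Assumes monotonicity: once the local commit conflicts with upstream j, it
    also conflicts with every later upstream commit. Returns the first
    conflicting index (or None) and the number of trial merges made.
    """
    lo, hi = 1, n_upstream
    first: Optional[int] = None
    attempts = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        attempts += 1  # each probe stands for one trial merge
        if conflicts_at(mid):
            first = mid  # conflict here; look for an earlier one
            hi = mid - 1
        else:
            lo = mid + 1  # clean here; the first conflict (if any) is later
    return first, attempts


# Hypothetical row: the local commit starts conflicting at upstream commit 2500.
print(first_conflict_by_bisection(4000, lambda j: j >= 2500))
```

With 4000 upstream commits the search makes at most 12 trial merges per local commit, versus up to 4000 for a linear scan.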

> I just finished a merge of about 6 months with 4000 upstream commits and
> a bit over 100 local ones - it took about a week of CPU time. The huge up
> side is that it took a lot less human time than previous merges

I presume you did a build and ran tests at each node? Simple git merges
won't take long at all.

>> Thank you very much for the recommendation! The down side of git-imerge is that it is O(NM), where N is the number of your commits since the last merge and M is the number of upstream commits.
>
> Often it should be a lot less than O(NM), due to the bisection strategy, but if there are a lot of merge conflicts then it can be more.

The ‘autofilling’ was what took most time, and it autofills the merge of every pair of (upstream, local) commits.

>> I just finished a merge of about 6 months with 4000 upstream commits and a bit over 100 local ones - it took about a week of CPU time. The huge up side is that it took a lot less human time than previous merges
>
> I presume you did a build and ran tests at each node? Simple git merges won't take long at all.

Nope. Unfortunately, you can’t git-imerge clang and LLVM simultaneously, and the lack of API stability in LLVM meant that there was no way of building my tree until I’d finished merging both. I’m not sure if this is something that can be fixed by using git submodules for the tools in the svn exports in some automatic way.

David

I believe Takumi deals with this by having a single llvm-project git mirror
and using the "out of tree" build support, so he can build directly in that
tree without having to move/check out clang into tools/clang, etc., and he
uses this for lock-step bisecting. Of course, it doesn't work perfectly,
because most of the rest of us don't use atomic commits to make those
API-breaking changes, since that isn't well or easily supported...
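A rough sketch of why a single mirror helps with lock-step bisecting: LLVM's Subversion repository used one revision counter across all projects (hence revision numbers like r252262), so every revision identifies a consistent pair of LLVM and Clang trees. Assuming you have sorted lists of the revisions that touched each project (hypothetical inputs; this does not read from any real mirror), the candidate points to bisect over can be built like this:

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def lockstep_snapshots(
    llvm_revs: List[int],
    clang_revs: List[int],
) -> List[Tuple[Optional[int], Optional[int]]]:
    """For every revision that touched either project, return the pair of the
    most recent LLVM and Clang revisions at or before it.

    Both input lists must be sorted ascending. None means that project had no
    revision yet at that point.
    """
    snapshots: List[Tuple[Optional[int], Optional[int]]] = []
    for rev in sorted(set(llvm_revs) | set(clang_revs)):
        i = bisect_right(llvm_revs, rev)   # count of LLVM revisions <= rev
        j = bisect_right(clang_revs, rev)  # count of Clang revisions <= rev
        snapshots.append((llvm_revs[i - 1] if i else None,
                          clang_revs[j - 1] if j else None))
    return snapshots


# Hypothetical revision numbers.
print(lockstep_snapshots([100, 103, 105], [101, 104]))
# [(100, None), (100, 101), (103, 101), (103, 104), (105, 104)]
```

Bisecting over this list, rather than over either project's history alone, keeps each tested build on a matching pair of trees.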

From: cfe-dev [mailto:cfe-dev-bounces@lists.llvm.org] On Behalf Of Alex Bradbury via cfe-dev
Sent: 05 November 2015 12:14
To: David Chisnall
Cc: llvm-dev@lists.llvm.org; Bruce Hoult; cfe-dev@lists.llvm.org
Subject: Re: [cfe-dev] [llvm-dev] "Living Downstream Without Drowning" BOF @ Dev Meeting

> The only irritation is that LLVM has a strong policy of reverting stuff that’s
> broken. The Tuples work was committed and reverted three times in quick
> succession. Each time introduced merge conflicts, though by the third time
> I’d had to fix them, I was getting pretty good at it. I don’t wish to change the
> policy with regard to reverts (having a working and stable head is very
> valuable), but it would be good if we could have a stronger policy that stuff
> that is reverted goes into Phabricator and is not recommitted until the
> person who initially reverted it has signed off (or, at least, there’s a strong
> consensus that it’s the right thing).

That's the policy the tuples work has followed. IIRC, the first couple of reverts were responses to lldb and BPF buildbot failures. The lldb failure had been fixed by someone else by the time I'd committed the revert, so I immediately re-committed. After that, the BPF failure appeared, which turned out to be an easy fix, so that was another cycle. The last revert was in response to objections from Eric. At this point, the work remains out of tree since we're still resolving those comments.

The MCTargetMachine work that replaces the Tuples work is likely to be noisier in some ways. It's not the simple mechanical change it used to be anymore, so apologies in advance for the merge pain it may cause. The good news is that the increments are generally smaller.

If you are using git, I have had good experiences with "git config rerere.enabled true", which builds a database of how conflicts were resolved and looks those resolutions up for you when you encounter the same conflict a second time.
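A toy model of the idea behind rerere ("reuse recorded resolution"): each conflict is identified by a hash of its two conflicting sides, the resolution is recorded, and the next time the same conflict appears the stored resolution is replayed. Git's real implementation keeps preimage and postimage files under `.git/rr-cache` and normalizes conflict hunks in its own way; the class and method names below are hypothetical, and sorting the two sides is this sketch's approximation of that normalization.

```python
import hashlib
from typing import Dict, Optional


class ResolutionCache:
    """Hypothetical, simplified model of rerere's record-and-replay idea."""

    def __init__(self) -> None:
        self._resolutions: Dict[str, str] = {}

    @staticmethod
    def conflict_id(side_a: str, side_b: str) -> str:
        # Sort the sides so the same conflict gets the same ID regardless of
        # which side was "ours" and which was "theirs".
        first, second = sorted((side_a, side_b))
        digest = hashlib.sha1()
        digest.update(first.encode("utf-8"))
        digest.update(b"\0")  # separator so ("ab", "c") differs from ("a", "bc")
        digest.update(second.encode("utf-8"))
        return digest.hexdigest()

    def record(self, side_a: str, side_b: str, resolution: str) -> None:
        """Remember how this conflict was resolved by hand."""
        self._resolutions[self.conflict_id(side_a, side_b)] = resolution

    def lookup(self, side_a: str, side_b: str) -> Optional[str]:
        """Return a previously recorded resolution, or None if this conflict is new."""
        return self._resolutions.get(self.conflict_id(side_a, side_b))


cache = ResolutionCache()
cache.record("int x;", "long x;", "int64_t x;")
print(cache.lookup("long x;", "int x;"))  # int64_t x;
```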

- Matthias

According to the author of git imerge, rerere interacts badly with it, so he temporarily turns it off.

Half the point of git imerge is that you don’t *have* to solve the same conflicts repeatedly. And you don’t, unless that commit has been backed out and reapplied multiple times, which should be very much the exception.

The other half of the point of git imerge is that you solve the minimum possible conflict, that of exactly one trunk commit against one branch commit. This is in general vastly easier than trying to decipher how your branch commit conflicts with the entire history of changes to master.

Thanks to everyone who attended our talk, and especially the dozen or more
people who talked to me afterward at the reception and all the next day!
A PDF of the slides was contributed in r252262.

It's clear there's a pent-up demand for continuing the conversations on
all the related topics, and there will be different solutions that are
appropriate to different situations. What would be a good way to keep
collaborating properly about all this?

For starters I'm about to head off to the Bay Area Social...
--paulr