RFC: Adding a staging branch (temporarily) to facilitate upstreaming

To facilitate collaboration on an upstreaming effort (see “More context” below), we’d like to push a branch (with history) called “staging/apple” to github.com/llvm/llvm-project to serve as an official contribution to the LLVM project. This enables motivated parties to work with us to craft incremental patches for review on Phabricator. This branch would live during the effort and then be deleted after. It would not be merged.

Does this seem fine? If you have a strong objection, please share your concern.

For reference, I ran some experiments:

  • A --bare clone (just the Git database) I have of github.com/llvm/llvm-project was around ~1GB. Fetching this branch from github.com/apple/llvm-project increased it to ~1.2GB. Running git gc --aggressive brought it down to ~850MB.
  • The worktree of the “master” branch is ~1GB. Adding the Git database gives ~2GB, ~2.2GB, and ~1.9GB.
  • The diff of the proposed staging/apple branch is 3.1MB at -U0, 4.1MB at -U3, and 32MB at -U999999 (Phabricator settings).

More context

We’re making a major push over the next few months to upstream changes that have accumulated over time in the branch called “apple/master” at github.com/apple/llvm-project. It has always been a non-goal for us to have changes, but over the years we’ve accumulated a non-trivial diff vs. github.com/llvm/llvm-project. This includes (but is not limited to) tweaks/features related to tuple hashing, modules hashing, source attributes, API notes, pointer authentication, indexing-while-building, and local refactoring.

Our goal is to eliminate this difference. Besides paying off some debt, this upstreaming effort unblocks the Swift compiler (github.com/apple/swift) from building directly against an upstream checkout of LLVM. That’s why non-Apple contributors are motivated to help craft incremental patches.

Alternatives considered

As an alternative, we could post a GitHub pull request and close it without merging. From our perspective this would serve the same purpose. However, pull requests are contentious in LLVM.

Another alternative is to post a bulk Phabricator review and then “abandon” it. However, this has the disadvantage of not contributing the history (~30k commits).

Hey Duncan,

Hey Duncan,

To facilitate collaboration on an upstreaming effort (see “More context” below), we’d like to push a branch (with history) called “staging/apple” to github.com/llvm/llvm-project to serve as an official contribution to the LLVM project. This enables motivated parties to work with us to craft incremental patches for review on Phabricator. This branch would live during the effort and then be deleted after. It would not be merged.

Does this seem fine? If you have a strong objection, please share your concern.

For reference, I ran some experiments:

  • A --bare clone (just the Git database) I have of github.com/llvm/llvm-project was around ~1GB. Fetching this branch from github.com/apple/llvm-project increased it to ~1.2GB. Running git gc --aggressive brought it down to ~850MB.
  • The worktree of the “master” branch is ~1GB. Adding the Git database gives ~2GB, ~2.2GB, and ~1.9GB.
  • The diff of the proposed staging/apple branch is 3.1MB at -U0, 4.1MB at -U3, and 32MB at -U999999 (Phabricator settings).

More context

We’re making a major push over the next few months to upstream changes that have accumulated over time in the branch called “apple/master” at github.com/apple/llvm-project. It has always been a non-goal for us to have changes, but over the years we’ve accumulated a non-trivial diff vs. github.com/llvm/llvm-project. This includes (but is not limited to) tweaks/features related to tuple hashing, modules hashing, source attributes, API notes, pointer authentication, indexing-while-building, and local refactoring.

Our goal is to eliminate this difference. Besides paying off some debt, this upstreaming effort unblocks the Swift compiler (github.com/apple/swift) from building directly against an upstream checkout of LLVM. That’s why non-Apple contributors are motivated to help craft incremental patches.

Alternatives considered

As an alternative, we could post a GitHub pull request and close it without merging. From our perspective this would serve the same purpose. However, pull requests are contentious in LLVM.

Another alternative is to post a bulk Phabricator review and then “abandon” it. However, this has the disadvantage of not contributing the history (~30k commits).

Seeing the alternatives, if a closed pull-request would fit your use-case, then I’m not sure why you need the branch to actually live in the monorepo instead of in any fork (github.com/apple/llvm-project or another)? It seems publicly accessible the same way?

As I understand it, a key need is to explicitly contribute this to the LLVM project to make it unambiguous that it has been contributed and is completely available for folks not at Apple to iterate on the code and turn it into code-reviewable chunks.

So whatever happens needs to be quite explicit in its nature as a contribution. IMO, a branch of the repository definitely qualifies.

IMO, a pull request isn’t as clear given that they haven’t been used for contributions before. This is not a time to be innovative IMO. A branch as a staging location has been used many times over the history of the project though and seems nicely unambiguous in that regard.

Ah, I didn’t understand this from the original description, makes sense now! Thanks.

FWIW a branch seems appropriate to me.

If this is just a legal matter, can’t it be solved as such? I would expect just releasing a statement that you contribute the diff between apple/llvm-project@XXX and llvm/llvm-project@YYY under the LLVM license should be sufficient.

I’m really not seeing a technical reason to include this branch in the monorepo, and make it larger for everyone. This seems like exactly the case where the GitHub fork model works well.

Regard,
Nikita

There are potentially many ways to accomplish what you describe. However, the options Duncan listed in his original email are genuinely the easiest ones any of those involved have come up with and the next best ideas are either substantially less clear or substantially more work IMO.

Duncan also gave some evidence that the size increase is quite small. Again, just IMO.

Hey Duncan,

To facilitate collaboration on an upstreaming effort (see "More context" below), we'd like to push a branch (with history) called "staging/apple" to github.com/llvm/llvm-project to serve as an official contribution to the LLVM project. This enables motivated parties to work with us to craft incremental patches for review on Phabricator. This branch would live during the effort and then be deleted after. It would not be merged.

Does this seem fine? If you have a strong objection, please share your concern.

For reference, I ran some experiments:

A `--bare` clone (just the Git database) I have of github.com/llvm/llvm-project was around ~1GB. Fetching this branch from github.com/apple/llvm-project increased it to ~1.2GB. Running `git gc --aggressive` brought it down to ~850MB.
The worktree of the "master" branch is ~1GB. Adding the Git database gives ~2GB, ~2.2GB, and ~1.9GB.
The diff of the proposed staging/apple branch is 3.1MB at `-U0`, 4.1MB at `-U3`, and 32MB at `-U999999` (Phabricator settings).

More context

We're making a major push over the next few months to upstream changes that have accumulated over time in the branch called "apple/master" at github.com/apple/llvm-project. It has always been a non-goal for us to have changes, but over the years we've accumulated a non-trivial diff vs. github.com/llvm/llvm-project. This includes (but is not limited to) tweaks/features related to tuple hashing, modules hashing, source attributes, API notes, pointer authentication, indexing-while-building, and local refactoring.

Our goal is to eliminate this difference. Besides paying off some debt, this upstreaming effort unblocks the Swift compiler (github.com/apple/swift) from building directly against an upstream checkout of LLVM. That's why non-Apple contributors are motivated to help craft incremental patches.

Alternatives considered

As an alternative, we could post a GitHub pull request and close it without merging. From our perspective this would serve the same purpose. However, pull requests are contentious in LLVM.

Another alternative is to post a bulk Phabricator review and then "abandon" it. However, this has the disadvantage of not contributing the history (~30k commits).

Seeing the alternatives, if a closed pull-request would fit your use-case, then I'm not sure why you need the branch to actually live in the monorepo instead of in any fork (github.com/apple/llvm-project or another)? It seems publicly accessible the same way?

As I understand it, a key need is to explicitly contribute this to the LLVM project to make it unambiguous that it has been contributed and is completely available for folks not at Apple to iterate on the code and turn it into code-reviewable chunks.

How do you envision this working? Has anybody outside of Apple
_actually_ expressed interest in working on it in this way?

I'm mostly asking because we also have a fork of LLVM that we
continuously keep aligned with github.com/llvm/llvm-project/master
with the intention of deltas being broken out to be moved back
upstream. That branch is currently at
https://github.com/GPUOpen-Drivers/llvm-project, but if the LLVM
community decides that branches whose goal is to be contributed
upstream can (and should?) live in github.com/llvm/llvm-project, we'd
probably be interested in doing that and having a branch such as
amd/gfx-dev in the github.com/llvm/llvm-project repository.

Cheers,
Nicolai

Hey Duncan,

To facilitate collaboration on an upstreaming effort (see “More context” below), we’d like to push a branch (with history) called “staging/apple” to github.com/llvm/llvm-project to serve as an official contribution to the LLVM project. This enables motivated parties to work with us to craft incremental patches for review on Phabricator. This branch would live during the effort and then be deleted after. It would not be merged.

Does this seem fine? If you have a strong objection, please share your concern.

For reference, I ran some experiments:

A --bare clone (just the Git database) I have of github.com/llvm/llvm-project was around ~1GB. Fetching this branch from github.com/apple/llvm-project increased it to ~1.2GB. Running git gc --aggressive brought it down to ~850MB.
The worktree of the “master” branch is ~1GB. Adding the Git database gives ~2GB, ~2.2GB, and ~1.9GB.
The diff of the proposed staging/apple branch is 3.1MB at -U0, 4.1MB at -U3, and 32MB at -U999999 (Phabricator settings).

More context

We’re making a major push over the next few months to upstream changes that have accumulated over time in the branch called “apple/master” at github.com/apple/llvm-project. It has always been a non-goal for us to have changes, but over the years we’ve accumulated a non-trivial diff vs. github.com/llvm/llvm-project. This includes (but is not limited to) tweaks/features related to tuple hashing, modules hashing, source attributes, API notes, pointer authentication, indexing-while-building, and local refactoring.

Our goal is to eliminate this difference. Besides paying off some debt, this upstreaming effort unblocks the Swift compiler (github.com/apple/swift) from building directly against an upstream checkout of LLVM. That’s why non-Apple contributors are motivated to help craft incremental patches.

Alternatives considered

As an alternative, we could post a GitHub pull request and close it without merging. From our perspective this would serve the same purpose. However, pull requests are contentious in LLVM.

Another alternative is to post a bulk Phabricator review and then “abandon” it. However, this has the disadvantage of not contributing the history (~30k commits).

Seeing the alternatives, if a closed pull-request would fit your use-case, then I’m not sure why you need the branch to actually live in the monorepo instead of in any fork (github.com/apple/llvm-project or another)? It seems publicly accessible the same way?

As I understand it, a key need is to explicitly contribute this to the LLVM project to make it unambiguous that it has been contributed and is completely available for folks not at Apple to iterate on the code and turn it into code-reviewable chunks.

How do you envision this working? Has anybody outside of Apple
actually expressed interest in working on it in this way?

Yes, as far as I know, folks approached those at Apple to figure out how to collaborate on upstreaming things.

FWIW, I don’t think having this branch in the upstream LLVM repository would make much sense without that – the only purpose it solves is to allow a group of interested people across the LLVM community to collaborate effectively.

I’m mostly asking because we also have a fork of LLVM that we
continuously keep aligned with github.com/llvm/llvm-project/master
with the intention of deltas being broken out to be moved back
upstream. That branch is currently at
https://github.com/GPUOpen-Drivers/llvm-project, but if the LLVM
community decides that branches whose goal is to be contributed
upstream can (and should?) live in github.com/llvm/llvm-project, we’d
probably be interested in doing that and having a branch such as
amd/gfx-dev in the github.com/llvm/llvm-project repository.

See above – I personally think this is an effective strategy when it solves a practical problem for people in the LLVM community at different organizations who want to collaborate on upstreaming something. I’d hope that anything like that was only done contingent on genuine interest in the community and with a pretty clear “it goes away” plan when merging stuff is complete.

I want to specifically emphasise the aspect of collaborating on upstreaming. To collaborate, we need the code to be contributed to the LLVM project first, even if it is not yet accepted into the master branch. If no collaboration is necessary (that is, if the organization that originally wrote the patches wants to upstream them) then there is no need to contribute the code to LLVM project before commiting the code to the master branch. The organization can work on the patch in their own repository to prepare for upstreaming it .

Dmitri

I don’t have a opinion on this either way, but can git/GitHub maintain forks within the same organization? You could have llvm/llvm-project and llvm/llvm-project-apple-staging or something like that?

-Chris

If you click the ‘fork’ button on github more than once, it ends up giving your new fork a different name (llvm-project, llvm-project-1, etc). On the settings page of your fork, there is a way to rename the repository to the name of your choice.

I don’t think GitHub allows you fork your own repo so I think it would be disconnected from a GitHub point of view. That has a few downsides, although I’m not sure how important they are.

Regardless, if a separate repo is preferred, then a better name from our perspective would be “llvm-project-staging” (dropping the “-apple” suffix). We could push a “staging/apple” branch there.

Duncan

Ok, I’m not very concerned either way, it was just a thought. I’m very happy to see this upstreaming work happen, thanks!

-Chris

One of the advantages of the Apache license LLVM has now moved to is
that it provides a clarity on issues such as what represents a
"Contribution" (defined in section 1
https://www.apache.org/licenses/LICENSE-2.0). I take your point that
being innovative is undesirable and see the advantage of following a
well-trodden path, but I do think it might be worth looking to take
advantage of these clearer definitions in the future (even if not for
this instance), getting confirmation from lawyers as required.

Best,

Alex

Don’t want to digress too much, but we really were looking at these exact points. They are, indeed, much more clear than they have been historically. But all of my comments are genuinely in the presence of the new license structure. Sadly, there remains some ambiguity. Pushing to a branch though (or another repository) seem like fantastically clear and unambiguous approaches. =]

Right. What I'm saying is that this would solve practical problems for
us as well, where "us" is AMD plus folks we collaborate with (which
includes Google).

I really like Chris' idea: set up a
github.com/llvm/llvm-project-staging fork where groups that are in a
situation similar to Apple's or ours can put branches that are clearly
intended to have contents to be merged upstream, but may deviate while
experimental features are developed. This keeps
github.com/llvm/llvm-project clean, but provides a place where
"messier" things can happen on a neutral collaborative playing field.

FWIW, I agree that it would be nice for deviated branches to not be
permanent, but in my experience that's not really feasible with LLVM's
development flow at the moment. Even if all changes go upstream, by
the time they do, new changes are bound to have come in. This is
largely a function of the dysfunctions in our review process, but not
entirely, and in any case that review process seems like a much harder
problem to fix.

Cheers,
Nicolai

I have a mild preference for the separate llvm-project-staging approach, but am not opposed to an in tree branch either. The main argument I see for the separate repo is that the bar can be lower because the consequences for being "wrong" about the code being fully merged quickly are lower.

Or another thought, maybe we should even use the incubator flow here? Nothing says an incubator has to be long lived. If we spun up an "incubator-staging-apple" repo, wouldn't that demonstrate the same benefits?

Philip

The consequences of pushing a new branch name and then deleting it later should be low. And if we really wanted to exclude it from the default pull set, it can be placed in, “refs/staging/apple” instead of “refs/heads/staging/apple”. Nothing outside refs/heads/ and refs/tags/ gets pulled by default. This should be the obvious answer. However, we’ve had cases in the past that when people have accidentally created a branch or opened a pull request (which creates a new branch in “refs/pull/NNN”), which triggered phabricator to start sending a bunch of effectively-spam emails. I believe that may still be a problem.

Unless someone can confirm that this won’t happen, it’s probably not a good idea to push 30,000 “new” old commits into any branch in the repo.

I’d also concur with the other comments that it really feels quite silly to do have to do anything technical here at all, versus posting an email to llvm-dev along the lines of “Apple is contributing the code added in github.com/apple/llvm-project, commit hash a1fde6dadf210c937c88509ab775610213e2cfc5, and all prior commits it depends on, under the Apache2 license with LLVM project exceptions, as listed in https://github.com/apple/llvm-project/blob/a1fde6dadf210c937c88509ab775610213e2cfc5/LICENSE.txt”. That seems like it’d be sufficient – and even more explicit as a statement of intent than creating a branch! (But, of course, IANAL).

Or – how about both sending such a message and creating a phab review for git diff origin/master...swift/apple/master (three dots diff gets the diff only from the last merge-point). Maybe that covers all the bases sufficiently?

I am not convinced the “incubator” proposal is suited for this purpose as this is not about a new project but about a patch on top of LLVM itself. My understanding of the incubator is to build new project/subprojects, but not to diverge from LLVM itself (if it isn’t clear in the current proposal we should discuss it and clarify it, either way we end-up).

IMO, a pull request isn’t as clear given that they haven’t been used for contributions before. This is not a time to be innovative IMO. A branch as a staging location has been used many times over the history of the project though and seems nicely unambiguous in that regard.

I don’t have a opinion on this either way, but can git/GitHub maintain forks within the same organization? You could have llvm/llvm-project and llvm/llvm-project-apple-staging or something like that?

I don’t think GitHub allows you fork your own repo so I think it would be disconnected from a GitHub point of view. That has a few downsides, although I’m not sure how important they are.

Regardless, if a separate repo is preferred, then a better name from our perspective would be “llvm-project-staging” (dropping the “-apple” suffix). We could push a “staging/apple” branch there.

Ok, I’m not very concerned either way, it was just a thought. I’m very happy to see this upstreaming work happen, thanks!

-Chris

I have a mild preference for the separate llvm-project-staging approach, but am not opposed to an in tree branch either. The main argument I see for the separate repo is that the bar can be lower because the consequences for being “wrong” about the code being fully merged quickly are lower.

Or another thought, maybe we should even use the incubator flow here? Nothing says an incubator has to be long lived. If we spun up an “incubator-staging-apple” repo, wouldn’t that demonstrate the same benefits?

I am not convinced the “incubator” proposal is suited for this purpose as this is not about a new project but about a patch on top of LLVM itself. My understanding of the incubator is to build new project/subprojects, but not to diverge from LLVM itself (if it isn’t clear in the current proposal we should discuss it and clarify it, either way we end-up).

The “incubator” does not need to be (and shouldn’t be) the only way to create a new repo in the LLVM org. If there is a pragmatic need for a new utilitarian repo that fits outside of that process, then that seems like something that the community can decide to do at any time without further consequence.