Enable Contributions Through Pull-request For LLVM

Huge +1 from my side.

There are a couple of small patches I want to submit, but the fact that I need to deal with Phabricator just stops me from doing this.

Looking at the other comments against this step, I'm wondering if the goal of the project is to keep the existing contributors (dozens?) happy, or to welcome more people (hundreds?) into the project?

Understanding why people get frustrated with Phabricator will be an
important topic. For arcanist users, creating a Phabricator
differential is as easy as one command invocation: `arc diff 'HEAD^'`.
One obstacle is that people likely have a Github account but they just
don't want to learn a new system.

The lack of a Herald replacement may be quite serious. There is no
good way to subscribe certain topics based on file paths or commit
descriptions. CODEOWNERS exists but sometimes people just want to
serve as a gatekeeper of some component but don't feel confident
enough to act as a code owner. Moreover, making a commit modifying a
CODEOWNERS file to express subscription does not sound like a good
idea.

I don't intend to weigh in on either side, but just give a perspective on a few questions asked.

From an outsider's perspective, that list isn't what I'd typically describe as the workflow, since it makes "fork" and "branch" sound like difficult operations. This sounds akin to thinking that someone would re-clone the svn repo before working on a new commit: possible, but rather extreme. The distributed nature of git means that forking is pretty simple for GitHub; it essentially creates a new branch in their database, which is just defining a new name for an existing commit.

FWIW, I've not found it to be so straightforward in practice. Because GitHub only lets you have one fork per account, working with multiple distinct projects derived from LLVM is actually quite annoying.

I didn't quite get why you need multiple forks for a given canonical repository? Maybe you're trying to set up a non-LLVM sub-project being developed in a fork of the LLVM repo?

Yes. If I have one project on optimizations of accelerator constructs in LLVM, and I have another project on adding quantum-computing extensions to LLVM, and a third project on an LLVM-based high-level-synthesis tool, then I want three separate forks. When someone who gets one of these repositories runs 'git branch -a', they should only see branches related to the project on which they're working, and each repository should contain only data files and history related to that project (which might be large). In addition, each of these projects needs to have its own wiki pages, track its own bugs, etc. To put it another way, when someone grabs some fork of LLVM on which I'm working, it's not "Hal's LLVM that happens to have a branch that does X", but rather "a named fork of LLVM for a particular project". Moreover, any of these projects might have changes that we want to pull into a branch for upstream contribution.

Right, that makes sense: I think for such a purpose I wouldn't use a personal fork of the canonical repo; I would likely just create a new repo for this project (GitHub would not track this as a "fork" directly, but that shouldn't matter much for developing such a project, would it?

Yes, but then I can't create pull requests from them, right? That's the unfortunate part.

).
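Mehdi's suggestion can be sketched with plain git. In this sketch, local bare repositories stand in for GitHub, and the repo names (llvm-canonical, my-hls-project) are made up for illustration:

```shell
# Mehdi's suggestion, sketched locally: seed a brand-new repository from
# the canonical one, so GitHub never records it as a "fork".
tmp=$(mktemp -d)
git init -q --bare "$tmp/llvm-canonical.git"
# Give the "canonical" repo one commit to stand in for LLVM's history.
git clone -q "$tmp/llvm-canonical.git" "$tmp/seed"
cd "$tmp/seed"
git checkout -q -b main
git config user.email you@example.com
git config user.name "You"
echo llvm > README && git add README && git commit -qm 'initial'
git push -q origin main
# The standalone project repo: a fresh repo that receives all history.
git init -q --bare "$tmp/my-hls-project.git"
git clone -q --mirror "$tmp/llvm-canonical.git" "$tmp/mirror"
git -C "$tmp/mirror" push -q --mirror "$tmp/my-hls-project.git"
```

The resulting repo carries the full LLVM history but, as Hal notes, GitHub will not let you open cross-repository pull requests from it back to the canonical repo.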

Actually using llvm-commits@ or Phab does not make it easier does it?

When uploading patches to Phab, it doesn't matter where they come from.

-Hal

Best,

Mehdi AMINI via llvm-dev <llvm-dev@lists.llvm.org> writes:

Hi all,

Now that we're on GitHub, we can discuss pull requests.
I'd like to propose enabling pull requests on GitHub as a first step, as an
experimental channel alongside the existing methods for contributing to
LLVM.

+1. I would not want to see a dual PR/Phab situation last for long,
though. Looking in two places is a lot more work.

The main reasons I see for moving to GitHub PRs:

- Lowers the barrier for new contributors

- Is a de-facto industry standard; lots of tools know how to work with
  GitHub PRs, almost none with Phab

- Less server maintenance by the project

I am sympathetic to Roman's concerns. I haven't used GitHub extensively
myself. On the couple of projects I've contributed to I see e-mail from
GitHub issues but I can't recall seeing one from pull requests (though
the projects are so small maybe the PRs don't have any comments). It
would be great if GitHub could send e-mails on PR comments and process
replies as new comments on the PR. Maybe this is something the GitHub
developers could look into.

As for export controls, firewalling, etc. that's way beyond my
understanding but it would seem to come with any move from project-run
servers to servers we don't directly control. I would think just using
GitHub to host the sources already gets us into those waters.

                         -David

Very strong -1 at the moment. I think we need to let some of the other efforts (e.g. issues vs bugzilla) settle out first.

Weak -1 in general. I'm not strongly opposed, but I see very little value in this migration and a lot of headache. Others have explained this well, so I'll mostly skip that. I particularly dislike the assumed fork model; I work in patches, so that's a ton of overhead process-wise.

One exception for me would be docs. If we could open pull requests - and possibly the web-edit tool for folks with commit access? - for the docs subtree I could see a lot of advantage in making those easy for new contributors to update. It might also be a good proving ground for the workflow as a whole.

Philip

Mehdi AMINI via llvm-dev <llvm-dev@lists.llvm.org> writes:

It also feels like this will inevitably force people to use external
tools to handle it, instead of just being able to use Git on its
own (something like `git cl` comes to mind, which some may like and
some may not).

Actually my experience is radically the opposite: Phabricator is the one
that does not allow "just being able to use Git on its own": you're either
stuck with manually exporting .patch files and manually adding them to the
UI, or using a php-based CLI indeed (which is the point you seem to be
against).

+1. Phab still seems alien to me and I've been using it for over a
year now.

                     -David

James Henderson via llvm-dev <llvm-dev@lists.llvm.org> writes:

9) On the note of branches for PRs, don't they require users to push their
local branches to the remote repo to create them? That means we'll end up
with thousands of branches in git. Not sure that this will do performance any
good, and I seem to remember there was general agreement that we didn't
want people to push their branches generally. Yes, in theory branches
should be deleted after they're merged, but I've seen that locally not
happen regularly, and that's even assuming that all PRs get merged in (they
won't).

This is why Mehdi suggested using forks, I believe. The branches only
live on the fork of the project. The main canonical copy remains
branch-free.

                   -David

There are higher barriers to new contributors than Phabricator.
I also don't understand what's so complex about it?
Other than the necessity to add -U9999 to git diff :wink:
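For reference, the `-U9999` trick Roman mentions asks `git diff` for up to 9999 context lines, which effectively includes the whole file — the full-context form Phabricator's web upload wants. A minimal sketch in a throwaway repo (file and contents made up):

```shell
# Throwaway repo to show the -U9999 trick: Phabricator's web upload wants
# patches with full file context, which a default `git diff` omits.
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
git checkout -q -b main
git config user.email you@example.com
git config user.name "You"
printf 'line1\nline2\nline3\nline4\nline5\n' > f.txt
git add f.txt && git commit -qm 'initial'
printf 'line1\nline2\nline3-changed\nline4\nline5\n' > f.txt
# -U9999 asks for up to 9999 context lines, so the whole file appears in
# the diff instead of only the default 3 lines around the change.
full=$(git diff -U9999 -- f.txt)
```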

Very strong -1 at the moment. I think we need to let some of the other efforts (e.g. issues vs bugzilla) settle out first.

Sorry I really don’t see a relationship with issues, this seems entirely independent to me. I even actually see this as easier than issues.

Weak -1 in general. I’m not strongly opposed, but I see very little value in this migration and a lot of headache. Others have explained this well, so I’ll mostly skip that.

I am not sure what "headache" you're referring to at the moment.

I particularly dislike the assumed fork model; I work in patches, so that's a ton of overhead process-wise.

Sorry, but pull requests and forks are strictly less overhead process-wise, especially compared to working with patches.
What you're writing comes across to me as "not having switched to git yet".

This is also the reason why we need months of trial for everyone who hasn’t used it extensively before to be able to give it a complete and honest try.

Again, I'd be perfectly happy to be a guinea pig in our sub-project as well to begin with: it may even bring people to contribute just for trying pull requests :slight_smile:

David Tellenbach via llvm-dev <llvm-dev@lists.llvm.org> writes:

My -1 holds even more since I personally find the Phabricator workflow
to be very good in particular as it even got easier since arcanist
works perfectly fine with git and the complete arc workflow can be
used since the migration. In contrast to e.g. Bugzilla where we
searched for an alternative since the current situation was bad, this
is not the case for Phabricator.

After days of trying, I gave up trying to get arcanist to work. So it's
all manual diff + cut and paste into a browser for me. That is
significantly worse than simply pushing to a PR.

Regarding the hurdle for new contributors: Is the Github workflow
really easier or is it just the case that more people have Github
accounts?

They're related, right? GitHub is easier because lots of people are
already familiar with it.

                    -David

Actually my experience is radically the opposite: Phabricator is the one
that does not allow "just being able to use Git on its own": you're either
stuck with manually exporting .patch files and manually adding them to the
UI, or using a php-based CLI indeed (which is the point you seem to be
against).

+1. Phab still seems alien to me and I’ve been using it for over a
year now.

Agreed. I find Phabricator incredibly clunky to use compared to the github pull request interface. It's probably somewhat of a vicious cycle though, as my perceived overhead of having to generate patch files, upload them and then deal with Phabricator means that I tend to focus on internal facing tasks over doing upstream work where possible, which means that I grow increasingly comfortable with the pull request workflow and grow my perceived barrier with the Phabricator workflow. I'd definitely use github PRs in preference to Phab if a choice between the two were on offer.

I’m a big fan of the “Rebase+Merge” option though. I’d much rather put up a single pull request encompassing multiple related commits (e.g. an initial commit containing some necessary NFC refactoring and a second commit containing the actual functional changes) and then rebase+force push to address review comments. I’ve never personally had an issue with the interface in terms of seeing what’s changed since the previous review. It will be a shame if the only option we get from the web interface is to Squash, although I’d be comfortable in the above example manually cherry-picking and pushing the NFC commit directly into master and then just pulling in the functional side from the web interface if that was all that was available.

-Greg
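The rebase-plus-force-push flow Greg describes can be sketched with plain git. In this sketch a local bare repository stands in for the GitHub fork, an amend stands in for the more general interactive rebase, and the branch and file names are made up:

```shell
# Greg's flow: push a branch with multiple commits as one PR, then rewrite
# history to address review and force-push. A local bare repo stands in
# for the GitHub fork.
tmp=$(mktemp -d)
git init -q --bare "$tmp/fork.git"
git init -q "$tmp/work"
cd "$tmp/work"
git checkout -q -b main
git config user.email you@example.com
git config user.name "You"
echo base > a.txt && git add a.txt && git commit -qm 'base'
git remote add origin "$tmp/fork.git"
# The PR branch: an NFC refactor followed by the functional change.
git checkout -q -b my-feature
echo refactor >> a.txt && git commit -aqm '[NFC] refactor'
echo feature >> a.txt && git commit -aqm 'add feature'
git push -q origin my-feature        # the PR is opened from this branch
# Address a review comment by amending (a one-commit rebase), then update
# the PR in place; --force-with-lease refuses to clobber unseen commits.
echo fixed >> a.txt && git commit -aq --amend --no-edit
git push -q --force-with-lease origin my-feature
```

`--force-with-lease` is the safer spelling of `--force` here: it only rewrites the remote branch if it still points where your last fetch/push saw it.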

A strong -1 to moving away from Phab.

+1 to Nemanja’s comment re: the (lack of) quality in the GitHub PR experience.

As for lowering the cost of entry for new developers, I will point out that a bad review experience for reviewers does not improve communication between existing community members and the new community members.

Daniel Sanders via llvm-dev <llvm-dev@lists.llvm.org> writes:

Personally, I'd like us to drop the linear history requirement but I
know there's some strong opposition there. The main reason I'd like us
to accept non-linear history in conjunction with pull requests is that
it means that the commits that were tested (whether by the author or
by CI) still exist in the repository after the merge. If we
rebase+push like we do today or squash+merge as in this proposal, then
the information that a particular commit passed testing (or failed a
particular test) gets lost.

-1 from me. We have one project where we decided to "keep the original
commits" as you say above and another where we enforce linear history.
The linear history project is *much* easier to follow. With merge
commits you start running into issues like foxtrot commits
("Protect our Git Repos, Stop Foxtrots Now!", https://blog.developer.atlassian.com/stop-foxtrots-now) and the whole
thing easily becomes a mess. There may be various rules/plugins/etc.
that attempt to enforce "good" behavior but in the long run my
experience has been that simpler rules make things simpler.

                     -David
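For readers unfamiliar with the term, a foxtrot is a merge that ends up as master's tip with master's previous head as its *second* parent, so master's own commits fall off the first-parent line. A minimal reproduction in a throwaway repo (commit messages made up):

```shell
# A minimal foxtrot: master ends up fast-forwarded onto a merge commit
# whose FIRST parent is the feature branch, so master's own commit C
# drops off the first-parent line.
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git checkout -q -b master
git config user.email you@example.com
git config user.name "You"
echo a > a.txt && git add a.txt && git commit -qm 'A: initial'
git checkout -q -b feature
echo b > b.txt && git add b.txt && git commit -qm 'B: feature work'
git checkout -q master
echo c > c.txt && git add c.txt && git commit -qm 'C: master work'
# The contributor merges master INTO their feature branch...
git checkout -q feature
git merge -q -m 'M: merge master into feature' master
# ...and master is then fast-forwarded to that merge commit.
git checkout -q master
git merge -q --ff-only feature
# C is still reachable, but only as a second parent of M.
firstparent=$(git log --first-parent --format=%s master)
```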

We’re making major changes to workflow and process. The impact on contributors and downstream workflows is substantial. Spreading them out in time a bit gives folks time to react, plan, and object if desired.

Any change in tooling of this magnitude is a headache. There’s a period of lost productivity no matter what the end result is. The new tool can be the perfect solution, and there’s still a substantial switching cost. There’s always a period where there’s a net loss in productivity; the only question is how long it is until the new tool’s benefit pays it back.

I have used git successfully for several years to manage internal repositories. We’ll have to agree to disagree on your first sentence.

After days of trying, I gave up trying to get arcanist to work. So it's
all manual diff + cut and paste into a browser for me. That is
significantly worse than simply pushing to a PR.

Agreed! But I think this is mainly due to missing documentation rather than the
tool itself. As I've already mentioned we should probably add a short example
arcanist workflow to our doc.

They're related, right? GitHub is easier because lots of people are
already familiar with it.

Yes, certainly related since a bad tool/platform could not get as much
acceptance as Github does. However, I think the sole fact that it has a larger
user base doesn't necessarily make Github the superior tool. What I wanted to
say: making a Phab account is, in my opinion, no major problem for new
contributors. Missing documentation and a hard-to-find/read/understand
contribution guide are (this includes a missing Phab/arc guide).

    David

Philip Reames via llvm-dev <llvm-dev@lists.llvm.org> writes:

Weak -1 in general. I'm not strongly opposed, but I see very little
value in this migration and a lot of headache. Others have explained
this well, so I'll mostly skip that. I particularly dislike the assumed
fork model; I work in patches, so that's a ton of overhead process-wise.

Can you explain this a bit more? The way I anticipate doing this:

1. Create fork once for all time
2. Set a remote for the fork in my local clone once for all time

while ((I am employed or financially independent) and working on LLVM)
  1. Make a commit
  2. Push to fork remote
  3. Open PR (probably right in emacs using Magit)
  4. Respond to comments, push further commits, squash as needed, etc.
  5. Declare done and request CI/merge
elihw

I would never delete the fork or the fork remote.
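David's loop above can be written out in plain git. In this sketch local bare repositories stand in for the canonical GitHub repo and the fork, and the branch and file names are made up; the PR-opening step itself happens on github.com or via a CLI tool:

```shell
# David's workflow in plain git: the one-time setup, then the loop body
# for a single patch. Local bare repos stand in for GitHub.
tmp=$(mktemp -d)
git init -q --bare "$tmp/upstream.git"      # stands in for llvm-project
git init -q --bare "$tmp/myfork.git"        # 1. create fork once for all time
git clone -q "$tmp/upstream.git" "$tmp/llvm-project"
cd "$tmp/llvm-project"
git checkout -q -b main
git config user.email you@example.com
git config user.name "You"
echo base > README && git add README && git commit -qm 'initial'
git push -q origin main
git remote add fork "$tmp/myfork.git"       # 2. set the fork remote once
# Loop body: make a commit, push it to the fork, open a PR against upstream.
git checkout -q -b fix-typo
echo fix >> README && git commit -aqm 'Fix typo in README'
git push -q fork fix-typo
# (Opening the PR itself happens on github.com or via a CLI tool.)
```

Note that no branch is ever pushed to the canonical repo; only the fork accumulates topic branches.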

One exception for me would be docs. If we could open pull requests -
and possibly the web-edit tool for folks with commit access? - for the
docs subtree I could see a lot of advantage in making those easy for new
contributors to update. It might also be a good proving ground for the
workflow as a whole.

That's a really great idea!

                  -David

-1 to “squash and merge” being the only option if rebase+push (--ff-only) is possible. I’d rather we use our judgement to decide what’s appropriate for the pull request at hand than have a blanket rule.

Personally, if I have multiple commits (typically 2-5) in a pull request it’s because there’s incremental steps that I’d like to preserve. For example:

  1. Add assembler support for X
  2. Add codegen support for X
  3. Move X to libY. NFC
  4. Add support for Z

We could do each step in individual pull requests but it’s unnecessary overhead.

I agree that pull requests add additional overhead, but they can be scripted to reduce it. My primary goal is to achieve these things set out in the developer policy:
http://llvm.org/docs/DeveloperPolicy.html#incremental-development

By making each of these steps explicit PRs, they get individual testing (when we have integrated presubmit testing, which we’ll hopefully get to), code reviewers are reviewing each atomic step (and not the end result of a bunch of work), etc.

A lot of setups do only test the tip of the PR but there’s nothing really stopping a CI system from testing each commit. FWIW, we don’t test the individual commits in our downstream repo but that’s really only because we have very extensive pre-merge testing and we’d need many more bots to cope with the increased load.

In my opinion, bundling up many small patches into a single large PR subverts a lot of the goals of incremental development. It is better to propose the small patches as individual patches.

I completely agree with the first bit of that. I think we only disagree on the measurement there, with me measuring it in a more subjective/flexible manner than the number of commits.

Very broadly my $0.02.

At the Women in Compilers event before this year’s dev meeting, one of the things repeatedly cited as keeping new contributors away was that LLVM doesn’t use modern or familiar tools. GitHub (love it or hate it) is the place most people look for open source project hosting, and it provides the workflows people are familiar with. I believe we should strive to fit that mold as much as possible (within common sense and good practice) to make developing LLVM resemble patterns that other projects use.

I also feel that it is worth noting GitHub provides a command line interface that can be used to create pull requests, so there is almost no workflow difference to a command line focused user (see: https://github.com/github/hub).

I don’t know what percentage of us use arcanist vs uploading via the website, but the officially supported GitHub command line and UI tools are pretty great.

Lastly, something people often overlook is that you can actually edit files and create pull requests entirely on GitHub’s web interface. This is a huge benefit for people making documentation changes, code comment changes, and other extremely simple changes. The fact that someone can edit docs from a phone or tablet without needing to even clone the repository will be a huge reduction in the barrier to documentation changes.

If you can’t tell, I’m strongly in favor of enabling pull requests.

-Chris

Daniel Sanders via llvm-dev <llvm-dev@lists.llvm.org> writes:

Personally, I’d like us to drop the linear history requirement but I
know there’s some strong opposition there. The main reason I’d like us
to accept non-linear history in conjunction with pull requests is that
it means that the commits that were tested (whether by the author or
by CI) still exist in the repository after the merge. If we
rebase+push like we do today or squash+merge as in this proposal, then
the information that a particular commit passed testing (or failed a
particular test) gets lost.

-1 from me. We have one project where we decided to “keep the original
commits” as you say above and another where we enforce linear history.
The linear history project is much easier to follow.

FWIW, I can’t say I’ve had any real difficulty following non-linear history. I think that’s mainly because even non-linear history has a linear history in it (the first-parent one that the article is trying to protect). Then in addition to the linear first-parent history you have more detailed linear histories on the side if you want to see them. It’s certainly possible to get into a mess if your pull requests contain merge commits, though.

With merge
commits you start running into issues like foxtrot commits
(https://blog.developer.atlassian.com/stop-foxtrots-now) and the whole
thing easily becomes a mess. There may be various rules/plugins/etc.
that attempt to enforce “good” behavior but in the long run my
experience has been that simpler rules make things simpler.

-David

The rule you need to avoid those kinds of mess is essentially the same one as the one we currently have for pushing directly to master: No merge commits in the pull request.
Or to put it another way that accounts for push to master and pull requests: Humans aren’t allowed to make merge commits.

From that article:

You could disable direct pushes to master, and hope that pull-requests never merge with fast-forward.

That’s actually how our downstream repo works except we don’t rely on hope. The server is the one merging pull requests and it’s doing the merge with --no-ff. It’s the equivalent of the ‘Create merge commit’ option in GitHub.
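That server-side model can be sketched with plain git; here the merge is done by hand where the service would normally do it, and the merge message and branch name are made up for illustration:

```shell
# The server-side model: every PR lands via a --no-ff merge made by the
# service, so the individually tested commits survive, while
# `git log --first-parent` still reads as one entry per PR.
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git checkout -q -b master
git config user.email ci@example.com
git config user.name "CI"
echo a > f.txt && git add f.txt && git commit -qm 'initial'
# A pull request containing two commits (no merge commits inside it).
git checkout -q -b pr-1
echo b >> f.txt && git commit -aqm '[NFC] refactor'
echo c >> f.txt && git commit -aqm 'add feature'
# The server merges with --no-ff, forcing a merge commit even though a
# fast-forward would have been possible.
git checkout -q master
git merge -q --no-ff -m 'Merge pull request #1' pr-1
# One entry per PR on the first-parent line; the two commits still exist.
summary=$(git log --first-parent --format=%s master)
```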

Based on reading GitHub’s documentation (https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/merging-a-pull-request#merging-a-pull-request-on-github), all three of the options for merging pull requests will preserve a good first-parent linear history. GitHub doesn’t ban pushing to master by default, of course, and we shouldn’t ban it ourselves, as that would make pull requests mandatory.