Enable Contributions Through Pull-request For LLVM

Hi all,

Now that we’re on GitHub, we can discuss about pull-requests.
I’d like to propose to enable pull-request on GitHub, as a first step as an experimental channel alongside the existing methods for contributing to LLVM.
This would allow to find workflow issues and address them, and also LLVM contributors in general to start getting familiar with pull-requests without committing to switching to pull-requests immediately. The community should evaluate after a few months what would the next steps be.

GitHub pull-requests is the natural way to contribute to project hosted on GitHub: this feature is so core to GitHub that there is no option to disable it!

The current proposal is to allow to integrate contributions to the LLVM project directly from pull-requests. In particular the exact setup would be the following:

  • Contributors should use their own fork and push a branch in their fork.
  • Reviews can be performed on GitHub. The canonical tools are still the mailing-list and Phabricator: a reviewer can request the review to move to Phabricator.
  • The only option available will be to “squash and merge”. This mode of review matches the closest our current workflow (both phabricator and mailing-list): conceptually independent contributions belongs to separate review threads, and thus separate pull-requests.
    This also allow the round of reviews to not force-push the original branch and accumulate commits: this keeps the contextual history of comments and allow to visualize the incremental diff across revision of the pull-request.
  • Upon “merge” of a pull-request: history is linear and a single commit lands in master after review is completed.

As an alternative staging proposal: we could enable pull-requests only for a small subset of sub-projects in LLVM (i.e. not LLVM/clang to begin with for example) in the repo. In this case, we would propose to begin with the MLIR project (as soon as it gets integrated in the monorepo). This would be a good candidate to be the guinea pig for this process since it does not yet have a wide established community of contributors, and the current contributors are already exclusively using pull-requests.

Here is a more complete doc on the topic: https://docs.google.com/document/d/1DSHQrfydSjoqU9zEnj3rIcds6YN59Jxc37MdiggOyaI

Cheers,

Hi all,

Now that we’re on GitHub, we can discuss about pull-requests.
I’d like to propose to enable pull-request on GitHub, as a first step as an experimental channel alongside the existing methods for contributing to LLVM.
This would allow to find workflow issues and address them, and also LLVM contributors in general to start getting familiar with pull-requests without committing to switching to pull-requests immediately. The community should evaluate after a few months what would the next steps be.

GitHub pull-requests is the natural way to contribute to project hosted on GitHub: this feature is so core to GitHub that there is no option to disable it!

The current proposal is to allow to integrate contributions to the LLVM project directly from pull-requests. In particular the exact setup would be the following:

  • Contributors should use their own fork and push a branch in their fork.
  • Reviews can be performed on GitHub. The canonical tools are still the mailing-list and Phabricator: a reviewer can request the review to move to Phabricator.
  • The only option available will be to “squash and merge”. This mode of review matches the closest our current workflow (both phabricator and mailing-list): conceptually independent contributions belongs to separate review threads, and thus separate pull-requests.
    This also allow the round of reviews to not force-push the original branch and accumulate commits: this keeps the contextual history of comments and allow to visualize the incremental diff across revision of the pull-request.
  • Upon “merge” of a pull-request: history is linear and a single commit lands in master after review is completed.

This sounds great, and +1 for ’squash and merge’ as the only option.

I’m a huge fan of incremental public development, and would prefer not to have people incentivized to do “lots of incremental changes in their local fork, then merge the huge thing” while claiming to abide by the idea of incremental development. Such an approach doesn’t get the CI or review benefits, and eliminating this from the workflow seems like a very clean solution. To be clear, I have no problem with people doing whatever local development policies they prefer, I just think they should be squashed out as an implementation detail when the aggregate patch gets merged.

As an alternative staging proposal: we could enable pull-requests only for a small subset of sub-projects in LLVM (i.e. not LLVM/clang to begin with for example) in the repo. In this case, we would propose to begin with the MLIR project (as soon as it gets integrated in the monorepo). This would be a good candidate to be the guinea pig for this process since it does not yet have a wide established community of contributors, and the current contributors are already exclusively using pull-requests.

Here is a more complete doc on the topic: https://docs.google.com/document/d/1DSHQrfydSjoqU9zEnj3rIcds6YN59Jxc37MdiggOyaI

Overall this plan sounds good to me. Please keep the “how to contribute” doc and the project description (which currently says 'Note: the repository does not accept github pull requests at this moment.’) up to date as processes evolve! :slight_smile:

-Chris

Strong -1 personally.

* What is the endgoal? To fully kill phab and move to github pullrequests?
  it might be best to discuss *that* first. (did i miss an RFC?)
* Separation of attention - does everyone who cares
  now has to also look at pull requests for reviews;
  or should they be exempt from general review attention?
* For me personally after using phabricator, github now seems
  extremely crude, laggy, limited. To name a few:
  * There is no way to see previous version of the patch.
    I don't think there is any way to disable force-push for PR's.
    While this is only 'slightly' limiting for the reviewer,
    this can be more limiting for the author - how do i go back
    to previous revision? I can't, i need to maintain a copy
    of every branch i pushed manually.
  * Impossible to chain reviews - a PR diff can only be made
    on top of git master branch. Any non-trivial change consists of
    separable PR's. Now either one will only be able to submit
    dependent PR after the prereqs are merged, or the diff will be
    impossible to read.
  * Does not load large diffs by default.
    That caught me by surprise several times
    when i was searching for something.
  * No diffs in mail - *super* inconvenient.
    One has to open each pr in browser (or fetch via git etc etc)
  * Github is an official US-based commercial organisation.
    It *has* to follow U.S. export law. In particular i'm thinking of
      https://techcrunch.com/2019/07/29/github-ban-sanctioned-countries/
      https://github.com/tkashkin/GameHub/issues/289
    Does phabricator already have such restrictions, blocks?
    If not, wouldn't you say adding such restrictions is not being
    more open for contributors?
    What happens when, in not so long in the future, the entirety of, say,
    china or russian federation is blocked as such?
  * Same question can be asked about internet "iron" curtains
    certain (*cough*) countries are raising. That also has already happened
    before (and *will* happen again), and i was personally affected:
      https://en.wikipedia.org/wiki/Censorship_of_GitHub#Russia
    I don't recall that happening to phabricator yet.
    I fail to see how that is more contributor-friendly.
  * Not sure anyone cares, but while using github as main git
    repository "mirror" is perfectly fine - git is distributed, only canonical
    write-repo would be affected anything bad happen. But that isn't the case
    for reviews, issues; as it has been discussed in the "let's migrate bugzilla
    to github issues", it is far more involved.
  * What about DMCA? Not sure how this is currently handled.
  * UI feels laggy. Not much to add here, pretty subjective.
  * I'm sure i'm missing a few.

The github does come with it's benefits, sure:
* It is *simpler* to preserve git commit author.
  Currently one has to ask the author for the "Author: e@ma.il" line,
  and do `git commit --amend --author="<>"`.
* @mention is wide-r-reaching - whole github, not just llvm phabricator
* No more "phabricator disk full" issues
* ???

TLDR: such migration lowers the bar for new, first time,
unestabilished contributors, but i personally feel it *significantly*
raises the bar for the existing contributors, reviewers.
We don't live in perfect world. Aspirational goals are aspirational.
They should be attempted to be reached, but they shouldn't shadow and
overweight, take over the main goal of the LLVM project.

Personally, i don't see that benefits out-/over- weight the drawbacks.

Roman.

Hi Mehdi,

I support the idea of running some experiments to gain more insights.

If you’re interested: I can offer to add the pre-merge tests to the github pull-requests as well. This way we can also see what the CI integration in github would look like.

-1 from me as well. GitHub UI feels extremely sluggish with large
projects, forks feel like needless hurdles to jump over,
especially for small patches - having to fork the entire monorepo,
just to do a GitHub PR would be extremely cumbersome.

Just recently I submitted a very tiny PR for PcapPlusPlus on GitHub
which involved:
1). Forking the repository
2). Cloning the fork
3). Applying the patch
4). Pushing it and creating a PR.
5). Deleting the fork (Since I didn't want it cluttering up my account
and I'm not a regular contributor to PcapPlusPlus).

Neither the pull request mechanism nor GitHub issues feel like they're
suitable for large projects since it's common to see large GitHub
projects with ignored issues and worse filtering/categorizing. And PRs
as a review mechanism are far worse than Phabricator or any other
dedicated code review software, and I think a lot of GitHub projects
use internal or separate code review tools as-is, with PRs being
second class citizens and are more prone to getting derailed. It's not
uncommon for GitHub issue or PR discussions to turn into massive
threads that quickly turn into bikeshedding and off-topic conversations.

It also feels like this will inevitably force people to use external
tools to handle it, instead of just being able to use Git on its
own (something like `git cl` comes to mind, which some may like and
some may not).

And Roman raised a lot of good points regarding other, more
bureaucratic aspects of GitHub and it raising the bar for existing
contributors.

Just my two cents in.

Huge +1 from my side.

There are couple of small patches I want to submit, but the fact that I need to deal with phabricator just stops me from doing this.

Looking at the other comments against this step I’m wondering if the goal of the project is to keep existing (dozens?) contributors happy, or to embrace more (hundreds?) people to join the project?

Strong -1 personally.

Likewise, for many of the same reasons detailed below.

~Aaron

Having been using Github internally for code reviews of private patches on LLVM, and Phabricator for upstream ones, I’ve found the latter to be far easier to use. Prior to working with LLVM, I had basically no experience with either, so I’d say I’m coming from a fairly neutral starting point. Here are some of my observations (I want to highlight 9 as a particular issue, due to other recent discussions I’ve seen on the mailing list):

  1. I don’t know of a natural way to chain related patches together on Github (aside from explicitly mentioning them), but as separate reviews. This is useful for bigger features/refactorings where each individual step provides some benefit, but seeing the bigger picture is easy. Phabricator has the child/parent objects option.

  2. Phabricator allows placing comments anywhere in the diff context, which is useful when the commit affects lines that haven’t actually been changed, and those lines need addressing/referencing in some way. Github doesn’t allow comments outside the immediate context around changed files

  3. As a +1 to Github on the other hand, commits always come with full context, so you don’t have to remind people to include it.

  4. Phabricator’s ability to see what has changed since the previous time you commented seems to be much more reliable than what Github provides.

  5. With a Github PR, if you want to include a minor change to the patch prior to committing, you have to commit it to the PR, if you wish to use the UI to do the merging. Of course, you could just not push the patch via Github.

  6. I myself have on numerous occasions messed up my commit message when committing via the Github UI, because it’s not obvious unless you are a seasoned user that the title of the commit appears as the first line of the commit message. This is important when doing a squash and merge.

  7. The PR approach does at least allow committing via the UI, which is perhaps a little less fiddly in some cases.

  8. I rarely bother creating a branch for Phabricator reviews, because I don’t need it (I’m often not working on multiple things at once), so the extra hassle of creating a branch/checking it out etc that Github requires is annoying. On the other hand, Github PRs are generally quite easy to create once you have pushed a branch.

  9. On the note of branches for PRs, don’t they require users to push their local branches to the remote repo to create? That means we’ll end up thousands of branches in git. Not sure that this will do performance any good, and I seem to remember there was general agreement that we didn’t want people to push their branches generally. Yes, in theory branches should be deleted after they’re merged, but I’ve seen that locally not happen regularly, and that’s even assuming that all PRs get merged in (they won’t).

  10. Expanding context on Github is a pain: there is no option to just expand the whole context in a block, which means that if you need to see something much earlier in a large file, you have to click a LOT. Also, the browser view often doesn’t then go where you expect it to from my experience. Phabricator has a “expand all N lines” option.

  11. Small one this one, but missing new lines at end of file are much more obvious on Phabricator than Github.

  12. Not sure if this is a real issue, but Github reviewers are limited in number (I think it’s 15?). To my knowledge, there is no such limit with Phabricator (but then how often do you end up with 15 people marked as reviewers?).

I’m sure I could come up with other points for/against Github PRs. On balance I definitely prefer Phabricator.

James

I know from experience that many people have not filed issues due to bugzilla, and not submitted PRs due to phabricator.

An example of where a completely new contributor (my colleague Aras who contributed the build-cost stats to Clang for 9.0.0) intuitively used GitHub is here https://github.com/aras-p/llvm-project-20170507/pull/2 - which I think really awesomely shows how quick a newcomer to the ecosystem could do something really useful and productive when they are channeled through tools they are already used to (GitHub issues/PRs).

So from us it’s a big +1.

I think that it’s really important that we try to strike some balance here. Based on my experience, this thread, and offline conversations, two things seem clear to me:

  1. Overall, Phabricator is a superior tool for managing code reviews and some related processes (although GitHub’s tools certainly have some benefits, and both are getting better over time).

  2. Not accepting GitHub PRs forms a barrier to entry for casual contributors, and perhaps, causes workflow-integration inefficiencies for some others.

I’m also deeply concerned about having another place to search for historical data, track conversations (including the ability to cross-link), and so on.

I think that we should first explore the solution that Facebook uses or used, see: https://github.com/facebook/hhvm/wiki/What-is-Phabricator

They had an import system that, when a PR was submitted, would import it into Phabriactor and post a reply providing an explanation and a link to the imported differential. This might be the right balance between ease of use for casual contributors and the needs of the more-regular contributors providing the code review (and, likely, those actually committing changes).

Other projects have worked on various kinds of integration schemes (e.g., https://github.com/framawiki/Github-notif-bot) that we should investigate. It seems that the Phabricator developers are also working on some more-general API which better supports this kind of use case (e.g., https://secure.phabricator.com/T12739), although the status seems unclear to me.

-Hal

I find Hal’s analysis of this subject to be the most pragmatic one in the thread and really want to chime in with a big +1 for the balanced approach.

TL; DR;

As it was outlined by previous comments in the thread, the GitHub PR review tool is inferior to Phabricator (and I would argue that this is an objective statement). Those that are only familiar with GitHub may not ascribe much value to capabilities such as commenting on unchanged lines, expanding whatever context you’re interested in, seeing the diff of two revisions, etc. But for a good portion of the LLVM community, these things are important.

I think a well integrated solution would not place undue burden on the author or the reviewer. Allowing pull requests seems to be desired from the perspective of making it easy for authors. Allowing the review to happen on Phabricator seems to be desired from the perspective of making it easy for the reviewers. So a solution that can do both is the best of both worlds.

One comment on the initial proposal is that it seems like what is proposed may take away the supposed benefit of moving to PR’s in the first place - making it easy for the author. I fail to see how the workflow*:

  • Fork

  • Clone

  • Create branch and modify code

  • Push

  • Create PR

  • Squash+merge after approval

  • Delete the fork

Is easier from the current:

  • Clone
  • Create branch and modify code
  • git diff -U99999
  • Post through web UI (or use arcanist for these two steps)
  • Apply the approved version and git push
  • There is a distinct possibility that I am misunderstanding how the new workflow would work and it would be easier than what I outlined above, but that’s just the way I understand it.

I don’t intend to weigh in on either side, but just give a perspective on a few questions asked.

From an outsiders perspective, that list isn’t what I’d typically describe as the workflow, since it makes “fork” and “branch” sound like difficult operations. This sounds akin to thinking that someone would reclone the svn repo before working on a new commit—possible, but rather extreme. The distributed nature of git means that the fork is pretty simple for github (essentially creates a new branch in their database, but that is just defining a new name for an existing commit). I’ve noticed GitHub recently made it easier to delete the fork, but I’d argue most users shouldn’t care. What you’d typically do instead is to do git remote add <name> <url> and/or use their hub CLI wrapper (https://hub.github.com) to synchronize work between multiple remote and local locations.

On the note of branches for PRs, don’t they require users to push their local branches to the remote repo to create? That means we’ll end up thousands of branches in git.

Yes, but a branch is just a name for a particular list of commits. Unlike svn, it is not the commits themselves, so they have almost no performance implications. I don’t think LLVM would want everyone pushing to their work in progress to the central repo, so the “thousands of branches” would be scattered across each individual contributor’s repo, and would represent an likely inconsequential fraction of the number of branches already on GitHub.

Impossible to chain reviews - a PR diff can only be made on top of git master branch
It is possible to make a PR against someone else’s PR on their fork, although I hesitate to compare this to the far more usable phabricator capability, so perhaps “impossible” is the right term anyways.

By contrast though, GitHub has no intrinsic concept of a “master” branch. Even aside from the question of using GitHub PRs, you can ask GitHub to make a diff between any two (or three) points in time in a few ways, such as you could also do locally with git. For example, this’ll show you the last two commits: https://github.com/llvm/llvm-project/compare/HEAD^^…master (two dots for a direct diff, three for showing only changes since the branch point). There’s also some convenient additional tooling such as adding “.patch” will download it as a raw patch file instead showing in the rich editor.

There is no way to see previous version of the patch.
This is a fairly new feature, but the PR itself now lists these as events (e.g. “user force-pushed from abcdefg to gfedcba”, where each of those are hyperlinks) in the comment feed.

how do i go back to previous revision
Answering a slightly different question in the hopes someone finds this information useful: locally you can do git reflog <branchname> to get a history of the state of the HEAD of any tag on that particular machine for the last 90 days.

-1 from me as well. GitHub UI feels extremely sluggish with large
projects, forks feel like needless hurdles to jump over,
especially for small patches - having to fork the entire monorepo,
just to do a GitHub PR would be extremely cumbersome.

Just recently I submitted a very tiny PR for PcapPlusPlus on GitHub
which involved:
1). Forking the repository
2). Cloning the fork
3). Applying the patch
4). Pushing it and creating a PR.
5). Deleting the fork (Since I didn’t want it cluttering up my account
and I’m not a regular contributor to PcapPlusPlus).

I am not sure I understand why you went through all these steps. If you had a patch in the first place, you likely had already a local clone of the repo? At this point you need to create a fork indeed (either through a click on the canonical repo or using GitHub command line utility), and then you can just push your current HEAD to your fork (after adding it locally as a remote). This kind of manipulation can be strange coming from SVN but are extremely common working on git/GitHub. Assuming you have a clone of the PcapPlusPlus project locally this is just a matter of:

$ git remote add fork git@github.com:christinaa/PcapPlusPlus.git
$ git checkout -b my_branch
$ git push fork my_branch

Iterating on the review can be done by just committing more into the branch and pushing, the PR is automatically updated.

Neither the pull request mechanism nor GitHub issues feel like they’re
suitable for large projects since it’s common to see large GitHub
projects with ignored issues and worse filtering/categorizing. And PRs
as a review mechanism are far worse than Phabricator or any other
dedicated code review software, and I think a lot of GitHub projects
use internal or separate code review tools as-is, with PRs being
second class citizens and are more prone to getting derailed. It’s not
uncommon for GitHub issue or PR discussions to turn into massive
threads that quickly turn into bikeshedding and off-topic conversations.

It also feels like this will inevitably force people to use external
tools to handle it, instead of just being able to use Git on its
own (something like git cl comes to mind, which some may like and
some may not).

Actually my experience is radically the opposite: Phabricator is the one that does not allow to “just being able to use Git on its
own”, you’re either stuck with manually exporting .patch files and manually add them to the UI or use a php-based CLI indeed (which is the point you seem to be against).

Best,

Thanks James, that is an excellent list!

Something that came up in a discussion with Swift folks about their choice of not using GitHub issues originally is that over time GitHub issues added the missing features and they regret now not being on GitHub issues. The same may apply to pull-request though: I think there are some PMs inside GitHub who try to add features to the roadmap to make GitHub better for projects like LLVM. For instance until a few weeks before our move to GitHub there wasn’t a feature to preserve linear history and they added it right before our move. So I suspect history can repeat as well here, I don’t know what are true blockers to try it, and how much we can expect from GitHub here, but we should definitely engage with them and get them to improve the “minus” you noticed against Phab (and other review systems).

Best,

I don't intend to weigh in on either side, but just give a perspective on a few questions asked.

From an outsiders perspective, that list isn't what I'd typically describe as the workflow, since it makes "fork" and "branch" sound like difficult operations. This sounds akin to thinking that someone would reclone the svn repo before working on a new commit—possible, but rather extreme. The distributed nature of git means that the fork is pretty simple for github (essentially creates a new branch in their database, but that is just defining a new name for an existing commit).

FWIW, I've not found it to be so straightforward in practice. If you have multiple LLVM forks, representing different distinct projects, because GitHub only lets you have one fork per account, the process via which you can work with multiple derived projects is actually quite annoying.

GitHub has a page on this (https://github.community/t5/Support-Protips/Alternatives-to-forking-into-the-same-account/ba-p/7428), but all of the work-arounds are pretty bad.

In short, only the first fork is easy. After that, it gets harder.

-Hal

I've noticed GitHub recently made it easier to delete the fork, but I'd argue most users shouldn't care. What you'd typically do instead is to do `git remote add <name> <url>` and/or use their `hub` CLI wrapper (https://hub.github.com) to synchronize work between multiple remote and local locations.

On the note of branches for PRs, don't they require users to push their local branches to the remote repo to create? That means we'll end up thousands of branches in git.

Yes, but a branch is just a name for a particular list of commits. Unlike svn, it is not the commits themselves, so they have almost no performance implications. I don't think LLVM would want everyone pushing to their work in progress to the central repo, so the "thousands of branches" would be scattered across each individual contributor's repo, and would represent an likely inconsequential fraction of the number of branches already on GitHub.

Impossible to chain reviews - a PR diff can only be made on top of git master branch

It is possible to make a PR against someone else's PR on their fork, although I hesitate to compare this to the far more usable phabricator capability, so perhaps "impossible" is the right term anyways.
By contrast though, GitHub has no intrinsic concept of a "master" branch. Even aside from the question of using GitHub PRs, you can ask GitHub to make a diff between any two (or three) points in time in a few ways, such as you could also do locally with git. For example, this'll show you the last two commits: https://github.com/llvm/llvm-project/compare/HEAD^^...master<https://github.com/llvm/llvm-project/compare/HEAD^...master> (two dots for a direct diff, three for showing only changes since the branch point). There's also some convenient additional tooling such as adding ".patch" will download it as a raw patch file instead showing in the rich editor.

There is no way to see previous version of the patch.

This is a fairly new feature, but the PR itself now lists these as events (e.g. "user force-pushed from abcdefg to gfedcba", where each of those are hyperlinks) in the comment feed.

how do i go back to previous revision

Answering a slightly different question in the hopes someone finds this information useful: locally you can do `git reflog <branchname>` to get a history of the state of the HEAD of any tag on that particular machine for the last 90 days.

I find Hal's analysis of this subject to be the most pragmatic one in the thread and really want to chime in with a big +1 for the balanced approach.

TL; DR;
As it was outlined by previous comments in the thread, the GitHub PR review tool is inferior to Phabricator (and I would argue that this is an objective statement). Those that are only familiar with GitHub may not ascribe much value to capabilities such as commenting on unchanged lines, expanding whatever context you're interested in, seeing the diff of two revisions, etc. But for a good portion of the LLVM community, these things are important.

I think a well integrated solution would not place undue burden on the author or the reviewer. Allowing pull requests seems to be desired from the perspective of making it easy for authors. Allowing the review to happen on Phabricator seems to be desired from the perspective of making it easy for the reviewers. So a solution that can do both is the best of both worlds.

One comment on the initial proposal is that it seems like what is proposed may take away the supposed benefit of moving to PR's in the first place - making it easy for the author. I fail to see how the workflow*:
- Fork
- Clone
- Create branch and modify code
- Push
- Create PR
- Squash+merge after approval
- Delete the fork

Is easier from the current:
- Clone
- Create branch and modify code
- git diff -U99999
- Post through web UI (or use arcanist for these two steps)
- Apply the approved version and git push

* There is a distinct possibility that I am misunderstanding how the new workflow would work and it would be easier than what I outlined above, but that's just the way I understand it.

I think that it's really important that we try to strike some balance here. Based on my experience, this thread, and offline conversations, two things seem clear to me:

1. Overall, Phabricator is a superior tool for managing code reviews and some related processes (although GitHub's tools certainly have some benefits, and both are getting better over time).

2. Not accepting GitHub PRs forms a barrier to entry for casual contributors, and perhaps, causes workflow-integration inefficiencies for some others.

I'm also deeply concerned about having another place to search for historical data, track conversations (including the ability to cross-link), and so on.

I think that we should first explore the solution that Facebook uses or used, see: https://github.com/facebook/hhvm/wiki/What-is-Phabricator

They had an import system that, when a PR was submitted, would import it into Phabriactor and post a reply providing an explanation and a link to the imported differential. This might be the right balance between ease of use for casual contributors and the needs of the more-regular contributors providing the code review (and, likely, those actually committing changes).

Other projects have worked on various kinds of integration schemes (e.g., https://github.com/framawiki/Github-notif-bot) that we should investigate. It seems that the Phabricator developers are also working on some more-general API which better supports this kind of use case (e.g., https://secure.phabricator.com/T12739), although the status seems unclear to me.

-Hal

Having been using Github internally for code reviews of private patches on LLVM, and Phabricator for upstream ones, I've found the latter to be far easier to use. Prior to working with LLVM, I had basically no experience with either, so I'd say I'm coming from a fairly neutral starting point. Here are some of my observations (I want to highlight 9 as a particular issue, due to other recent discussions I've seen on the mailing list):

1) I don't know of a natural way to chain related patches together on Github (aside from explicitly mentioning them), but as separate reviews. This is useful for bigger features/refactorings where each individual step provides some benefit, but seeing the bigger picture is easy. Phabricator has the child/parent objects option.
2) Phabricator allows placing comments anywhere in the diff context, which is useful when the commit affects lines that haven't actually been changed, and those lines need addressing/referencing in some way. Github doesn't allow comments outside the immediate context around changed files
3) As a +1 to Github on the other hand, commits always come with full context, so you don't have to remind people to include it.
4) Phabricator's ability to see what has changed since the previous time you commented seems to be much more reliable than what Github provides.
5) With a Github PR, if you want to include a minor change to the patch prior to committing, you have to commit it to the PR, if you wish to use the UI to do the merging. Of course, you could just not push the patch via Github.
6) I myself have on numerous occasions messed up my commit message when committing via the Github UI, because it's not obvious unless you are a seasoned user that the title of the commit appears as the first line of the commit message. This is important when doing a squash and merge.
7) The PR approach does at least allow committing via the UI, which is perhaps a little less fiddly in some cases.
8) I rarely bother creating a branch for Phabricator reviews, because I don't need it (I'm often not working on multiple things at once), so the extra hassle of creating a branch/checking it out etc that Github requires is annoying. On the other hand, Github PRs are generally quite easy to create once you have pushed a branch.
9) On the note of branches for PRs, don't they require users to push their local branches to the remote repo to create? That means we'll end up thousands of branches in git. Not sure that this will do performance any good, and I seem to remember there was general agreement that we didn't want people to push their branches generally. Yes, in theory branches should be deleted after they're merged, but I've seen that locally not happen regularly, and that's even assuming that all PRs get merged in (they won't).
10) Expanding context on Github is a pain: there is no option to just expand the whole context in a block, which means that if you need to see something much earlier in a large file, you have to click a LOT. Also, the browser view often doesn't then go where you expect it to from my experience. Phabricator has a "expand all N lines" option.
11) Small one this one, but missing new lines at end of file are much more obvious on Phabricator than Github.
12) Not sure if this is a real issue, but Github reviewers are limited in number (I think it's 15?). To my knowledge, there is no such limit with Phabricator (but then how often do you end up with 15 people marked as reviewers?).

I'm sure I could come up with other points for/against Github PRs. On balance I definitely prefer Phabricator.

James

Strong -1 personally.

Likewise, for many of the same reasons detailed below.

~Aaron

I’m not sure I understand the need for that, although that’s perhaps just my lack of experience with certain types of workflows. And seems somewhat more relevant to the choice to switch to git than whether to use GitHub PR workflow. As that page says, each fork supports an unlimited number of branches, each of which is essentially equivalent to a full fork. So you can have multiple clones locally each working from a different branch of the same upstream fork. That’s less true if you are using the issue tracker or PR features, but that seems like it’d be a relatively small set of teams that wanted those extra features.

What would this look like in Phabricator today? My impression has been that the approximate equivalent was manually copying around patch files. Or perhaps setting up a new svn remote copy?

Thinking back on my own transition from svn to git, perhaps one of the points of confusion is in the way to use the respective tools. In svn, it’s usually necessary to do most work directly on trunk. In git, it’s much better to always create feature branches. The git tool won’t stop you from committing to the master branch directly (it’s the default), but I’d say that git will keep trying to make life hard for you if you do. Since git will keep trying to synchronize the meaning of master across all remotes, it can keep creating conflicts with work-in-progress or old work. I don’t know if there’s any resources warning you not to make a GitHub PR from master on your fork (or between any other two branches with the same name); this is just something I have learned from experience to avoid.

I didn’t quite get why you need multiple forks for a given canonical repository? Maybe you’re trying to setup a non-LLVM sub-project being developed in a fork of the LLVM repo?

Thanks,

Hi all,

from me also a big -1 due to the many reasons others have already mentioned (especially Roman).

I want to give an additional consideration: One thing I always liked at the Phabricator workflow is that it is based on diffs only. Pull requests are no git but a Github feature. That means that by adopting Github’s pull request workflow you essentially adopt a proprietary feature that will only work on this specific proprietary platform. Phabricator is essentially just a convenient tool to review plain diffs (first class git citizens) and the fact that it works relatively independent of the underlying repository is something I really like.

My -1 holds even more since I personally find the Phabricator workflow to be very good in particular as it even got easier since arcanist works perfectly fine with git and the complete arc workflow can be used since the migration. In contrast to e.g. Bugzilla where we searched for an alternative since the current situation was bad, this is not the case for Phabricator.

Regarding the hurdle for new contributors: Is the Github workflow really easier or is it just the case that more people have Github accounts? As I’ve already said I think the current workflow is absolutely fine and really intuitive. We should however make our contribution guide more welcoming and complete (I know, there are people working on this which is awesome!).

David

I don't intend to weigh in on either side, but just give a perspective on a few questions asked.

From an outsiders perspective, that list isn't what I'd typically describe as the workflow, since it makes "fork" and "branch" sound like difficult operations. This sounds akin to thinking that someone would reclone the svn repo before working on a new commit—possible, but rather extreme. The distributed nature of git means that the fork is pretty simple for github (essentially creates a new branch in their database, but that is just defining a new name for an existing commit).

FWIW, I've not found it to be so straightforward in practice. If you have multiple LLVM forks, representing different distinct projects, because GitHub only lets you have one fork per account, the process via which you can work with multiple derived projects is actually quite annoying.

I didn't quite get why you need multiple forks for a given canonical repository? Maybe you're trying to setup a non-LLVM sub-project being developed in a fork of the LLVM repo?

Yes. If I have one project on optimizations of accelerator constructs of LLVM, and I have another project on adding quantum-computing extensions to LLVM, and a third project on an LLVM-based high-level-synthesis tool, then I want three separate forks. When someone getting these repositories runs 'git branch -a', they should only see branches related to the project on which they're working, and each repository should contain only data files and history related to that project (which might be large). In addition, each of these projects needs to have its own wiki pages, track its own bugs, etc. To put it another way, when someone grabs some fork on LLVM on which I'm working, it's not "Hal's LLVM that happens to have a branch that does X", but rather, "A named fork of LLVM for a particular project". Moreover, any of these projects might have changes that we want to pull into a branch for upstream contribution.

-Hal

Thanks,

+1 to the pull request model in general. The pull request model is the more conventional way of using git in my experience and I’m generally in favour of a conventional workflow. I can’t really assess the quality of GitHub’s code review tools though as I haven’t used theirs for several years. I’m willing to try them on a significant patch but I’d like to do that before I +1 for GitHub specifically or decide I prefer to keep Phabricator in the process. That said, I think it’s ok to allow them and see if the community shifts to them in the long run. That’s pretty much how our usage of Phabricator took off after all.

Hi all,

Now that we’re on GitHub, we can discuss about pull-requests.
I’d like to propose to enable pull-request on GitHub, as a first step as an experimental channel alongside the existing methods for contributing to LLVM.
This would allow to find workflow issues and address them, and also LLVM contributors in general to start getting familiar with pull-requests without committing to switching to pull-requests immediately. The community should evaluate after a few months what would the next steps be.

GitHub pull-requests is the natural way to contribute to project hosted on GitHub: this feature is so core to GitHub that there is no option to disable it!

The current proposal is to allow to integrate contributions to the LLVM project directly from pull-requests. In particular the exact setup would be the following:

  • Contributors should use their own fork and push a branch in their fork.
  • Reviews can be performed on GitHub. The canonical tools are still the mailing-list and Phabricator: a reviewer can request the review to move to Phabricator.
  • The only option available will be to “squash and merge”. This mode of review matches the closest our current workflow (both phabricator and mailing-list): conceptually independent contributions belongs to separate review threads, and thus separate pull-requests.
    This also allow the round of reviews to not force-push the original branch and accumulate commits: this keeps the contextual history of comments and allow to visualize the incremental diff across revision of the pull-request.

-1 to “squash and merge” being the only option if rebase+push (–ff-only) is possible. I’d rather we use our judgement to decide what’s appropriate for the pull request at hand rather than have a blanket rule.

Personally, if I have multiple commits (typically 2-5) in a pull request it’s because there’s incremental steps that I’d like to preserve. For example:

  1. Add assembler support for X

  2. Add codegen support for X

  3. Move X to libY. NFC

  4. Add support for Z
    We could do each step in individual pull requests but it’s unnecessary overhead.

on the other hand, something like:

  1. Add support for X
  2. clang-format
  3. typo
  4. another typo
  5. fix the build
  6. wip
  7. Fix test failure in target Y
    should be tidied up before merging (squashed in this case but rebase -i is sometimes appropriate).
  • Upon “merge” of a pull-request: history is linear and a single commit lands in master after review is completed.

Personally, I’d like us to drop the linear history requirement but I know there’s some strong opposition there. The main reason I’d like us to accept non-linear history in conjunction with pull requests is that it means that the commits that were tested (whether by the author or by CI) still exist in the repository after the merge. If we rebase+push like we do today or squash+merge as in this proposal, then the information that a particular commit passed testing (or failed a particular test) gets lost.