Allowing PRs on GitHub for some subprojects

Hi,

I know there has been significant discussion about "moving" from Phabricator to GitHub reviews and pull requests, etc. I'm not suggesting that we do anything in terms of global LLVM policy. However, as a maintainer of libc++, I commit __a lot__ of other people's code for them. It would be a huge time saver for me if I could nicely suggest to contributors (not force them) to use PRs instead of Phabricator for their contributions. It would also handle commit attribution properly, which is a pain right now.

Would it be possible to allow GitHub PRs to be submitted on the monorepo so as to let individual sub-projects deal with it however they please? I've spoken to numerous people involved in libc++ development and they would like to start submitting PRs (and for the others, we'll still accept Phabricator reviews). Perhaps it is possible to set up some kind of filter such that PRs touching only libcxx/ and libcxxabi/ can be submitted, but otherwise they're closed by the bot?
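
A bot implementing the filter suggested above would mostly need a path check. Here is a minimal sketch, assuming the bot can list a PR's changed files one per line (for example with `git diff --name-only`). `only_touches` is a hypothetical name, the `origin/master` ref in the usage comment is an assumption, and actually closing a PR on GitHub is not shown:

```sh
# Hypothetical check: succeed only if every changed path (read from stdin,
# one per line) starts with one of the prefixes given as arguments.
only_touches() {
  while IFS= read -r path; do
    [ -n "$path" ] || continue          # ignore blank lines
    allowed=0
    for prefix in "$@"; do
      case $path in
        "$prefix"*) allowed=1; break ;; # quoted part is matched literally
      esac
    done
    if [ "$allowed" -eq 0 ]; then
      echo "outside the allowed subprojects: $path" >&2
      return 1
    fi
  done
  return 0
}

# Assumed usage in a bot:
#   git diff --name-only origin/master...HEAD | only_touches libcxx/ libcxxabi/
# Example with made-up paths instead of a real diff:
if printf '%s\n' libcxx/include/vector libcxxabi/src/cxa_demangle.cpp |
     only_touches libcxx/ libcxxabi/; then
  echo "keep the PR open"
else
  echo "close the PR"
fi
```

The trailing slash in each prefix keeps `libcxx/` from also accepting sibling directories whose names merely start with `libcxx`.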

Cheers,
Louis

Hi,

This was part of what I proposed when we integrated MLIR in LLVM (http://lists.llvm.org/pipermail/llvm-dev/2019-November/136579.html); we were already using pull requests before and had CI infrastructure already built for it.
Ultimately we didn’t press this further; divergence inside the repo / between the subprojects does not seem desirable in my opinion:

  • it creates confusion: “why does this repository have pull requests and reviews on GitHub, but my pull request gets automatically closed?”, “I followed doc X”
  • the lack of Herald on GitHub means that we can’t filter / subscribe automatically to individual pull requests: this was the major blocker in my opinion; I couldn’t find a solution to this and I believe it is critical.
  • it does not favor building common tooling: the recent work on enabling pre-submit CI tests on Phabricator is valuable and I’m looking forward to getting this extended. But splitting the various ways of contributing to the repo just means more infrastructure to build to sustain this kind of effort (the infrastructure is easier to build on GitHub, by the way, but that is an argument in favor of migrating from Phab to GH for the full project).

So in summary: I’d rather find a path for the full project to do this, while acknowledging that there are a few blockers to solve before getting there (cf. other threads on the topic).

> I know there has been significant discussion about "moving" from
> Phabricator to GitHub reviews and pull requests, etc. I'm not
> suggesting that we do anything in terms of global LLVM policy.
> However, as a maintainer of libc++, I commit __a lot__ of other
> people's code for them. It would be a huge time saver for me if I
> could nicely suggest to contributors (not force them) to use PRs
> instead of Phabricator for their contributions. It would also handle
> commit attribution properly, which is a pain right now.

Don't take this as me telling you it is "actually simple". I am
interested in what, specifically, is problematic about committing the
contributions. If the libc++ system doesn't have more requirements than
the rest of LLVM, there might be ways to make it less painful. FWIW,
here is what I do, and I know not everyone wants to use `arc`. In a
script this could potentially reduce the pain. Again, this is not meant
to tell you it is simple or that your problems are not real.

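```sh
# Drop lines starting with "Summary:", and drop everything from "Reviewers:"/"Subscribers:" up to the next (kept) "Reviewed By:"/"Differential Revision:" line.
arcfilter () {
  git log -1 --pretty=%B |
    awk '/Reviewers:|Subscribers:/{p=1} /Reviewed By:|Differential Revision:/{p=0} !p && !/^Summary:/' |
    git commit --amend -F -
}

arc patch DXXXX
git pull --rebase origin master
arc amend      # rewrite the commit message from the review
arcfilter      # defined above
git llvm push master
```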
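
To see what the `awk` half of `arcfilter` keeps and drops without touching a real commit, it can be fed a message directly. This is a sketch: `strip_arc_fields` is a hypothetical name for that `awk` step, and the sample message and its field layout are made up for illustration, not copied from Phabricator:

```sh
# strip_arc_fields is a hypothetical name for the awk step of arcfilter:
# it reads a commit message on stdin and prints the filtered message.
strip_arc_fields() {
  awk '/Reviewers:|Subscribers:/{p=1} /Reviewed By:|Differential Revision:/{p=0} !p && !/^Summary:/'
}

# Made-up message in the field layout the filter expects.
sample='[libc++] Fix foo

Summary:
Fix the foo bug.

Reviewers: reviewer-a

Reviewed By: reviewer-a

Subscribers: subscriber-b

Differential Revision: https://reviews.llvm.org/DXXXX'

# Keeps the title, the summary text, "Reviewed By:" and "Differential Revision:";
# drops the "Summary:" label line and the Reviewers/Subscribers blocks.
printf '%s\n' "$sample" | strip_arc_fields
```

Because the patterns are not anchored to the start of a line, a summary sentence that happens to contain `Reviewers:` would also start a dropped block.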

> Would it be possible to allow GitHub PRs to be submitted on the
> monorepo so as to let individual sub-projects deal with it however
> they please? I've spoken to numerous people involved in libc++
> development and they would like to start submitting PRs (and for the
> others, we'll still accept Phabricator reviews). Perhaps it is
> possible to set up some kind of filter such that PRs touching only
> libcxx/ and libcxxabi/ can be submitted, but otherwise they're closed
> by the bot?

TBH, I feel this is yet another way of splitting the community and, in
the end, complicating things even more. I mean, until recently, if you
wanted to ask a question there were the *-dev lists and IRC. Now we
have Discourse and Discord on top of that, with some people monitoring
only one of these and others having to monitor all of them. Duplicating
the way we do reviews is similarly going to require people who want to
stay informed to duplicate their lookups.

Cheers,
  Johannes

I agree with Johannes here. Although I am one of the (many) people who would love to see us move from Phabricator to GitHub PRs, I think it is super important that we do the transition all at once to keep the LLVM community together. I’m already concerned about the fragmentation the Discourse server is causing, e.g. MLIR not using a -dev list. I’d rather the community processes stay consistent.

-Chris

Hi Louis,

I think this is a good idea. We should start with some local experiments where people are willing to try it and figure out how well that works and what does not. Why not allow this for “not significant” changes? They are merged without review today, so we could do them with reviews (and automated tests) via pull requests instead.

@Mehdi

> • it does not favor building common tooling: the recent work on enabling pre-submit CI tests on Phabricator is valuable and I’m looking forward to getting this extended. But splitting the various ways of contributing to the repo just means more infrastructure to build to sustain this kind of effort (the infrastructure is easier to build on GitHub, by the way, but that is an argument in favor of migrating from Phab to GH for the full project).

Oh I’m happy to add GitHub support as soon as someone switches on PRs. This is soooooo much easier to set up and maintain than the Phabricator integration. And we already have builds for the release branch (https://buildkite.com/llvm-project/llvm-release-builds) anyway, so we could easily scale that up. And we can only get pre-merge testing on Phabricator to a certain point, as it’s not triggering builds for ~50% of the code reviews.

@Chris Lattner

> Although I am one of the (many) people who would love to see us move from Phabricator to GitHub PRs, I think it is super important that we do the transition all at once to keep the LLVM community together. I’m already concerned about the fragmentation the Discourse server is causing, e.g. MLIR not using a -dev list. I’d rather the community processes stay consistent.

Please allow me to disagree there. IMHO we’re way too large and diverse a project to do binary, overnight transitions. We’re also too large to follow a one-size-fits-all approach. If we agree that GitHub PRs are the right flow, why not take this step by step? We should have something like a list of important and supported use cases/interactions for the infrastructure. Then we could start working on them one by one and figure out if/how they could be implemented on GitHub and how we could do a smooth transition between these.

If Herald rules are important: find a way to implement something similar for GitHub. Maybe there is even a market for such a tool.
If transparency is the problem: find a way to mirror PRs into Phabricator, so people can at least see them there.
We’re not restricted to community contributions there. We can also pay someone to build the things we need.
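
Implementing "something similar" to Herald could start from plain path-prefix rules. The sketch below only works out who should be notified for a set of changed files; the rule format, the team names, `subscribers_for`, and the `origin/master` ref are all hypothetical, and actually subscribing people would require a bot talking to GitHub, which is not shown:

```sh
# Hypothetical Herald-like rules: "<path-prefix> <who-to-notify>", one per line.
# The prefixes and team names are made up for illustration.
rules='libcxx/ libcxx-team
libcxxabi/ libcxx-team
mlir/ mlir-team'

# Print each subscriber whose prefix matches at least one changed path
# (paths read from stdin, one per line), each subscriber only once.
subscribers_for() {
  while IFS= read -r path; do
    printf '%s\n' "$1" | while read -r prefix who; do
      [ -n "$prefix" ] || continue    # skip blank rule lines
      case $path in
        "$prefix"*) printf '%s\n' "$who" ;;
      esac
    done
  done | sort -u
}

# Assumed usage in a bot:
#   git diff --name-only origin/master...HEAD | subscribers_for "$rules"
```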

> Hi Louis,
>
> I think this is a good idea. We should start with some local experiments where people are willing to try it and figure out how well that works and what does not. Why not allow this for “not significant” changes? They are merged without review today, so we could do them with reviews (and automated tests) via pull requests instead.

I still feel this is only a recipe for confusion if “some” pull requests are accepted on GitHub but not all. So -1 from me on this.

> Please allow me to disagree there. IMHO we’re way too large and diverse a project to do binary, overnight transitions.

You seem to be arguing the “how to transition” while there is no agreement on a transition happening in the first place.

> We’re also too large to follow a one-size-fits-all approach. If we agree,

I don’t: we went with a monorepo because we believed that one-size-fits-all would be more beneficial than splitting, both in terms of infrastructure and in terms of the practices of the community, etc.

> GitHub PRs are the right flow, why not take this step by step? We should have something like a list of important and supported use cases/interactions for the infrastructure. Then we could start working on them one by one and figure out if/how they could be implemented on GitHub and how we could do a smooth transition between these.
>
> If Herald rules are important: find a way to implement something similar for GitHub. Maybe there is even a market for such a tool.
> If transparency is the problem: find a way to mirror PRs into Phabricator, so people can at least see them there.
> We’re not restricted to community contributions there. We can also pay someone to build the things we need.

One aspect here, though, is that we can pay someone to build the things we need in Phabricator, but we can’t change GitHub.
It was mentioned in the past that we should engage with GitHub and see if they would add the features we’re missing to their roadmap; if that hasn’t been done, I’d start there: building up the list of things that need to happen before we can agree on a transition, and engaging with GitHub to get them.

> One aspect here, though, is that we can pay someone to build the things we need in Phabricator, but we can't change GitHub.
> It was mentioned in the past that we should engage with GitHub and see if they would add the features we're missing to their roadmap; if that hasn't been done, I'd start there: building up the list of things that need to happen before we can agree on a transition, and engaging with GitHub to get them.

This is what we're doing already; however, we cannot force them to
implement something that is not on their roadmap, nor control
the ETA of the features.

That said, many of these things were requested (with a justification of
why we need them and how they would benefit not just LLVM, but many
open-source projects). Some features were implemented (e.g. forced
linear history); others are still planned, with something like "ETA
sometime this year".

FWIW I’m with Mehdi here.

I'm also with Mehdi on this one.

~Aaron

Thanks, Johannes, this indeed solves some of my problems, but not entirely. When people submit contributions without an email address, I still have to do some digging to find out how to attribute the change. This shouldn't be something I even have to think about.

Louis

`arc patch` should preserve the author information in the original commit, if there was any. At least it has in my experience.

> `arc patch` should preserve the author information in the original commit, if there was any. At least it has in my experience.

Yes, but I think people can upload raw patches to Phabricator without using `arc diff`. I ran into one of these just last week: I used Johannes' script (thanks!) and still ended up having to find the author's email by other means.

Louis

Ah, that's a fair point. Yeah, that's unfortunate.

I’m one of those people :wink:

-eric

That’s not something to be proud of if you expect a maintainer to commit on your behalf. If you commit yourself, then whatever.

Louis

FWIW, I'm also one of those people. :wink: I don't think that pride needs
to factor into it -- not everyone uses arc and that's okay. I push a
lot of patches on behalf of others and have only run into one
situation where it wasn't immediately obvious who to attribute a
non-arc patch to. Asking the author for how they wanted to be
attributed was painless and sufficient.

~Aaron

Having to guess what email address to use is not viable. For example, should it be their work or their personal address? The maintainer shouldn't have to choose that. And asking for which address to use is just a waste of time when the contribution could have been attributed correctly in the first place by the author. It might be okay when you commit a few patches on people's behalf, but when you do a couple per day, it really takes its toll on productivity for something so simply solved.

I'll just start requesting that changes be properly attributed in the first place if I am to commit them, and that should solve this specific problem.

Louis

There’s no pride here for sure - I’m not even sure where you got that. That said I’m in complete agreement with Aaron here. It just hasn’t been an issue.

As discussed on IRC, it turns out the Developer Policy actually has a section on that for new committers:

> Prior to obtaining commit access, it is common practice to request that someone with commit access commits on your behalf. When doing so, please provide the name and email address you would like to use in the Author property of the commit.

So, I’ll just point contributors to that sentence when attribution isn’t obvious.
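
The policy quoted above comes down to: never guess the author. As a sketch (the `author_string` helper and the patch file name are hypothetical, not LLVM tooling), a committer could build the `Name <email>` value that `git commit --author` expects and stop instead of guessing when the email is missing:

```sh
# Hypothetical helper: print "Name <email>" for `git commit --author`, or fail
# (without guessing) when the contributor did not provide both parts.
author_string() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "missing author name or email; ask the contributor" >&2
    return 1
  fi
  case $2 in
    *@*) ;;  # rough sanity check only, not full email validation
    *) echo "'$2' does not look like an email address" >&2; return 1 ;;
  esac
  printf '%s <%s>\n' "$1" "$2"
}

# Assumed usage after downloading a raw patch:
#   git apply --index DXXXX.diff
#   author=$(author_string 'Jane Doe' 'jane@example.com') &&
#     git commit --author="$author" -m "[libc++] ..."
```

The `*@*` check is deliberately loose; its only job is to catch the case where no address was given at all, which is exactly where the guessing starts.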

Louis

Mehdi, Chris & others,

I guess I did not express the main reasons for wanting to switch over very well in my original message. As Christian mentioned, for me it’s all about pre-commit testing. I believe pre-commit testing is a widely shared desire in this community. However, how badly it is missed depends on the sub-project, because sub-projects have different realities. For example, in libc++:

  1. We have a lot of first-time contributors, which means that the maintainers end up shepherding many contributions. In particular, this often means fixing small breakages caused by their changes, which can be difficult for the contributors because they can’t reproduce the failures locally, and they might not even know where to look. While these contributors can submit valuable improvements and bug fixes, we can’t expect them to fix every last platform that we support in the current state of things – it’s hard, it’s boring, and it’s stressful.

  2. Our testing matrix is very large, and interactions between different configurations (usually #ifs/#elses) are very subtle. This means the rate of mistake-on-first-try is, I think, higher in libc++ than in most other LLVM projects. Even with careful review, I find that a large percentage of changes end up breaking something somewhere, and I have to fix it (usually quickly enough to avoid reverting).

As a result, the lack of pre-commit testing is actively harming the health of libc++ as a project. It might be true for other projects as well, but I can only speak for libc++ because that’s where I have first-hand experience. Unfortunately, we currently don’t have a good way of doing pre-commit testing on Phabricator AFAICT. From the Harbormaster documentation:

> You’ll need to write a nontrivial amount of code to get this working today. In the future, Harbormaster will become more powerful and have more builtin support for interacting with build systems.

So while I appreciate all the efforts being made in this area, I still don’t even know where to start if I want to setup pre-commit testing for libc++ today. However, the path is very clear with GitHub PRs and there are many options available.

Whenever I hear arguments of dividing the community, not being able to share infrastructure, the lack of Herald – those all make a lot of sense to me and I think they’re good arguments. However, it is clear that folks who even think about these arguments are not paying the same cost for the lack of good pre-commit testing that I’m paying on a weekly basis, because for me that outweighs everything else.

I don’t know how to come to a decision here, all I know is that libc++ needs to get out of the status quo soon. And if the solution is that Harbormaster suddenly becomes usable without an unreasonable time investment from me, then I’m fine with that too. I’m not looking to switch to GitHub PRs for the sake of it, I’m looking to solve problems that are harming libc++ in the current system.

Cheers,
Louis