Phabricator -> GitHub PRs?

"Doerfert, Johannes" <jdoerfert@anl.gov> writes:

I still find Phab to be inscrutable. I don't use any of its advanced
features. I'm a long-time contributor.

I asked a similar question in this thread in the very beginning: What
actual problems do you have with Phab? There might be usable solutions
out there already. The last time someone actually listed problems we got
a lot of good responses, some of which I will try out myself.

I just posted a few I've run into over the years.

I can't imagine how difficult it is for folks new to the project.

I am always in favor of improving the documentation. We need more
concrete problem descriptions though.

Hopefully my post will help.

For all of GitHub's many flaws, its very strong advantage is that it is
a de facto standard. People understand it.

I do not. Arguably because I have not yet used it. However, "it is a de
facto standard" is a weird argument for anything. People are advocating
to move away from mailing lists towards other system though mailing
lists are, or at least were, "de facto standard". Is the idea to keep up
with the "de facto standard" or to improve the status quo (for group X*)?

* Substitute X as you see fit.

I guess I see the goal being integrating new contributors as easy as
possible. Now I know that many people argue that keeping existing
contributors happy is more important and I can appreciate that
viewpoint. It is a difficult balance to strike. I am trying to think
about solutions that lower the barrier to new contributors while letting
existing contributors still get work done, though with workflow changes.

If moving away from mailing lists makes it significantly easier for new
contributors then I will very seriously consider that. I like e-mail
for a lot of reasons and would be sad to see it go but I'm willing to
make that sacrifice if needed to keep the project strong.

I feel like we are also discounting the possibility that GitHub can
improve. We have evidence that they will respond to requests for
features.

IME every project eventually dies because it is no longer useful or
because it becomes too difficult to attract/integrate new contributors.
gcc almost died in the '90's and it was only because the issue was
forced via a fork that the project finally opened up to new
contributors. I have seen projects in the private sector die because it
was too much work to teach new employees about them. So to me,
attracting new contributors to the LLVM project is vitally important.

                       -David

Hi Renato,

I really did try to be constructive in these discussions. If my email
was conceived otherwise I'm sorry about that.

> > I still find Phab to be inscrutable. I don't use any of its advanced
> > features. I'm a long-time contributor.
>
> I asked a similar question in this thread in the very beginning: What
> actual problems do you have with Phab? There might be usable solutions
> out there already. The last time someone actually listed problems we got
> a lot of good responses, some of which I will try out myself.

This thread has fallen down to the following pattern:

1. I tell you what I don't like / can't stand
2. You tell me that's not a problem for you and why
3. You ask me to counter your argument

I think this is an oversimplification to paint things black and white.
People did offer solutions (tips, workflows, scripts, ...) as a response
to actual problems (on both sides of the argument).

This is not a helpful way to conduct a fact checking exercise.

I respect the opinion of both sides, and I know some people have
gotten to like Phab and others to hate.

I ask that people refrain from attacking others for not engaging in
tit-for-tat "my fact is better than yours" discussion.

Phab is good for some things, Github is good for others. People are
allowed to like either.

I'd say that helping people to improve their environment is better than
forcing others to worsen theirs. Phab is, by many accounts, more
feature-rich, thus we might be able to actually work around the problems
people have reasonably if these are articulated.

I agree that there was communication of the kind "Phab is really bad, GH
are better because XXXX" with responses listing flaws in GH or saying
XXXX is not a problem. As I mentioned above, asking people to list
problems does actually result in solutions being offered, why is this
not good?

> I am always in favor of improving the documentation. We need more
> concrete problem descriptions though.

More documentation or tooling won't fix the fact that much more people
know about GitHub PR than Phab.

It's the same reason why we moved to monorepo GitHub, because everyone
had their own tooling to handle multi-repo Git-SVN hybrid.

If we change the process to be more in tune with GitHub, then their PR
system will (obviously) be far more suitable.

I don't disagree but I ask if that is what we want. If the answer is
"maybe" we should make that the discussion. So far, we are mainly
discussion replacing Phab with PRs and keeping the system otherwise
intact.

What I'm asking is that we review both together. Current process with
Phab versus a GitHub process with GitHub PR.

But for that we need to actually list the problems and benefits anyway.
Why not start with doing that.

> > For all of GitHub's many flaws, its very strong advantage is that it is
> > a de facto standard. People understand it.
>
> I do not. Arguably because I have not yet used it.

He said "most people". He is right, even if you don't, personally. Git
PR (GitHub, GitLab, Gerrit) is indeed the de facto standard.

Here and below I actually did not try question the statements made but
I try to determine what they mean for us.

> However, "it is a de facto standard" is a weird argument for anything. People are advocating
> to move away from mailing lists towards other system though mailing
> lists are, or at least were, "de facto standard". Is the idea to keep up
> with the "de facto standard" or to improve the status quo (for group X*)?

That's the very definition of "de facto". The vast majority of people
use Git, and of those, GitHub/GitLab, and of those, Git PRs.

Phab is niche compared to GitHub. It doesn't make it worse, but that is a fact.

I did not argue anything else. I asked if we are trying to keep up with
the de facto standard for the sake of it, to improve the situation for a
certain group, or (implicitly) if the move would be generally speaking good.

Cheers,
  Johannes

"Doerfert, Johannes via llvm-dev" <llvm-dev@lists.llvm.org> writes:

Nowadays we have tools to manage the stacks for us [1][2], and that as a
plus don't require arc/php to be installed on your system.

I don't do much LLVM / clang stuff, but submitting stuff with [1] just works
(with `moz-phab HEAD~N --no-bug`), and it should submit your last N commits
separately automatically, without having to submit them one-by one and
linking them via the web interface / annotate stuff in the commit message.

Sorry if this is just noise, though maybe it helps.

This looks pretty cool, thanks! I'll for sure give it a try!

Agreed, this is interesting as is git-phab.

I would like to take a step back and talk about existing use-cases in
Phab. People have talked a lot about linking revisions, patch series,
parent/child relationships and so on and I have to confess I am
struggling to understand the use-cases for these things. Most of what I
do upstream is simple enough to accomplish with individual patch
submissions and reviews. In some cases I have posted "mega-patches" for
context purposes but I'll admit that isn't a very good workflow as
things quickly get out of date.

I would like to understand better how people use Phab's advanced
features and why. For example, what drives the need for patch series
and dependence relationships? Some walk-through examples would be very
helpful.

                         -David

Note the difference: One side is trying to *help improve", while the
other is *forcing to worsen*.

This is really not helpful.

--renato

That is not what I am saying, or at least you seem to interpret it
differently than I indented it to be read. In the part you cropped I
mention that *both sides* provide *helpful advice* to improve the setup
of people. I argue the problems need to be clarified for that. *Forcing
to worsen* happens when we say we *move* but also when we *do not move*.
Emails that just say we should move or not are therefore problematic.

Cheers,
  Johannes

At least at Mozilla, it's good practice to split a given patch in
logical, reviewable pieces that are associated to the same bug, and that
the same or different people may need to review.

That generally makes the patches get reviewed much sooner. During
review, some of the initial parts may be good to land, while some others
may need changes.

[1] or [2] are recentish examples that come to mind, but it happens
fairly often. Of course for a bunch of simpler changes one revision is
enough.

The use cases are similar to the "I have one PR with multiple commits"
in GitHub, but with the advantage of being able to review them
individually, and thus they can land upstream sooner.

-- Emilio

This has simply not been true in my experience. Actually, not having
to re-send a new series is one of the main advantages that
Phabricator-based review has over the original review style for Git,
which is to send patch series via mailing lists.

It might be the case that you occasionally have series where major
redesigns are required, and then asking for a fresh start makes sense.

Cheers,
Nicolai

Hi David,

I would like to understand better how people use Phab's advanced
features and why. For example, what drives the need for patch series
and dependence relationships? Some walk-through examples would be very
helpful.

Here's a somewhat more complex example of changes made by myself a
year and a half ago, starting with ⚙ D48013 TableGen/SearchableTables: Support more generic enums and tables

It's a series of changes across TableGen and the AMDGPU backend to
refactor how we represent a class of instructions. Some patterns that
occur:
- an NFC (no functional change) refactoring/rename change in an area
that is then later touched by a functional change.
- a change to common infrastructure (TableGen) that is motivated by
and used by a subsequent functional change in the AMDGPU backend, but
which can stand on its own and which may want to be reviewed by
different people
- a cleanup change to remove deprecated functionality from the backend
once the refactored functionality is available, separated out in order
to smoothen the upgrade path for frontends that are maintained outside
of the llvm-project repository

Reviewing all of this in a single amorphous diff is something that I
wouldn't wish on anybody. Conversely, having the linkage between
different commits provides context to reviewers.

It is also helpful to me: I can keep track of reviews to the series
while simultaneously working on other changes that happen to be
unrelated, and having the commits in separate stacks helps by
implicitly grouping them. Admittedly this advantage is comparatively
minor because the UI isn't perfect.

Cheers,
Nicolai

There is an (ever growing) patch series that connects contributions by
multiple people with patches in all sorts of states, from merged to WIP:

The first was this one https://reviews.llvm.org/D69785, since then the
series grew in all directions (see the stack).

I have other (=smaller) patch series that evolve over time but this one
is the biggest and most complex.

Emilio Cobos Álvarez <emilio@crisal.io> writes:

[1] or [2] are recentish examples that come to mind, but it happens
fairly often. Of course for a bunch of simpler changes one revision is
enough.

I think you forgot to include links. :slight_smile:

The use cases are similar to the "I have one PR with multiple commits"
in GitHub, but with the advantage of being able to review them
individually, and thus they can land upstream sooner.

Downstream we strongly suggest people use a one-commit-per-PR model.
That does incur some additional overhead but I personally haven't found
it to be an issue. Usually if I have a series of dependent patches,
then I need to merge the first one before I can do the second one, etc.

I can see how having multiple patches up at once for review might speed
things up. But what if a very first patch in a series needs major
changes and that affects every single following patch? The series has
to be re-posted anyway, right? I don't see how this is much better than
reviewing the first patch and getting that merged first, moving on to
the second patch, etc.

If the patches truly are independent, then why not open a separate PR
for each one? Yes, with GitHub that requires a separate branch for each
but if the changes are truly independent then multiple branches seems
entirely appropriate to me. IME it helps keep things organized.
Independent changes are independent lines of development. Changes
needed for one don't affect the state of the others.

If the patches are dependent, then we're back to the situation above
where review causes the need to re-post a new version of the series. If
review goes smoothly then I guess maybe there is some time saved but how
often does that really happen for any kind of large change that would be
needed to be broken up into multiple commits?

                       -David

Joerg Sonnenberger via llvm-dev <llvm-dev@lists.llvm.org> writes:

>> GitHub PR is the "de facto standard", everyone knows, the entry cost
>> is practically zero. The UI is lean and missing features, but the
>> large availability of tooling (either targeting GitHub directly or
>> plain git) makes up for a lot of it.
>
> Just like with the "Everyone knows git", I call bullshit on this. Sorry,
> but the far majority of GitHub users know little more than the most barebone
> functionality of Pull Requests and the typical use case in most projects
> is to squash all changes. But at this point it seems clear that the real
> goal is to just move everything to GitHub just for the sake of it.

Let's try to assume good faith on the part of all parties.

That is a very good point, thanks :slight_smile:

>> Phab has a very complex UI with a lot of features that people have
>> come to rely over the years. But it's far too complex for new people
>> and requires specially crafted tooling to work with.
>
> Can you at least try to make reasonable arguments without ridiculous
> hyperbole? Using phabricator for the casual user requires no special
> tooling. Patches can be easily submitted via patch upload. There are
> better ways to integrate it into a workflow, arcanist and git-phab just
> to mention two.

I for one cannot get arcanist to work because of its reliance on PHP
which for whatever reason refuses to install on my machine. It it's
hard for me to do a simple install of a tool for code review, I can't
imagine the frustration of new people trying to get everything working.

Mozilla developed phab tools w/o PHP. I'll actually going to give them a
try soon. (Came up in this thread somewhere).

I wasn't aware of git-phab and don't recall seeing it mentioned in the
LLVM documentation. I just checked and there is no mention of it. I'm
not sure how new people are supposed to know about it. It does look
nice from a cursory review of its front page. I'll give it a try!

We need to add more documentation, true.
We also need to openly discuss problems people face and look for ways to
remedy them.

I have never used Phab's more complex features. There's nothing in the
LLVM documentation that talks about patch series or patch parents or
anything more advanced than "submit a patch and pick reviewers." For
many people that is enough. But if I were to run into a situation where
one of the reviewers asked to link my change to some other change, o
present things as a patch series I would have no idea how to do that.
If I were new to the project I would just give up at that point.

I would really hope that people would ask the reviewer how to do that if
the reviewer explicitly requests it. This should not only be true for
Phab related questions but also for questions on code comments,
etiquette, process ...

If people do not ask questions, that is a different problem.
If questions are not answered, that would be the worst.

There's this bit in the documentation:

  Phabricator has many useful features, for example allowing you to
  select diffs between different versions of the patch as it was
  reviewed in the Revision Update History. Most features are self
  descriptive - explore, and if you have a question, drop by on LLVM Project in
  IRC to get help.

As has been noted in other threads, new users find IRC intimidating. I
haven't used IRC since I don't know when. I'm not even sure I *can*
given company policy. It's another hurdle to getting things done. And
important features are not "self descriptive" (see below).

The Phab instance has very little documentation and what there is is
difficult to find (I have to go to "More Applications" and click
"Guides" under "Utilities" -- really, does this organization make any
sense at all?). And then the guides seem limited to configuration, not
actual use-cases.

"More Applications" looks very intimidating with nonsensical names for
things I might want to do ("Harbormaster?" "Drydock?").

GitHub uses Markdown, which is another de facto standard. Phab uses
"Remarkup" which I guess is similar but different.

The front-page of Phab makes little sense to a new user. What is
"Differential?" What is "Diffusion?" Do they both have something to do
with diffs? Oh wait, "Diffusion" brings me to a list of "Active
Repositories." What?

Ok, let's try "Differential." Looks promising. There's a list of "LLVM
Needs Review." Looks like a code review tool. But I just want to
create a patch and submit it. Grumble, grumble, look back at the LLVM
documentation. Oh, it's the tiny inconspicuous button on the very top
right corner "+ Create Diff." The documentation told me about the
button but not where it's located. Took me some time to find it.

The above journey to get a patch submitted was my actual first
experience with Phab. It did not leave a good taste in my mouth.

What if we take these points and act on them?
It should be "fairly simple" to simplify the interface by hiding less
often used apps.
While writing documentation is not easy, it would also help I guess.

Emilio Cobos Álvarez <emilio@crisal.io> writes:

[1] or [2] are recentish examples that come to mind, but it happens
fairly often. Of course for a bunch of simpler changes one revision is
enough.

I think you forgot to include links. :slight_smile:

Whoopsies :slight_smile:

* 1449861 - Add a UTF-8 WebIDL string type
* https://bugzilla.mozilla.org/show_bug.cgi?id=1503656

The use cases are similar to the "I have one PR with multiple commits"
in GitHub, but with the advantage of being able to review them
individually, and thus they can land upstream sooner.

Downstream we strongly suggest people use a one-commit-per-PR model.
That does incur some additional overhead but I personally haven't found
it to be an issue. Usually if I have a series of dependent patches,
then I need to merge the first one before I can do the second one, etc.

I can see how having multiple patches up at once for review might speed
things up. But what if a very first patch in a series needs major
changes and that affects every single following patch? The series has
to be re-posted anyway, right? I don't see how this is much better than
reviewing the first patch and getting that merged first, moving on to
the second patch, etc.

If the patches truly are independent, then why not open a separate PR
for each one? Yes, with GitHub that requires a separate branch for each
but if the changes are truly independent then multiple branches seems
entirely appropriate to me. IME it helps keep things organized.
Independent changes are independent lines of development. Changes
needed for one don't affect the state of the others.

If the patches are dependent, then we're back to the situation above
where review causes the need to re-post a new version of the series. If
review goes smoothly then I guess maybe there is some time saved but how
often does that really happen for any kind of large change that would be
needed to be broken up into multiple commits?

I don't think you necessarily need to post a second version of each
patch in the series. At least I update changes individually. Some of the
changes on the bottom of the stack may require changes on the following
patches of course, but...

-- Emilio

Emilio Cobos Álvarez <emilio@crisal.io> writes:

[1] or [2] are recentish examples that come to mind, but it happens
fairly often. Of course for a bunch of simpler changes one revision is
enough.

I think you forgot to include links. :slight_smile:

The use cases are similar to the “I have one PR with multiple commits”
in GitHub, but with the advantage of being able to review them
individually, and thus they can land upstream sooner.

Downstream we strongly suggest people use a one-commit-per-PR model.
That does incur some additional overhead but I personally haven’t found
it to be an issue. Usually if I have a series of dependent patches,
then I need to merge the first one before I can do the second one, etc.

I can see how having multiple patches up at once for review might speed
things up. But what if a very first patch in a series needs major
changes and that affects every single following patch? The series has
to be re-posted anyway, right?

Even if it is being re-posted anyway, it does not mean the reviews for the latter patches “restart at zero”. An “major” interface change flowing from the first patch might lead only to mechanical changes that would be considered NFC to “status quo” of the later patches in the series. A force push with GitHub would harm such a workflow, but Phabricator handles such use cases well.

I fully agree that installing PHP shouldn't really be a requirement.
Mercurial for example has the phabricator extension as standalone
Python code. It essentially provides two commands, phabsend and
phabread, to get changes from/to the Phabricitor server. It doesn't
cover the "discussion" part of a review, but that looks reasonable on
the web UI to me. I don't know what the status of git tooling is here. I
think git-phab still requires arcanist to be installed.

Joerg

One typical case for a patch series is if you need infrastructure in a
number of places in place first. Sending all changes at once allow
others to see where you are going, independent of whether the individual
pieces are acceptable immediately or not. Yes, this can mean later
patches need to be reworked for larger structural changes, but that
doesn't necessarily invalidate review remarks already made.

The other case is reworking a pass to handle a couple of similar, but not
identical cases. It might not be possible to take out patch 2 in a
series in this case, but most often the review changes here are smaller
"stylistic" issues without huge impact on later steps.

Joerg

So far I have seen good and bad arguments on both sides, but you keep
saying no one is providing good arguments on the other side.

It seems to me that the arguments against Phab are not a problem for
you, or you have a solution that works for you, and are therefore, not
good enough as an argument (or you need better ones).

What I'm saying is that the solutions you have may not be relevant or
good enough to them, and don't solve their problem.

In the end, both GitHub and Phab have benefits and problems, and this
will be more a choice of opinion than technical.

Trying to invalidate opinions just because they're not a big deal for
you is not helpful.

I spent many years doing code review on emails, then even more on
Phab, GitHub, Gerrit. My personal opinion is that GitHub is the least
bad in user experience, but also the simplest one, and not in the best
way. But the best part for me is that I can do almost everything from
the command line, using just plain git. That improves my productivity
immensely.

Phab is a major pain for me because I have trouble remembering all the
bells and whistles. I tried getting Arcanist to work, then it stops
working, then I have to do it all over again. The UI is super
counter-intuitive to me and even having used it for many years, it's
still alien. I make many mistakes, random ones, and that makes me
spend less time reviewing LLVM patches, not more.

It may be a problem just for me, maybe it's to do with how my brain is
wired, which granted, is a minority. That's why I have accepted it as
a fact of life to use Phab. But if there are people out there that
feel like me, then well, I'll support.

Regarding actual issues, we had enough already on both sides, I agree
with most of them, and I have already stated some myself. Tooling is
one of them, so proposing a tooling fix won't help me. i have to work
on multiple different environments, from Arch Linux, to Ubuntu, WSL
and Windows and no single tool, other than Git, can support the same
framework across the board (or at least it would be a massive job).

So, my vote is for *any* review process that I can do via Git, with
minimal use of web interfaces or specialised tools.

But I'm happy with whatever makes most people's lives easier. If
that's Phab, phab. I'll keep scratching my nails on the blackboard.

cheers,
--renato

This has simply not been true in my experience. Actually, not having
to re-send a new series is one of the main advantages that
Phabricator-based review has over the original review style for Git,
which is to send patch series via mailing lists.

Interesting. If they can be committed separately, why were they a
series in the first place?

Or was that independent, but related, patches? In this case, they
could have been different PRs with mention of each other, no?

It might be the case that you occasionally have series where major
redesigns are required, and then asking for a fresh start makes sense.

What I mean by patch series is exactly what you would have in a
sanitised git branch: a number of sequential patches that build on
each other. You cannot commit patch 3 before 2 as Git wouldn't let you
merge.

In those cases, if we want to have patch 3 before the series, the
author needs to rewrite history (rebase -i), push the patch up and
pull into another branch, so that we can have the old tree with -1
patches as the derived series, and the cherry-picked patch merged
sooner.

But in those cases, the patch would most certainly have to be
different, and that would also change the remaining series. So, new
patch, new series.

Makes sense?

--renato

I don’t think creating a new thread (or series of N threads) because an early patch in a series necessitate some refactoring of later patches is ideal - while patch series can/should be avoided where reasonable (& if the series is small enough/the framing/design is obvious enough, you can hold on to the later patches rather than reviewing them all in parallel with the volatility that comes with that approach) - but there are certainly cases I’ve seen where it’s helpful to send a dependent series together to help frame/justify the goals of the early refactoring patches. If the early refactoring requires changes - yeah, you get that done, signed off and committed (while possibly doing higher level design review of later patches that might be applicable regardless of the early refactoring - like if you decide to rename a newly created function in an early patch, that shouldn’t hurt the review going on in later patches, for instance) & update later patches with the knock-on effects. No need to start new mail threads/lose review context/etc.

(sometimes review might indicate the whole patch series direction is not preferred - in which case, yeah, maybe throwing it out and starting a new thread/set of threads is appropriate - but not always)

On Git, I normally would have done that on separate, but related,
branches and pull requests. II find it much easier to work with
rebases and cherry-picks across branches than on the same branch. A
rebase -i can turn dreadful if the same part of the code is touched by
more than one patch, and it can be almost impossible if that pattern
happens more than once.

In my use of the term, a patch series is a mandated order of directly
related commits, like you would have in a branch, if the patches are
logically separated into clear individual steps, for the clarity of
review.

Perhaps the name clash has caused more trouble than the original
discussion. If that's not what most people would have thought,
apologies for the confusion.

--renato

> That is not what I am saying, or at least you seem to interpret it
> differently than I indented it to be read. In the part you cropped I
> mention that *both sides* provide *helpful advice* to improve the setup
> of people. I argue the problems need to be clarified for that. *Forcing
> to worsen* happens when we say we *move* but also when we *do not move*.
> Emails that just say we should move or not are therefore problematic.

So far I have seen good and bad arguments on both sides, but you keep
saying no one is providing good arguments on the other side.

I did not try to say that and I do not recall what I said that could be
interpreted as such. I don't think the above contains such a claim, nor
could I find such a claim in the last email I send. If you interpret
what I wrote in such a way I can understand why you might be upset.
Please feel free to point out things I could formulate better to avoid
making this impression, either via the list or in a private mail.

It seems to me that the arguments against Phab are not a problem for
you, or you have a solution that works for you, and are therefore, not
good enough as an argument (or you need better ones).

I actually did not try to argue for Phab even though I am fairly happy
with Phab. I did questions the reason for a move and I asked for
problems to be described (by either side) so we can find solutions.
Whenever I said that something is not a problem for me I explained
myself, e.g., my setup. I also explained how I use Phab, and given that
I don't know enough about GH PRs, I did not attack it willingly.

What I'm saying is that the solutions you have may not be relevant or
good enough to them, and don't solve their problem.

Sure. What I'm saying is that we should try to offer them in a
constructive way nevertheless.