code-owner sporks

Just brainstorming here, but what if each CODE_OWNER maintained a spork on Github and accepted Pull Requests? What’s a spork, you ask? Well it’s fork with no intent to diverge - it spoons some centralized repo (be it via git or git-svn). If you haven’t heard the term ‘spork’ in this context before, it’s either because I just made it up or that we share the same incapacity to google effectively.

As a contributor, my process would be to fork Github’s llvm-mirror and make my patch locally. Then I’d crawl up the directory tree from my code changes until I found a CODE_OWNER.TXT. Worst case, I get to the root directory and spot a CODE_OWNER.TXT with a URI to the central repository. All other CODE_OWNER.TXT files would contain a git URI pointing to the code owner’s spork. I’d make a Pull Request and hope for a review from the owner and/or anyone else monitoring that spork. Once the owner accepts the Pull Request, it’d be between the members of the code-owner oligarchy how and when the patch is upstreamed to the central repository.

Thoughts?

Thanks,
Greg

Doesn't sound useful for the code owners. Barrier to entry on submitting
patches to llvm or clang is almost never the version control scheme so I
don't see what the community gains either other than more complexity to
manage.

-eric

I think the main benefit of a scheme like this would be that a pull request tells a code owner which patches require their attention. As a contributor it would be nice to see your patch in a queue somewhere rather than just be buried down the mailing list. When patches are sent to llvm-commits it can be hard to tell if a code owner has noticed the patch because it is a very high-volume list.

As a code owner I would think it would be nice to see a consolidated list of the open patches for your area. I suppose it would also be nice to see which commits have gone into your area and need a post-commit review.

– Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation

Greg,

Unfortunately, I don't expect pulling would not work better as our expected.
For example, GIT_AUTHOR is not honored for 3rd committer to
git-svn-dcommit, even if GIT_AUTHORS is registered person.

I am planning to implement the pre-commit tester, like Buildbot's "Try
builder". Then changesources could be our Phab.

I think to provide a method to confirm consistency with unit tests
would help us. How do you think?

ps. It would be helpful; To mention "My proposal is also available in
{{sha1}} in github/person/repo.git" like pull-request, in a proposal.
Pulling commits from other git repos is easier to some of us.

...Takumi

I think the main benefit of a scheme like this would be that a pull request
tells a code owner which patches require their attention. As a contributor
it would be nice to see your patch in a queue somewhere rather than just be
buried down the mailing list. When patches are sent to llvm-commits it can
be hard to tell if a code owner has noticed the patch because it is a very
high-volume list.

It might seem intimidating at first, but it will blow over really
quick once you get one or two patches in and you learn how to do
incremental development. Ping at appropriate time intervals (~3 days
is sane); I think the most pings I've seen before an answer is Ping^4
("Fix cmake for Hexagon cross compilers"), and the reviewers were very
apologetic about the situation. Looking back at your submission
history, it looks like your patches have been really "meaty" patches;
by that I mean that they affect core functionality and need a careful
review by one of a small number of people who really understand that
part of the code; understandably, these patches are more likely to go
unanswered, especially if you haven't gotten a foothold in that part
of the tree.

You might try sending in some smaller, trivial patches, like
documenting a function which is missing documentation (your patches
show that you apparently know the code quite well, an advantage I
didn't have), or even just spelling fixes (don't go overboard though).
That helps grease the wheels. Look at c27faccb in llvm.git and
2bfdad11 in clang.git for my first two patches, which I caught simply
because I was self-hosting with a bleeding-edge clang with the latest
warnings turned on.

As a code owner I would think it would be nice to see a consolidated list of
the open patches for your area. I suppose it would also be nice to see which
commits have gone into your area and need a post-commit review.

A spork does nothing to solve this: most development is small,
incremental changes done by people with commit access *committing
directly to trunk* (possibly after a review if the change isn't
"obvious", or is large, or otherwise tricky). The LLVM development
style tries to make many small, *completely obvious* changes which are
committed to trunk directly, with very strong code review for less
obvious changes (generally pre-commit for more junior committers, and
post-commit for more senior committers).

It *would* be nice for a code-owner to have some way to see what needs
to be reviewed, but a git mirror on github is not going to do that for
them in any way.

-- Sean Silva

As a contributor, my process would be to fork Github's llvm-mirror and make
my patch locally. Then I'd crawl up the directory tree from my code changes
until I found a CODE_OWNER.TXT. Worst case, I get to the root directory and
spot a CODE_OWNER.TXT with a URI to the central repository. All other
CODE_OWNER.TXT files would contain a git URI pointing to the code owner's
spork. I'd make a Pull Request and hope for a review from the owner and/or
anyone else monitoring that spork. Once the owner accepts the Pull Request,
it'd be between the members of the code-owner oligarchy how and when the
patch is upstreamed to the central repository.

I'm not sure what drives the desire to propose these kinds of things,
but I also had a similar feeling at some point before I started
contributing patches. I think the psychology of it boils down to:

"LLVM is really cool and I want to contribute"
+ "but I haven't really contributed anything, even though I've been
lurking for a while"
=> "there must be some barrier to entry"

coupled with:
"git is awesome, github is t3h k00l, SVN sucks amirite"
+ "LLVM uses SVN for trunk"
=> "of course, the barrier to entry which is blocking me *must* be
that LLVM is using svn and not git!"

leading to:
"I should get LLVM to use git, so that I can get unblocked and start
contributing".

At least, I think for me it was roughly like that. My memory is, as
usual, prone to error; I also question my (nonexistent) qualifications
for psychological analysis.

Regardless, I can tell you with certainty: the barrier to entry is not
that LLVM isn't using git.

There is one and only one way to get started contributing patches: get
patches reviewed and committed. It's a bit of a strategy game at
first. You *have to optimize for getting the code committed*, implying
*reviewed*. Start small (fix documentation, compiler warnings,
spelling), since these changes are easiest to review and get
committed. Don't underestimate the difference between "random patch
from some guy" and "patch from that guy for that part of the tree he
was looking at"---the difference is enormous, even if "that part of
the tree he was looking at" was fixing a spelling mistake *which he
sent in a patch for and got committed*. There's a strong social effect
to being "that guy" instead of "some guy", but also, getting a
spelling fix committed demonstrates that your patch is above a certain
very low bar (but a bar nonetheless) of "not going to be annoying to
review and commit" (e.g., not having to figure out that I need to pass
something weird like `-p3` to git-apply in order to apply this patch,
or be `cd`'d into a particular directory (git-format-patch avoids all
these problems, btw)).

As you get a foothold, you can grow larger. There are sometimes
relatively mechanical changes that need to be made, but which are kind
of tedious and so they go undone: these are good candidates for your
"training wheels" phase (last paragraph was "tricycle") of more
substantial patches which will teach you how to do incremental
development (e.g., learning that "well while I'm here, I'll fix this
trivial whitespace issue" is evil); if you need ideas, ping me.

I promise you that once you have gotten a couple patches committed
like that you will read this message you just sent and have a "what
was I thinking" moment.

-- Sean Silva

Sean Silva wrote:

I think the most pings I've seen before an answer is Ping^4
("Fix cmake for Hexagon cross compilers"), and the reviewers were very

And now I have another 6 patches at ping^6
[1/6] Hexagon TC: Update toolchain to add appropriate include paths

Sebastian

"David Peixotto" <dpeixott@codeaurora.org> writes:

I think the main benefit of a scheme like this would be that a pull
request tells a code owner which patches require their attention. As a
contributor it would be nice to see your patch in a queue somewhere
rather than just be buried down the mailing list. When patches are
sent to llvm-commits it can be hard to tell if a code owner has
noticed the patch because it is a very high-volume list.

This is right. The highest barrier to entry for patches is that they
are not seen.

                             -David

Sean Silva <silvas@purdue.edu> writes:

I think the main benefit of a scheme like this would be that a pull request
tells a code owner which patches require their attention. As a contributor
it would be nice to see your patch in a queue somewhere rather than just be
buried down the mailing list. When patches are sent to llvm-commits it can
be hard to tell if a code owner has noticed the patch because it is a very
high-volume list.

It might seem intimidating at first, but it will blow over really
quick once you get one or two patches in and you learn how to do
incremental development.

This is simply not true, IME.

Ping at appropriate time intervals (~3 days is sane);

3 days is *not* sane. The fact that we require pings at all indicates a
broken process. I don't understand why there is such resistance to a
patch queue. I don't even care about the revision control system as
much as I do about a managed patch system. We have a system for bugs,
which are _much_ less frequent occurrences than patches.

I can't stall my work for 3 days waiting for someone not to see my
patch.

I think the most pings I've seen before an answer is Ping^4 ("Fix
cmake for Hexagon cross compilers"), and the reviewers were very
apologetic about the situation.

The reviewers are *always* very apologetic. That doesn't help the
developer, unfortunately.

Looking back at your submission history, it looks like your
patches have been really "meaty" patches; by that I mean that they
affect core functionality and need a careful review by one of a small
number of people who really understand that part of the code;
understandably, these patches are more likely to go unanswered,
especially if you haven't gotten a foothold in that part of the tree.

Why is that at alkl understandable? It's understandable that it might
take a while top review them but to not get an acknowledgement at all?
That is never understandable.

It *would* be nice for a code-owner to have some way to see what needs
to be reviewed, but a git mirror on github is not going to do that for
them in any way.

Ok, so lets do it some other way.

                           -David

This is opinion. I see everyone's patches, but my review time is precious
and the patch may not hit the criteria of what I'm willing to spend my time
on.

-eric

David, I can understand your position here, but I don't see *you* reviewing any patches.

-Chris

Chris Lattner <clattner@apple.com> writes:

I can't stall my work for 3 days waiting for someone not to see my
patch.

David, I can understand your position here, but I don't see *you*
reviewing any patches.

I have provided reviews in the past. Not a lot, but then most of the
development is in areas I don't touch a lot.

In any case, an _ad_hominem_ is not really a valid argument here.

                          -David

Eric Christopher <echristo@gmail.com> writes:

    This is right. The highest barrier to entry for patches is that
    they
    are not seen.
    
    

This is opinion. I see everyone's patches, but my review time is
precious and the patch may not hit the criteria of what I'm willing to
spend my time on.

That's valid, but code owners should at least acknowledge the patch and
say, "I don't have time to review this" with some estimate of how long
it might take before they get around to it. I say "code owners" because
we shouldn't require everyone on the list to acknowledge a patch they
won't review. :slight_smile:

As a contributor, my time is precious too. I would like to have some
way of knowing that *someone* is looking at the patch, that it hasn't
just dropped on the floor. A patch queue would at least allow me to
track progress and not have to save a bunch of e-mails of patches I
submitted and need to ping four days from now.

Really, patches get dropped *all the time* to the point where pings are
a regular part of the development process. That's a huge waste of time
for everyone.

                            -David

I don't see any resistance, quite the contrary. From all the list
traffic it is quite clear that a review system is currently being
tested...

Joerg

Really, patches get dropped *all the time* to the point where pings are
a regular part of the development process. That's a huge waste of time
for everyone.

It's only a waste of time if your workflow is entirely synchronous
with patch review. Most of us have a number of things that we can work
on, so letting a patch chill for a while on the list isn't a big deal
because we just go on and do something else.

Also, this situation has piqued my interest and so I am researching
the issues involved. Could you give me some links (or thread titles)
for dropped patches, so that I can take a look at them and try to
puzzle out how the development process ended up in them being dropped?
Nothing big, maybe 5 or so is probably enough. Thanks.

-- Sean Silva

Sean Silva <silvas@purdue.edu> writes:

Really, patches get dropped *all the time* to the point where pings are
a regular part of the development process. That's a huge waste of time
for everyone.

It's only a waste of time if your workflow is entirely synchronous
with patch review. Most of us have a number of things that we can work
on, so letting a patch chill for a while on the list isn't a big deal
because we just go on and do something else.

Of course, that's easier with git. :slight_smile:

Also, this situation has piqued my interest and so I am researching
the issues involved. Could you give me some links (or thread titles)
for dropped patches, so that I can take a look at them and try to
puzzle out how the development process ended up in them being dropped?
Nothing big, maybe 5 or so is probably enough. Thanks.

Really? Ok...

http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20121105/155173.html
Ping Request (couldn't find an actual review)

http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20121105/155174.html
Ping Request (couldn't find an actual review)

http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20121105/155233.html
Never reviewed?

http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20121105/155267.html
Never reviewed?

http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20121105/155318.html
Ping Request (looks like it got reviewed shortly after the ping)

http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20121105/155389.html
Reviewed five days later (I think).

The mailing list interface makes it difficult to look for the things you
requested, but all of these examples occurred just last week.

                                    -David

Joerg Sonnenberger <joerg@britannica.bec.de> writes:

<dag@cray.com> writes:

Sean Silva <silvas@purdue.edu> writes:

Really, patches get dropped *all the time* to the point where pings are
a regular part of the development process. That's a huge waste of time
for everyone.

It's only a waste of time if your workflow is entirely synchronous
with patch review. Most of us have a number of things that we can work
on, so letting a patch chill for a while on the list isn't a big deal
because we just go on and do something else.

Of course, that's easier with git. :slight_smile:

Just to follow up, I agree that people shouldn't be stuck for days
waiting for their patch to be reviewed. The cost is more subtle than
that.

- I have to *remember* I submitted the patch (not hard, but it is a
  cost).

- I have to save that e-mail from llvm-commits so I can refer to it when
  the inevitable ping is necessary.

- I have to wade through tons and tons of commit e-mails searching for a
  response to my patch (for some reason the mailing list software often
  breaks threading). This is a more general problem with the current
  review process, not strictly a timeliness issue.

- I have to send a ping e-mail.

- Now I have to look for responses to *two* e-mails.

In short, e-mail is a very poor vehicle for managing this kind of
process. I hope that the patch queue in testing helps alleviate the
poor response time problem. Even just an acknowlegement, "hey I got
your patch but it will be a few days before I can review it", would help
a lot.

                           -David

And now I have another 6 patches at ping^6
[1/6] Hexagon TC: Update toolchain to add appropriate include paths

:frowning: I'm really sorry. To be honest, looking at those patches, I'm not
entirely surprised though. Consider the following a "meta"-review:

First off, is chandlerc *really* the only person who can review these
changes? It seems unwise to bottleneck a patch on one of our busiest
developers. git-blame, or straight-up ask on IRC who else knows about
that part of the code.

Second, those patches are not optimized for review: they are basically
code dumps. You know your reviewer's time is limited, but the commit
message in your patch is a oneliner? Seriously? Also, there is no
context for *why* those changes are necessary in the body of the
message; phrases like "this patch is a step towards our main goal of
..." really help the reviewer get oriented and speeds the review,
along with letting other list-readers know what's up and maybe get
them pitch in to the review (maybe for just style review, but that
helps). Also, it may help to try to complete the phrase "this patch is
completely obvious except for X", and then specifically ask the
reviewer to look at X so they don't waste their time wading through
obvious stuff to find the stuff that needs to be reviewed. Better yet,
directly commit the obvious changes. Another thing to take advantage
of is that usually when people start reading text, they continue to
read it, so if you have a nice, well-written description of the patch,
and that by the end the reviewer knows what to expect from the review,
what they need to be looking for, etc., then when they finish it is
likely that they will simply open the patch, look for the things you
asked, and give you the LGTM. I could go on, but I think I have
already communicated the main idea: optimize for code review.

Third, my impression is that if you want a patch set to be treated as
a whole, attach the patches to the same message. On the LKML people's
mail setups allows for the [PATCH I/N] to work really well, but my
impression is that the mail setup of most people here doesn't work
well with it.

Fourth, you need to be more adaptive about the whole situation. It
seems like the people pushing those patches basically are running a
mechanical loop:

while (!Patch.isCommitted()) {
  sleep(several days);
  Patch.ping();
}

You aren't a machine! Clearly what you are doing is not working, and
as a human you should realize that and correct it! After 2 or 3 pings,
I would express an understanding of the reviewer's busy-ness and
explicitly ask the reviewer if they can suggest someone else to do the
review, so that the patch can be committed in a timely fashion. If
that doesn't work, then you will probably want to reach out on IRC.
Ultimately, the highest level of escalation I can see is going to be a
thread to llvmdev discussing your situation and the systemic problem
you are running into (not a one-liner "hey, wtf guys we have a bunch
of patches that aren't getting reviewed", but a real, thought-out
discussion of the situation, indicating the forces at play as you see
it, the nature of the patches, their importance, what you have tried,
and possible directions for the community to address the situation)
and asking the community for advice: such a thread will not go
unanswered and will is practically guaranteed to lead to the issues
being swiftly addressed.

Sorry for the lengthy message. Hopefully it will help get that patch
set committed :slight_smile:

-- Sean Silva

- I have to *remember* I submitted the patch (not hard, but it is a
  cost).

If you forgot, the chances are high that the patch was unimportant. I
do my development on local git branches, so every time I do `git
branch`, I'm reminded. There's really no overhead.

- I have to save that e-mail from llvm-commits so I can refer to it when
  the inevitable ping is necessary.

Maybe you need a better mail-reader? In gmail I just look in my "sent
mail" or if I have sent a lot of mail recently I search "is:sent
has:attachment". It literally takes 10 seconds.

- I have to wade through tons and tons of commit e-mails searching for a
  response to my patch (for some reason the mailing list software often
  breaks threading). This is a more general problem with the current
  review process, not strictly a timeliness issue.

Sounds like you need a better mail reader. I have never heard of this
even remotely being a problem in any way, at all, ever.

- I have to send a ping e-mail.

Once again, this takes like what, 10 seconds? I find it highly
unlikely that 10 seconds is a non-negligible amount of time compared
to the time spent coding, building, and testing any non-obvious (i.e.
needing review) change.

- Now I have to look for responses to *two* e-mails.

You definitely need a better mail reader. In every mail reader I've
ever seen, both mails get clustered into a thread. Maybe this is the
root cause of your vehement dissatisfaction with the current
email-based system?

In short, e-mail is a very poor vehicle for managing this kind of
process. I hope that the patch queue in testing helps alleviate the
poor response time problem. Even just an acknowlegement, "hey I got
your patch but it will be a few days before I can review it", would help
a lot.

Have you considered the amount of quality code that gets produced with
the current process? What kind of improvements can we expect from such
a system? Are there advantages that email has over a more structured
sort of review system like Phabricator? If so, does the current
development process take advantage of them (and hence would be
*negatively* impacted by the switch)? Are you sure that the bottleneck
which causes high review latency isn't elsewhere in the system? Or, as
counterintuitive as it may seem, perhaps all of the things you are
pointing to as problematic (e.g. review latency, patches getting
dropped) are actually *beneficial* (maybe it prevents less-determined
people from becoming contributors? idk, I'm not willing to rule it out
without more careful analysis).

-- Sean Silva