Update on GitHub pull requests

Commits are reported to the lists; I suspect we wrote a hook for that, because the email goes to different lists depending on which top-level directory is involved. I don’t think github would do that kind of thing on its own.

I expect to get PRs to send list mail, we’d need to have a listbot user of some kind, which we could then auto-subscribe to reviews. My team is using GitHub Enterprise internally, and we auto-subscribe people, but I don’t know that we can set up a mailing list that way.

Edit: Added an issue for this as requested by @tstellar

We have identified some gaps in this thread, it would be helpful if people could file issues for them so they can be tracked. There are currently issues for some of them, but not all.

This is our current list of issues, if you file a new one please add it to the Pull Request Migration project.

I’m glad we’re tracking the issue (Tom posted a link in another comment) as this is pretty important functionality for code reviewers.

Ah, thank you for that! This could potentially work out.

I think we need some of this to be documented up front (things like how to properly merge PRs when they’re ready, whether we should squash, etc) and some of it is fine to document as we learn more (how to handle GitHub’s lack of utility as a code review tool). The “we” I had in mind was the folks driving the infrastructure changes. Basically, I want to make sure we have enough documentation in place that active reviewers who are unfamiliar with the GitHub workflow can still succeed in reviewing and landing code without a fear of “doing something wrong.” We don’t want folks hung up on “I don’t know the correct way to land this” kind of problems, basically.

There’s two aspects to my question: 1) when someone makes a new PR, does that PR go to the mailing list? 2) when someone lands a PR, does that commit message go to the mailing list? FWIW, both of those are critical pieces of functionality for awareness of what’s happening in the project at the code owner level.

Ah, thank you for the clarification, that’s good!

I know that it’s very common in Clang for code reviews to take longer than a month from start to finish. I also know that we don’t want to kick the can down the road indefinitely (there are code reviews open that are years old at this point, and they sometimes get resurrected). So I’m not certain what the best way forward is for this. I suspect going to a read only mode will be disruptive for some amount of people, but I have no insight into how many folks would be impacted. The big downside of a hard deadline is: there’s no transition option for reviews, so recreating the patch in GitHub means losing all previous discussion on that patch. For long-running reviews, that could be a real problem.

This is a pretty significant change and I don’t think downstreams are used to tracking announcements on Discourse yet (I know some announcements have caught folks by surprise before, at least). Perhaps this is a good time to determine how to communicate major changes with downstreams? I think it’d be healthy for us to have explicit communication channels with major external stakeholders for these kinds of decisions given that they have far-reaching impact.

I’m glad to hear there’s nothing on fire currently and there’s a willingness to delay if needed (just the same, hopefully we don’t run into major roadblocks).

Same!

Excellent, thank you!

1 Like

If a downstream project does not contribute upstream, why would they have a say in the code review process / tooling? To me this seems to be the kind of change that is purely geared towards contributors.
(contrary to SVN->Git mono repo which had impact on the consumers, and at the time involved downstream in the discussion/planning).

2 Likes

That doesn’t help when you have a Herald rule to look at the subject or summary. For example, I subscribe to any reviews and commits that contain the word “CHERI”. Nor does it easily work for regexes; I have a case-insensitive search for paths with mips|riscv|sparc, and I doubt we want the full list of directories expanded out (on a random checkout from March 2023, find . \( -iname riscv -o -iname sparc -o -iname mips \) | wc -l gives 76 directories).

It’s also going to be a lot of churn having every single LLVM developer commit a change to that file every time they want to change what they’re subscribed to. And doesn’t help anyone wishing to subscribe to changes from the outside, which is a very real thing, some of whom have never submitted a patch.

7 Likes

I think the best option for people who want to subscribe to specific keywords or don’t want to get commit access is going to be email filters. I know this isn’t ideal, but it should be workable.

Sorry, I was unclear – I meant the downstreams who are active in the project but may otherwise miss things on Discourse. It’s not always easy for everyone to find relevant announcements on Discourse and sometimes the people responsible for responding to infrastructure changes are not the same people working daily within the community. (Think: dev ops teams handling down/upstream coordination may be different from the developers working in upstream on a regular basis.)

This is part of why I wanted to make sure that all activity on PRs (new submissions, updates/comments, commits) always make it to a mailing list, otherwise there’s no email filter you can write.

I was think that people would filter the emails they get directly from GitHub rather than filtering mails from the mailing list.

How do they get email “directly from GitHub”? Only by subscribing to the individual PRs. Do you really want everyone currently subscribed to the commits lists to subscribe themselves to every PR? I don’t know how github.com works, but in Sony’s GitHub Enterprise instance, there’s a limit of 100.

Filtering email from the list will work much better.

I do this for issues. I subscribe to all of them and have filters to select the ones I care about, but if people prefer to do this from the mailing list, we can look into getting notifications to the list.

I think a mailing list is a necessity – it’s the one low-friction place in which people can see all of the changes to a project, which is a pretty important property we’ve had since before switching to Phabricator. This can be important not only for awareness as changes happen but also because we frequently need to do historical digging on how changes came about. The cfe-commits Archives (and same for llvm, etc) are critical resources in that regard. It would be a serious blow (IMO) to lose that going forward.

GitHub’s search functionality is not sufficient for historical digging, to put it mildly.

1 Like

Have you tried it recently? A few months ago they switched over to their new search backend and I have seen way better results since.

Yes, it’s improved but still not all that usable (at least in my experience). Not to derail the thread onto another topic, but an example is:

Looking at the results, not a single one of them actually contains the string “Win32 compatibility” (it mostly catches -triple=...-win32 ... -fms-compatibility), so it’s as-if the quotes don’t actually matter… except the quotes do matter for other kinds of searches, so who knows).

You don’t need to subscribe to every PR manually. You can “watch” the repository and get notifications for the selected events (either all, or customized):

1 Like

What solutions do you propose to reach these people or even find them?

We have thousands of corporations/organizations using LLVM and it seems reasonable to expect them to pay attention to LLVM communication methods. It would be an incredible amount of work (if not impossible) to try to reach the right people at all these organizations.

It is possible to add labels to PRs, but apparently not to subscribe to PRs with a specific label? Is that a feature we could ask of Github ? @tstellar

As I wrote out my thoughts on this, two things became obvious to me: 1) my concerns are more about community communication problems in general, 2) for this specific topic, I think we’re hopefully sufficiently covered by the announcement channel + awareness of existing contributors.

I think it’s a problem that we don’t know the major players who are impacted by our decisions, but it’s not a problem we need to solve for the GitHub transition.

1 Like

I believe those who would be affected by a Phab->GH transition in code reviews would be those who already participate in LLVM development, and therefore can be expected to follow the project enough to be aware of the change. Those not actively participating in development wouldn’t be affected and so it’s not important to notify them.

3 Likes

I think this is true in general, but the counter-point that caused me to bring this up in the first places is that some organizations have folks who work on things like internal reporting tools. They may not be active in the LLVM project, but they might be responsible for writing scripts that do things like gather counts of new reviews or reviews with new comments on a per person basis for folks who are active contributors. These tools are used just infrequently enough that they may not be on the forefront of contributors’ minds, but I ultimately convinced myself that it’s pretty reasonable to hope that 1) this isn’t a common scenario and 2) contributors are aware enough of what’s going on to communicate that sort of thing internally in this case.