RFC: Switching from Bugzilla to Github Issues

We held a round-table at the llvm dev conference about what other pieces of Github infrastructure we may want to use. This thread in particular is about switching to github issue tracking. Use of other parts of Github functionality was also discussed – but that should be for other email threads.

Most of the ideas here were from other people. I believe this proposal represents the overall feeling of the folks at the round-table, in spirit if not in exact details, but nobody else has reviewed this text, so I can’t make any specific such claim as to who the “we” represents, other than myself. Just assume all the good ideas here were from others, and all the bad parts I misremembered or invented.

Background

Any chance of setting bugzilla to make it not possible to file new issues there?

Yes, it should be possible to do that without also disabling comments on existing issues.

Hi James,

Good write-up, and I'm in favour.

One other thing I remembered from the round-table was that I think we
did talk about extending e-mail CC flexibility as a future goal.
Chandler mentioned there may well even be off the shelf solutions we
could use that integrate with GitHub, but no-one spoke up saying that
should block the initial proposed activation in 2 weeks.

Cheers.

Tim.

I strongly support this. I quite prefer GitHub issues to our current bugzilla setup.

One thing comes to mind wrt workflows and how I often use bugzilla: After bisecting a regression and filing a bug with the reproducer, I try to add CCs for the author of the commit and potentially the reviewers who were involved with the commit.

In bugzilla it's fairly easy to type people's names (which you find easily via the Phabricator review) and bugzilla will autocomplete and let you pick whoever seems to match what you typed.

In github I'd presume you need to know the github username of the ones you want to CC. For the cases where it matches the Phabricator username, this is straightforward, but I'm sure there's a significant number of users where the mapping isn't obvious.

After filing a regression bug, I normally also link to it from the originating review thread, which hopefully serves as a notification to the same people. (But depending on mail sorting habits and use of Phabricator, where e.g. I normally only list open reviews, it can be easy to miss a comment on a closed review.)

// Martin

Thanks for writing the proposal!
I think this reflects the overall conclusion of the round-table on this topic quite well.

Background


Our bugzilla installation is…not great. It’s been not-great for a long time now.

Last year, I argued against switching to github issues. I was somewhat optimistic that it was possible to improve our bugzilla in some incremental ways…but we haven’t. Additionally, the upstream bugzilla project was supposed to make a new release of bugzilla (“harmony”), based on bugzilla.mozilla.org’s fork, which is much nicer. I thought we would be able to upgrade to that. But there has been no such release, and not much apparent progress towards such. I can’t say with any confidence that there will ever be. I no longer believe it really makes sense to continue using bugzilla.

This year, we again discussed switching. This time, nobody really spoke up in opposition. So, this time, instead of debating whether we should switch, we discussed how we should switch. And came up with a plan to switch quickly.

GitHub issues may not be perfect, but I see other similarly-large projects using it quite successfully (e.g. rust-lang/rust) – so I believe it should be good for us, as well. Importantly, Github Issues is significantly less user-hostile than our bugzilla is, for new contributors and downstream developers who just want to tell us about bugs!

Proposal

We propose to enable Github issues for the llvm-project repository in approximately two weeks from now, and instruct everyone to start filing new issues there, rather than in bugzilla.

Some things we’d like to get in place before turning on Github’s Issue tracker:

  1. Updated documentation.
  2. An initial set of issue tags we’d like to use for triaging/categorizing issues.
  3. Maybe set up an initial issue template. Or maybe multiple templates. Or maybe not.

But more important are the things we do not want to make prerequisites for turning on Github issues:

We do not yet plan to turn off Bugzilla, and do not plan to migrate the existing issues to GitHub as a prerequisite for switching. We will thus expect that people continue using bugzilla for commenting on the existing bugs – for the moment.

We do not want to build supplementary notification systems to make github issues send additional emails that it is unable to send itself. We will only support what GitHub supports. That means:

  • You can subscribe to notification emails for activity in the entire llvm-project repository.
  • You can subscribe to notification emails on an individual issue.
  • Someone else can CC you on an individual issue to get your attention, and you will get notifications from that (unless you opt-out).
  • No emails will be sent to llvm-bugs@llvm.org for github issues.
  • There is no builtin way for users to subscribe to emails for bugs that have a given label (for example, all “clang” issues, or all x86 issues).

Further steps

After we migrate, there are still things we want to do:

  1. Discuss and set up new and better procedures around bug triage and prioritization.

What we have been doing up until now has not been great in any case. Switching bug-trackers is a great opportunity to try to do something better. E.g., like what the rust project has done (https://github.com/rust-lang/rust/blob/master/CONTRIBUTING.md#issue-triage, https://forge.rust-lang.org/release/triage-procedure.html#issue-triage).

  2. Bug migration

After the initial switchover, we do want to investigate two possibilities for migrating issues and turning off the bugzilla server. I expect which one is chosen will come down mostly to feasibility of implementation.

Possibility 1: Migrate all the existing bugs into a secondary “llvm-bugs-archive” github repository, and then turn off bugzilla. Github offers the ability to move bugs from one repository to another, and so we can use this to move bugs that are still relevant afterwards (potentially this could be done automatically upon any activity). Then, shut down bugzilla, and leave behind only a redirect script.

Are you excluding this as a prerequisite because we don’t have confidence this can be achieved very quickly?

Yes. And besides the necessary tooling (fortunately, there are some
scripts for this), it also relies on lots of other things being fully
done, e.g. the list of all tags, so that we can build the mapping from
components to tags, etc.

Replying to myself here - I remembered that when viewing a commit on github, github is actually very good at replacing any references to the original author name with their github username (to the extent that it is actually hard to find the real name of the author when viewing a commit).

So for CCing people on issues, the author/committer should be easy to discover; potential related reviewers are the only ones where discovering their github usernames might be tricky. And that's much less important.

// Martin

I disagree, this is extremely important. Many bugs are filed by people who don't know what commit is relevant, and other community members who watch the bugs list (including myself) help cc the right people. We must have a straightforward way to cc people. A list somewhere with people's names could be a fine solution, but the current autocomplete is an important part of the workflow, and a limited search is important (as I often don't remember exactly how various people's names are spelled).

-Hal

In my experience, when you start typing @ in a GitHub comment, it brings up an autocomplete with tagging suggestions, and that autocomplete works for both usernames and real names (assuming people have their name on their GitHub account).

Thanks for writing this up! I can also confirm that I was there and it’s accurate to my memory.

We do not want to build supplementary notification systems to make github issues send additional emails that it is unable to send itself. We will only support what GitHub supports. That means:

  • You can subscribe to notification emails for activity in the entire llvm-project repository.
  • You can subscribe to notification emails on an individual issue.
  • Someone else can CC you on an individual issue to get your attention, and you will get notifications from that (unless you opt-out).
  • No emails will be sent to llvm-bugs@llvm.org for github issues.
  • There is no builtin way for users to subscribe to emails for bugs that have a given label (for example, all “clang” issues, or all x86 issues).

I wanted to say a bit to support the direction of not setting up a new whole-project notification system to replace llvm-bugs@.

LLVM as a project is much bigger than it was when llvm-bugs@ was created. These days, new developers do not generally subscribe to llvm-bugs@, and if they do, they don’t use it to triage issues. What happens in practice is that we have a subset of developers who do triage, add the component, and CC individuals who know the code in question. We don’t need a single mailing list for all bugs to replicate this process. We could instead document a process of triage, where triagers periodically run a search to find issues without tags and then apply tags and CCs as we do today. We could formalize this with a rotation, but let’s not get ahead of ourselves.
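As a rough illustration of the "search for issues without tags" step, here is a minimal Python sketch that builds a bookmarkable search URL. It assumes GitHub's `no:label` search qualifier and the `llvm/llvm-project` repository name; the function name is made up for this example, and nothing here is an agreed-upon triage procedure.

```python
from urllib.parse import urlencode


def untriaged_search_url(repo="llvm/llvm-project"):
    """Build a URL listing open issues that have no label yet.

    Assumption: GitHub's issue search accepts the `no:label` qualifier.
    """
    query = "is:issue is:open no:label"
    return f"https://github.com/{repo}/issues?" + urlencode({"q": query})


if __name__ == "__main__":
    print(untriaged_search_url())
```

A triager could bookmark the printed URL and revisit it periodically to apply tags and CCs.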

I really would love to find a way to get email notifications from a specific issue tag, but I don’t want to block opening the new tracker on that. Until we find a solution to that, we’ll have to get used to refreshing a bookmarked search for our favorite tags. I recall that this was discussed during the round table, and people generally agreed that new users being able to find bugs was more important than a good subscription system.

Possibility 1: Migrate all the existing bugs into a secondary “llvm-bugs-archive” github repository, and then turn off bugzilla. Github offers the ability to move bugs from one repository to another, and so we can use this to move bugs that are still relevant afterwards (potentially this could be done automatically upon any activity). Then, shut down bugzilla, and leave behind only a redirect script.

Possibility 2: Create the ability to import an individual bug from Bugzilla into the llvm-project repository by pressing a “migrate this bug to github” button. Then, leave bugzilla running only as a static snapshot – as static as possible while leaving the “migrate this bug to github” button operational.

In both cases, we’d want to support a redirect script to take you from the old bug ids to the migrated bug page. In both cases, we would preserve the entire archive of existing bugs, but would not import the entire set into the “llvm-project” github repository.

I guess “possibility 1” seems like the best to me. Once we get a good backup of all the data, it lets us turn off bugzilla sooner. It also seems easier since there are probably existing scripts to do this that we can reuse.

I’ll also mention the llvm.org/pr* link scheme. We should probably make those perma-links.
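To make the redirect idea concrete, here is a small Python sketch of the lookup such a script might perform for llvm.org/pr* style paths. Everything specific in it is an assumption for illustration: the `MIGRATED` table and its contents are invented, and the fallback assumes an archive repository whose issue numbers happen to match the old bug ids, which may well not hold in practice.

```python
import re

# Hypothetical table: old Bugzilla bug id -> issue number in llvm-project.
MIGRATED = {12345: 67}

PR_PATH = re.compile(r"/pr(\d+)", re.IGNORECASE)


def redirect_target(path, migrated=MIGRATED):
    """Return the URL an old /PR<id> path should redirect to, or None."""
    match = PR_PATH.fullmatch(path)
    if match is None:
        return None  # not a bug link; leave it to the rest of the site
    bug_id = int(match.group(1))
    issue = migrated.get(bug_id)
    if issue is not None:
        return f"https://github.com/llvm/llvm-project/issues/{issue}"
    # Assumed fallback: the archive keeps the old bug id as issue number.
    return f"https://github.com/llvm/llvm-bugs-archive/issues/{bug_id}"
```

Keeping the mapping in one table means the old links can stay stable no matter which migration possibility is chosen.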

We do not want to build supplementary notification systems to make github issues send additional emails that it is unable to send itself. We will only support what GitHub supports. That means:
- You can subscribe to notification emails for activity in the entire llvm-project repository.

Is that only new issues? Or all activity? If it's all activity on all issues, you're effectively auto-subscribing to all issues, and really nobody would want that. Well, maybe, like, 3 people.

- You can subscribe to notification emails on an individual issue.
- Someone else can CC you on an individual issue to get your attention, and you will get notifications from that (unless you opt-out).
- No emails will be sent to llvm-bugs@llvm.org for github issues.
- There is no builtin way for users to subscribe to emails for bugs that have a given label (for example, all "clang" issues, or all x86 issues).

That last is really unfortunate. Someone only interested in (say) LLDB issues would have to subscribe to all notifications and hope that there are enough breadcrumbs in a new issue to be able to do accurate email filtering. It would be better to handle this in the bug tracker itself.
Last year Kristof Beyls and I did a BoF on bug handling, and my memory is that a nonzero number of people were willing to be auto-CC'd on particular topics but did not want to subscribe to llvm-bugs. This description of the github tracker means that would not be feasible, which is a step backwards.
I can anticipate a counter-argument which is that someone can easily search for bugs with particular tags. I claim that's not equivalent, because it requires action on the part of the person to go look for things, and that happens only when the person thinks of doing it. Computers should automate the tedious parts, like alerting the people who are interested in issues with a particular tag.

--paulr

Generally supportive here, but I see a couple of small concerns.

I think two weeks is simply too quick. Our community is huge, there’s inherently a delay with information dissemination and we want objectors to have a chance to respond. 4-8 weeks would be a much more realistic time frame.

Is that only new issues? Or all activity? If it's all activity on all issues, you're effectively auto-subscribing to all issues, and really nobody would want that. Well, maybe, like, 3 people.

Looking at docs for notification: https://help.github.com/en/github/receiving-notifications-about-activity-on-github/about-notifications#types-of-notifications

There are 4 levels:
- ignore all notifications including @mention
- notification on @mention or if participating (left a comment, owner of issue/PR)
- notification on releases + previous item
- get notification about *all* activity: every update in every issue/PR

However, there is such a thing as teams: https://help.github.com/en/github/setting-up-and-managing-organizations-and-teams/about-teams

We could create teams for front-end, back-ends, static analyzer and all other components to quickly summon all potentially interested people.

2. Bug migration
In both cases, we'd want to support a redirect script to take you from the old bug ids to the migrated bug page. In both cases, we would preserve the entire archive of existing bugs, but would not import the entire set into the "llvm-project" github repository.

I did some research, and unfortunately there are some caveats.
The current list of questions / issues can be found
at https://docs.google.com/document/d/1byEcbsxF3pL-HGGd_K6axdh87tbcsuJK3Dp6ThxGjKA
– please comment.

I’d like to add support for moving to github issues sooner rather than later. Not having to manually create bugzilla user accounts both gives me back some time in my day (not that important) and eliminates some of the barriers for new contributors to file or contribute to bug reports (really important).

My 2 cents to add (which I forgot at the round table) is that in general, we probably do a poor job at triaging or acknowledging new bugs that are being raised. There are some exceptions though, for some components, where a few people very actively triage and acknowledge new bug reports. I’d hate to see this disappear. Therefore, I think it’s important for people to be able to continue to easily filter updates to bugs based on components and/or keyword - so that the few that are currently motivated and perform a lot of bug triage keep on doing so - by enabling a high signal-to-noise ratio in github issue notifications for them.

I’m not sure if this is easy to do with github issues (I don’t think I saw ideal solutions being described in the thread above). Maybe getting all notification emails from all bugs, and then being able to filter it client-side based on keywords will work well enough, I don’t know.
I don’t have a strong feeling on whether we should block migration on having good enough notification filtering, but I’d like to encourage enabling good enough filtering sooner rather than later.

I’m afraid I don’t really have enough experience with github issues to know what’s possible with respect to client-side filtering.
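For what client-side keyword filtering could look like, here is a minimal Python sketch using only the standard library's `email` module. It is an illustration under assumptions, not a description of what GitHub actually sends: the sample message in the usage note is invented, and real notification emails may be multipart, in which case this sketch only looks at the subject.

```python
import email


def matches_keywords(raw_message, keywords):
    """Return True if any keyword occurs in the subject or plain-text body."""
    msg = email.message_from_string(raw_message)
    subject = msg.get("Subject", "") or ""
    # Only handle simple single-part messages; skip multipart bodies.
    body = "" if msg.is_multipart() else msg.get_payload()
    text = f"{subject}\n{body}".lower()
    return any(keyword.lower() in text for keyword in keywords)
```

For example, with an invented message such as `"Subject: lldb crashes on attach\n\nSteps to reproduce..."`, calling `matches_keywords(raw, ["lldb"])` is true, while `matches_keywords(raw, ["clang"])` is false. How reliable this is depends entirely on the issue text containing the keywords, which is the "breadcrumbs" concern raised earlier in the thread.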

Thanks,

Kristof

+1. I’m very interested in being automatically subscribed to issues in a very limited set of tools. I’m currently auto-subscribed on the corresponding bugzilla components and actively look at any new issues on those tools, usually within 24 hours, even if I don’t respond to them all. I would be unlikely to remember to refresh a search anywhere near as frequently, and I’ve found getting my emails to filter correctly to be easier said than done. I don’t think it’s a blocker to migration necessarily, especially if the majority of people support migration (I personally am ambivalent, aside from this point), but I do think this workflow should be a priority in resolving.

James

Is that only new issues? Or all activity? If it's all activity on all issues, you're effectively auto-subscribing to all issues, and really nobody would want that. Well, maybe, like, 3 people.

Looking at docs for notification: https://help.github.com/en/github/receiving-notifications-about-activity-on-github/about-notifications#types-of-notifications

There are 4 levels:
- ignore all notifications including @mention
- notification on @mention or if participating (left a comment, owner of issue/PR)
- notification on releases + previous item
- get notification about *all* activity: every update in every issue/PR

However, there is such a thing as teams: https://help.github.com/en/github/setting-up-and-managing-organizations-and-teams/about-teams

We could create teams for front-end, back-ends, static analyzer and all other components to quickly summon all potentially interested people.

With teams we can easily create a GitHub action to auto-subscribe all team
members when a new label is added to an issue, so I think using teams is
a good option.
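The core of such an automation could be as small as a label-to-team lookup that produces a comment mentioning the team. The Python sketch below shows only that lookup; the team names and label names are hypothetical, and posting the comment (for example from a workflow triggered when a label is added) is deliberately left out.

```python
# Hypothetical mapping from issue labels to GitHub team handles.
LABEL_TEAMS = {
    "clang": "llvm/clang-team",
    "lldb": "llvm/lldb-team",
}


def mention_comment(label, label_teams=LABEL_TEAMS):
    """Return a comment body that @-mentions the team for a label, or None."""
    team = label_teams.get(label)
    if team is None:
        return None  # no team is associated with this label
    return f"@{team}: the label '{label}' was added to this issue."
```

Because the mapping is plain data, adding a new component would only require a new entry rather than a change to the automation itself.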

-Tom

We held a round-table at the llvm dev conference about what other pieces of Github infrastructure we may want to use. This thread in particular is about switching to github issue tracking. Use of other parts of Github functionality was also discussed -- but that should be for other email threads.

Most of the ideas here were from other people. I /believe/ this proposal represents the overall feeling of the folks at the round-table, in spirit if not in exact details, but nobody else has reviewed this text, so I can't make any specific such claim as to who the "we" represents, other than myself. Just assume all the good ideas here were from others, and all the bad parts I misremembered or invented.

Background
----
Our bugzilla installation is...not great. It's been not-great for a long time now.

Last year, I argued against switching to github issues. I was somewhat optimistic that it was possible to improve our bugzilla in some incremental ways...but we haven't. Additionally, the upstream bugzilla project was supposed to make a new release of bugzilla ("harmony"), based on bugzilla.mozilla.org's fork, which is much nicer. I thought we would be able to upgrade to that. But there has been no such release, and not much apparent progress towards such. I can't say with any confidence that there will ever be. I no longer believe it really makes sense to continue using bugzilla.

This year, we again discussed switching. This time, nobody really spoke up in opposition. So, this time, instead of debating /whether/ we should switch, we discussed /how/ we should switch. And came up with a plan to switch quickly.

GitHub issues may not be perfect, but I see other similarly-large projects using it quite successfully (e.g. rust-lang/rust) -- so I believe it should be good for us, as well. Importantly, Github Issues is significantly less user-hostile than our bugzilla is, for new contributors and downstream developers who just want to tell us about bugs!

Proposal
----
We propose to enable Github issues for the llvm-project repository in approximately two weeks from now, and instruct everyone to start filing new issues there, rather than in bugzilla.

I think we need a gap between when we turn on issues and when we tell people to
start using it, so that we can get some kind of notification system in place.
I think this may also address some concerns people have had with this proposal.

What about if we turn issues on in 1 week and then only start telling people
to use it for new bugs once we have a decent notification system working?

Some things we'd like to get in place before turning on Github's Issue tracker:
1. Updated documentation.
2. An initial set of issue tags we'd like to use for triaging/categorizing issues.

Here are my suggestions for the minimal set of tags:

+ 1 per LLVM backend
+ 1 per top-level directory in https://github.com/llvm/llvm-project

I think if we start here we can create more specialized tags as
GitHub issues gets more traffic and we have more experience using it.
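A quick way to enumerate that minimal set from a checkout is sketched below in Python. It assumes backends correspond to subdirectories of llvm/lib/Target in the checkout, and the `backend:` prefix is just an invented naming choice for illustration, not a proposed convention.

```python
from pathlib import Path


def initial_labels(repo_root):
    """List one label per top-level directory plus one per backend."""
    root = Path(repo_root)
    # Top-level directories, skipping hidden ones such as .git.
    top_level = sorted(
        p.name for p in root.iterdir()
        if p.is_dir() and not p.name.startswith(".")
    )
    # Assumption: each backend is a subdirectory of llvm/lib/Target.
    target_dir = root / "llvm" / "lib" / "Target"
    backends = []
    if target_dir.is_dir():
        backends = sorted(p.name for p in target_dir.iterdir() if p.is_dir())
    return top_level + [f"backend:{name}" for name in backends]
```

Running `initial_labels` on a local llvm-project checkout would give a starting list that could then be trimmed or renamed by hand.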

-Tom