Status of Bugzilla Migration

Dear All,

Some of you who are checking the migration notes
(the "LLVM Bugzilla Migration Status" Google Doc) might already have noticed that we're stuck
again. Let me provide more information about what is going on now and
what the plans are.

As a reminder, we previously imported all issues into the archive repo,
and essentially only the very last step remained: the migration to the live
llvm-project repo. This step is crucial and one-way: once started, we
cannot undo it. We also have to rely on GitHub here,
as we cannot do it via rate-limited API calls.

During the final checks, two issues were revealed:
  - Notifications are still sent in some cases
  - The migration resets the last modification date of closed issues (it
looks like it was implemented as "re-open issue, transfer and close
again"). As a result, all closed issues essentially got sorted
chronologically ahead of the genuinely open ones.

These issues were fixed on GitHub's side and we proceeded with
re-checking everything. It turned out that another issue had appeared: the
labels were silently lost and the migrated issues were completely
unlabeled, despite originally being annotated with 140+ labels.
For now this is a show-stopper. The issue was reported to and
acknowledged by GitHub; however, no ETA was provided.

Our current options are:
  1. Abandon the migration
  2. Wait until the issue is resolved on GitHub side
  3. Try to find alternative solutions to work around the GitHub issue

(2) is essentially not an option. I propose abandoning the
migration and unlocking Bugzilla if no solution is found by
the end of this week.

The only alternative I'm seeing is to apply the labels post-migration.
There are important downsides:
  - This has to be done via the GitHub API and we're rate limited to ~5000
requests per hour, so the labelling will take ~20
hours. I was told that there is no way for us to have the API rate
limit increased.
  - This might trigger notifications. My quick check via the web UI did
not, but I cannot be 100% sure of anything here.
  - (the most important) This will mess up the "last modified" timestamp,
as setting a label is an event that is recorded in the issue. There is
no way to set an "old" timestamp; it is assigned by GitHub
automatically.
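For illustration only, here is a minimal sketch of one step of such a post-migration labelling pass. It assumes the REST "add labels to an issue" endpoint (`POST /repos/{owner}/{repo}/issues/{issue_number}/labels`) and GitHub's `X-RateLimit-Remaining` / `X-RateLimit-Reset` response headers; the function names are hypothetical and this is not the actual migration script. Only building the request and computing the back-off are shown, so the logic can be checked without touching the network.

```python
import json
import time
import urllib.request
from typing import Mapping, Optional, Sequence

API_ROOT = "https://api.github.com"


def build_add_labels_request(owner: str, repo: str, number: int,
                             labels: Sequence[str], token: str) -> urllib.request.Request:
    """Build (but do not send) a REST call that adds `labels` to one issue."""
    url = f"{API_ROOT}/repos/{owner}/{repo}/issues/{number}/labels"
    body = json.dumps({"labels": list(labels)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def seconds_to_wait(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how long to sleep before the next call, based on rate-limit headers.

    Assumes the caller passes the headers of the previous response; with a
    plain dict (as here) the header names must match exactly.
    """
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    # X-RateLimit-Reset is the time (Unix epoch seconds) when the quota refills.
    reset_at = float(headers.get("X-RateLimit-Reset", "0"))
    current = time.time() if now is None else now
    return max(0.0, reset_at - current)
```

At ~5000 calls per hour, a pass that spends one call per issue takes roughly (number of calls / 5000) hours; the ~20 hours mentioned above corresponds to about 100,000 calls at that rate.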

For now I'm testing the script for option 3 and waiting for any news from GitHub.

I will keep you updated.

What bad things would happen if you just opened up https://github.com/llvm/llvm-bugzilla-archive/issues (even if you then make another historical archive later) and used it as the bug tracker until you and GitHub have ironed out all the issues with migrating from one project to another? That seems better than going all the way back to Bugzilla, which would then impose another multi-day migration at a later point.

In my mind I’ve already divorced from bugzilla, I’m ready to move on with my life with github!

MyDeveloperDay

Well, this is another alternative, yes, but it's up to the community to decide.


Thank you for all of the hard work you've put into this so far, and
thank you for the detailed update on the unfortunate place we're at.

When you say "abandon the migration", do you mean temporarily or
permanently? I'd be strongly in favor of temporarily abandoning the
migration so that we can continue to do useful work against bugs while
we sort this out. If you're thinking of abandoning permanently, I
could be in support of that as well, but I'd want to know what our
aspirational goals are for the bug database long-term before giving my
support.

~Aaron

This thought had occurred to me as well. Using a separate repo for bug tracking seems reasonable as an intermediate step. Unless there’s a complexity here I’m missing, I’d probably vote for that over going all the way back to Bugzilla.

Philip

p.s. Anton, thank you for the update and all the work that has gone into this.

If there are new issues created directly in llvm-bugzilla-archive, and they have cross-references to other (new or old) issues, we’d want to make sure they get fixed up along with the originally-from-bugzilla references. (Recall that all issues will be renumbered when they move to llvm-project.)

It would be mildly annoying to have the bug repo move twice instead of once, but if the reference re-writing works correctly then I don’t have any real objection.

–paulr

Thanks for all the work and info, Anton. Based on your writeup, I think option 3 is best.

Losing the last update timestamps on all the issues is unfortunate, but I think it’s OK. We already know the migration doesn’t have perfect fidelity, and that’s OK.

I also think we can wait a day to get labels on the migrated issues. I think my bigger concern with the rate-limited APIs is that it’s hard to test scripts that take 20 hours to run, so there is some risk that the label migration script fails or mislabels issues. Still, I would just hope for the best here. It’s not critical to get labels on old issues on day 1. Maybe one way to deal with this is to apply labels to recently modified issues first.

Notifications are concerning, but your test via the web UI gives me enough confidence to want to push forward.

Finally, you are sort of the one in the hot seat here doing the work, so I favor any solution that takes the pressure off you. :slight_smile: That means either going back to bugzilla temporarily, or moving forward with the migration and fixing the labels as best we can over time.

When you say "abandon the migration", do you mean temporarily or
permanently? I'd be strongly in favor of temporarily abandoning the
migration so that we can continue to do useful work against bugs while
we sort this out. If you're thinking of abandoning permanently, I
could be in support of that as well, but I'd want to know what our
aspirational goals are for the bug database long-term before giving my
support.

Well, here is the key point: all "temporary" solutions (e.g. a temporary
return to Bugzilla or using the current archive) rely on the assumption
that the issues we're facing will be fixed one day. However, here we
are depending on GitHub, which might have its own priorities and plans,
and we have no way to influence their decisions besides
sharing concerns and asking questions. So, personally, I would not go
this way until we know how long this interim solution will be
in use. Otherwise it could stay in that state forever, e.g. if GitHub
decides to keep the status quo.

As for "permanent abandoning": I think in such a situation we'd need
to take a step back and seriously reconsider all the infrastructure
we have, maybe even checking what the alternatives are
platform-wise.

Paul,

Yes, during the migration all references should be rewritten. At least
this is how it is documented; I'm not 100% sure that this is actually the case
:wink:

Reid,

I also think we can wait a day to get labels on the migrated issues. I think my bigger concern with the rate-limited APIs is that it's hard to test scripts that take 20 hours to run, so there is some risk that the label migration script fails or mislabels issues. Still, I would just hope for the best here. It's not critical to get labels on old issues on day 1. Maybe one way to deal with this is to apply labels to recently modified issues first.

I think we need to apply labels in chronological order, i.e. first
apply the labels to the issues that were last modified the longest time
ago. That way we will at least have the sorting in the proper
order. I definitely have the creation time of each issue, but I'm not sure
about the last modification timestamp (there is a timestamp when an
issue is closed, so at least for some issues we do have such a timestamp
at hand).
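A minimal sketch of that ordering, assuming each archived issue record carries a creation time and, if closed, a close time. The close time (when present) is used as a stand-in for "last modified", since that is the best timestamp available; the `ArchivedIssue` record and helper names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class ArchivedIssue:
    """Hypothetical record for one issue in the local archive dump."""
    number: int
    created_at: datetime
    closed_at: Optional[datetime] = None
    labels: List[str] = field(default_factory=list)


def approx_last_modified(issue: ArchivedIssue) -> datetime:
    # Best available proxy: the close time if the issue was closed,
    # otherwise the creation time.
    return issue.closed_at or issue.created_at


def labelling_order(issues: List[ArchivedIssue]) -> List[ArchivedIssue]:
    """Oldest proxy timestamp first, so the most recently touched issues get
    their label event last and stay on top of a "recently updated" sort."""
    return sorted(issues, key=lambda i: (approx_last_modified(i), i.number))
```

Because every label event bumps the "updated" time, this order can only preserve the relative sorting by the proxy timestamp; it cannot restore the original dates, and long-open issues with recent activity would still be ordered by their creation time.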

Another thing I need to check is how everything works after the
migration. I do have labels for each issue in the archive; however,
after the migration the issues won't be there anymore. So an additional
question is whether API requests will be redirected or whether I will need to
build the mapping first. Given the rate limit of 5k requests per hour,
a complete sweep over all issues will take 11 hours.
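If requests turn out not to be redirected, one option is to build the mapping first and derive the labelling plan from it. A minimal sketch, assuming a hypothetical `archive_to_live` dict (archive issue number to new llvm-project number) has already been collected and one API call is spent per labelled issue:

```python
import math
from typing import Dict, List, Tuple


def labelling_plan(archive_labels: Dict[int, List[str]],
                   archive_to_live: Dict[int, int]) -> Tuple[List[Tuple[int, List[str]]], List[int]]:
    """Translate archive issue numbers into live ones.

    Returns (plan, unmapped): `plan` holds (live_number, labels) pairs for
    issues that have labels and a known new number; `unmapped` lists archive
    numbers whose new number still has to be looked up.
    """
    plan: List[Tuple[int, List[str]]] = []
    unmapped: List[int] = []
    for old_number in sorted(archive_labels):
        labels = archive_labels[old_number]
        if not labels:
            continue  # nothing to apply, so no request is spent on it
        new_number = archive_to_live.get(old_number)
        if new_number is None:
            unmapped.append(old_number)
        else:
            plan.append((new_number, labels))
    return plan, unmapped


def sweep_hours(num_requests: int, requests_per_hour: int = 5000) -> int:
    """Whole hours needed for `num_requests` calls under a fixed hourly limit."""
    return math.ceil(num_requests / requests_per_hour)
```

For example, `sweep_hours(55_000)` returns 11, so skipping unlabelled issues directly shortens the sweep.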

  • This has to be done via GitHub API and we’re rate limited to ~5000
    requests per hour, so this means that the labelling will take ~20
    hours. I was told that there is no way for us to have the API rate
    limit increased.

Is this 5000 requests per hour limit per repo or per access token? Could we potentially pool access tokens from multiple GitHub accounts to sidestep the issue? Say, 20 tokens to do the migration in 1 hour?

–Jeff Miller

FWIW, “20 hours” or “11 hours” or “three days” is like nothing, compared to what the migration has already been doing. If it only requires taking Bugzilla down for 24 hours to do it, IMO you should just do it already — whatever “it” is.

Also, re timestamps: The choices seem to be

  • Wait for GitHub to offer us some way of importing timestamps, then do the migration; or
  • Do the migration, then wait for GitHub to offer us some way of retroactively changing some of the timestamps.

Neither is perfect, but the latter is clearly better for LLVM’s purposes.

–Arthur

Hello Arthur,

FWIW, "20 hours" or "11 hours" or "three days" is like nothing, compared to what the migration has already been doing. If it only requires taking Bugzilla down for 24 hours to do it, IMO you should just do it already — whatever "it" is.

Well, that's for a single sweep. So if we need to do this, say, 5 times,
things start to get very interesting.

Also, re timestamps: The choices seem to be
- Wait for GitHub to offer us some way of importing timestamps, then do the migration; or
- Do the migration, then wait for GitHub to offer us some way of retroactively changing some of the timestamps.
Neither is perfect, but the latter is clearly better for LLVM's purposes.

Not the timestamps, the labels. And note that, in general, nothing
can be done in GitHub retroactively, at least not by us, as
I've been told. If this were possible, we'd simply import into an
empty repo, add the git repo, add releases (backdating them) and
be done...

From my experience, adding a label to an issue does not trigger any
notifications (though it can trigger webhooks), so I think that shouldn't
cause problems. I also agree that being able to retroactively edit the
last-modified time on GitHub is almost certainly not going to happen, whereas GitHub
fixing their repo migration to preserve labels seems likely. One question,
Anton: did you create the labels in the target repo before trying the
migration? Just a vague hypothesis that perhaps it might preserve them if
the labels already exist, but drop them if they don't (pure speculation,
but plausible enough to be worth testing out IMO).

The labels do exist. I got confirmation that the migration drops all labels.

Dear All,

Some of you who are checking the migration notes
(https://bit.ly/3HVjr7a) might already have noticed that we’re stuck
again. Let me provide more information about what is going on now and
what the plans are.

As a reminder, we previously imported all issues into the archive repo,
and essentially only the very last step remained: the migration to the live
llvm-project repo. This step is crucial and one-way: once started, we
cannot undo it. We also have to rely on GitHub here,
as we cannot do it via rate-limited API calls.

During the final checks, two issues were revealed:

  • Notifications are still sent in some cases
  • The migration resets the last modification date of closed issues (it
    looks like it was implemented as “re-open issue, transfer and close
    again”). As a result, all closed issues essentially got sorted
    chronologically ahead of the genuinely open ones.

These issues were fixed on GitHub’s side and we proceeded with
re-checking everything. It turned out that another issue had appeared: the
labels were silently lost and the migrated issues were completely
unlabeled, despite originally being annotated with 140+ labels.
For now this is a show-stopper. The issue was reported to and
acknowledged by GitHub; however, no ETA was provided.

Our current options are:

  1. Abandon the migration
  2. Wait until the issue is resolved on GitHub side
  3. Try to find alternative solutions to work around the GitHub issue

(2) is essentially not an option. I propose abandoning the
migration and unlocking Bugzilla if no solution is found by
the end of this week.

Seems reasonable to me!

The only alternative I’m seeing is to apply the labels post-migration.
There are important downsides:

  • This has to be done via the GitHub API and we’re rate limited to ~5000
    requests per hour, so the labelling will take ~20
    hours. I was told that there is no way for us to have the API rate
    limit increased.
  • This might trigger notifications. My quick check via the web UI did
    not, but I cannot be 100% sure of anything here.
  • (the most important) This will mess up the “last modified” timestamp,
    as setting a label is an event that is recorded in the issue. There is
    no way to set an “old” timestamp; it is assigned by GitHub
    automatically.

For now I’m testing the script for option 3 and waiting for any news from GitHub.

Thanks for the work :slight_smile:
I hope you can get your script working!
Maybe if you can share this on a public repo, others here can help to do small test runs in private forks and cross-validate or help fix issues with it?

Mehdi,

Maybe if you can share this on a public repo, others here can help to do small test runs in private forks and cross-validate or help fix issues with it?

I certainly could do this, but I doubt it will be useful, as the
input will be a local Bugzilla dump.

Stephan,

Also, does GitHub's GraphQL API v4 offer higher throughput than their REST API v3 for such labeling? See "Rate limits and node limits for the GraphQL API" (GitHub Docs).

Indeed, the GraphQL API has different limits, and some APIs are available
only via GraphQL endpoints (e.g. the majority of the migration API is only
there).

(I work on Visual C++, so I don't know anything special about GitHub. Although I haven't used GraphQL to perform modifications, I learned enough JS/GraphQL to perform read-only queries for a status chart. According to my understanding, applying labels through a GraphQL mutation should consume far fewer "points" than individual REST calls consume from the v3 limit.)

Right, this certainly requires testing. Thanks for the suggestion!
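To make the GraphQL idea concrete: GitHub's GraphQL API has an `addLabelsToLabelable` mutation, and several mutations can be sent in one request by giving each an alias. A minimal sketch that builds such a batched document, assuming the issue and label node IDs have already been looked up (the IDs in the example are placeholders). Whether batching actually lowers the rate-limit cost compared with REST is exactly what would need testing.

```python
import json
from typing import List, Sequence, Tuple


def batched_label_mutation(assignments: Sequence[Tuple[str, Sequence[str]]]) -> str:
    """Build one GraphQL document that adds labels to several issues.

    `assignments` holds (issue_node_id, [label_node_id, ...]) pairs; each pair
    becomes an aliased `addLabelsToLabelable` call (m0, m1, ...).
    json.dumps is used to quote the IDs as GraphQL string literals.
    """
    lines: List[str] = ["mutation {"]
    for i, (issue_id, label_ids) in enumerate(assignments):
        ids = ", ".join(json.dumps(label_id) for label_id in label_ids)
        lines.append(
            f"  m{i}: addLabelsToLabelable(input: "
            f"{{labelableId: {json.dumps(issue_id)}, labelIds: [{ids}]}}) "
            "{ clientMutationId }"
        )
    lines.append("}")
    return "\n".join(lines)
```

The resulting string would be sent as the `query` field of a JSON body POSTed to `https://api.github.com/graphql`.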

IMHO it would be a really good idea to do this!
If the “bugzilla dump” is in some reasonably sane format such as JSON, then people could even hand-craft sample input scenarios to try out the import script on.
There are basically two devops operations here:

  • Export a Bugzilla instance into (e.g. JSON)
  • Load (e.g. JSON) into a GitHub instance

The ultimate migration will do the first step and then the second, (A) on the official LLVM Bugzilla and the official LLVM GitHub, (B) during a single atomic period where both are protected against tampering by random users.

But before then, it would certainly be easy to test the second step on people’s own personal GitHub instances. And I would have expected Thanksgiving weekend’s aborted migration to have completed the first step and produced an (e.g. JSON) data file as a side effect, so people would even have some sample data to try out. (Of course they’d want to use only a subset of it, because the whole (e.g. JSON) data file is probably on the order of (50,000 bugs x let’s say 100KB per bug) ~= 5GB of data.)
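As an illustration of the "use only a subset" point, a minimal sketch assuming a hypothetical JSON Lines export (one bug per line, each with an `id` field). The format and field name are invented for the example and are not the actual dump format. Streaming line by line avoids loading a multi-gigabyte file into memory.

```python
import json
from typing import Dict, Iterable, List


def sample_bugs(lines: Iterable[str], wanted_ids: Iterable[int]) -> List[Dict]:
    """Scan a JSON Lines export and keep only the bugs whose `id` is wanted.

    `lines` can be an open file object, which yields one line at a time.
    """
    wanted = set(wanted_ids)
    picked: List[Dict] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        bug = json.loads(line)
        if bug.get("id") in wanted:
            picked.append(bug)
    return picked
```

Passing an open file (e.g. `with open("bugs.jsonl") as f: sample_bugs(f, ids)`, with a made-up file name) would produce a small hand-checkable input for test runs in a private fork.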

IIUC, none of the data being exported from Bugzilla is “private” in any sense, so there’s no particular concern with publishing the (e.g. JSON) data.

It occurs to me that it would also be a really really good idea to have a script that can compare a Bugzilla against a GitHub and verify that they contain the same data, so that we can know whether the migration succeeded. That script can also be published and tested ahead of time.
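A minimal sketch of such a comparison, assuming both sides have already been exported and normalized into dicts keyed by the original Bugzilla ID. The compared field names (`title`, `labels`, `state`) are chosen purely for illustration.

```python
from typing import Dict, List, Sequence


def compare_trackers(bugzilla: Dict[int, Dict], github: Dict[int, Dict],
                     fields: Sequence[str] = ("title", "labels", "state")) -> Dict[str, List]:
    """Report bugs missing on either side and field-level mismatches."""
    report: Dict[str, List] = {
        "missing_on_github": sorted(set(bugzilla) - set(github)),
        "unexpected_on_github": sorted(set(github) - set(bugzilla)),
        "mismatched": [],
    }
    for bug_id in sorted(set(bugzilla) & set(github)):
        for name in fields:
            before, after = bugzilla[bug_id].get(name), github[bug_id].get(name)
            if name == "labels":
                # Label order carries no meaning, so compare as sets.
                before, after = set(before or []), set(after or [])
            if before != after:
                report["mismatched"].append((bug_id, name))
    return report
```

A check like this, run against a sample before the real migration, is the kind of thing that could flag silently dropped labels, since an issue whose labels vanished shows up as a `(bug_id, "labels")` mismatch.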

–Arthur