Enabling GlobalISel for Apple AArch64 platforms

We at Apple have been working on bringing up GlobalISel for the AArch64 target for a number of years now. Last year we presented our progress across all optimization levels and stated our intention to enable it for Apple platforms in the near future. Please see the talk video here if you’re interested in some of the work we did to make it competitive with SelectionDAG.

For the past few months we’ve been qualifying GlobalISel on internal Apple software, fixing issues as we discover them. We believe we’ve reached the crossover point where the benefits of enabling it by default (some code quality improvements, a modular pipeline design, faster compile times, etc.) outweigh the downsides of widespread enablement.

As a summary of where we stand on performance and code size: at -O3 on Apple platforms, GlobalISel is generally within about 1% of SelectionDAG on geomean. There are a few known code quality issues that we have yet to address, and fixing them should narrow this gap. For code size, on CTMark at -Os we’re roughly on par, with some benchmarks smaller and some larger; here too we have known issues whose fixes should reduce size further. For compile time, we see an average improvement of about 5%, but this is highly variable.
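For readers less familiar with how such a summary is usually computed: a geomean comparison takes the per-benchmark ratio of GlobalISel to SelectionDAG (run time or code size) and averages those ratios geometrically. The minimal sketch below assumes you already have per-benchmark measurements; the function name, benchmark names, and numbers are placeholders for illustration, not Apple’s data or tooling.

```python
import math
from typing import Dict


def geomean_ratio(baseline: Dict[str, float], candidate: Dict[str, float]) -> float:
    """Geometric mean of candidate/baseline ratios over the benchmarks both sides share.

    A result of 1.01 means the candidate is about 1% slower (or larger) than the
    baseline on geomean; 0.99 means about 1% faster (or smaller).
    """
    names = sorted(baseline.keys() & candidate.keys())
    if not names:
        raise ValueError("no benchmarks in common")
    # Averaging logs and exponentiating is the numerically stable form of the geomean.
    log_sum = sum(math.log(candidate[n] / baseline[n]) for n in names)
    return math.exp(log_sum / len(names))


if __name__ == "__main__":
    # Placeholder run times in seconds (illustrative only, not real measurements).
    sdag = {"bench_a": 10.0, "bench_b": 4.0, "bench_c": 2.5}
    gisel = {"bench_a": 10.2, "bench_b": 3.9, "bench_c": 2.55}
    print(f"GlobalISel / SelectionDAG geomean: {geomean_ratio(sdag, gisel):.4f}")
```

The geometric mean is the usual choice for ratios because a 2x win on one benchmark and a 2x loss on another cancel out exactly, whereas an arithmetic mean of the ratios (2 and 0.5) would report a 25% regression.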

While the ultimate goal is a framework completely independent of SelectionDAG, there are some cases where GlobalISel’s existing support is incomplete, so we require a fallback path to SelectionDAG. For most code on Apple platforms, the fallback rate is 1% or less. We will, of course, continue to support SelectionDAG and to offer the -fno-global-isel option. For clang users targeting AArch64 on other platforms, such as Linux or Windows, there should be no change.
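To make the fallback concrete: GlobalISel falls back one function at a time, so a function it cannot handle is reselected with SelectionDAG while the rest of the module keeps its GlobalISel result. One natural way to express a fallback rate, assumed here, is the fraction of functions that took that path. The Python model below is a hypothetical illustration of that control flow only; `select_with_globalisel`, `select_with_selectiondag`, `Unsupported`, and the toy opcode set are invented for this sketch and are not LLVM APIs.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


class Unsupported(Exception):
    """Raised by the toy GlobalISel model when it meets an opcode it cannot select."""


# Hypothetical: the opcodes the toy GlobalISel path knows how to handle.
GISEL_SUPPORTED = {"add", "load", "store", "br", "ret"}


@dataclass
class Function:
    name: str
    opcodes: List[str]


def select_with_globalisel(fn: Function) -> str:
    for op in fn.opcodes:
        if op not in GISEL_SUPPORTED:
            raise Unsupported(f"{fn.name}: cannot select {op}")
    return "GlobalISel"


def select_with_selectiondag(fn: Function) -> str:
    # In this model SelectionDAG is the complete path that always succeeds.
    return "SelectionDAG"


def select_module(functions: List[Function]) -> Tuple[Dict[str, str], float]:
    """Select each function, falling back per function; return the choices and the fallback rate."""
    chosen: Dict[str, str] = {}
    for fn in functions:
        try:
            chosen[fn.name] = select_with_globalisel(fn)
        except Unsupported:
            # Only the failing function is redone; the others keep their GlobalISel result.
            chosen[fn.name] = select_with_selectiondag(fn)
    fallbacks = sum(1 for v in chosen.values() if v == "SelectionDAG")
    rate = fallbacks / len(functions) if functions else 0.0
    return chosen, rate


if __name__ == "__main__":
    module = [Function("f", ["load", "add", "ret"]), Function("g", ["shufflevector", "ret"])]
    choices, rate = select_module(module)
    print(choices, f"fallback rate: {rate:.0%}")
```

The per-function granularity is what keeps a single unsupported construct from pushing a whole module back onto SelectionDAG.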

Our proposed time frame for this change is mid-September.

Please let us know of any concerns or questions with this plan.

Thanks,
Amara

Thanks for your hard work. I am really looking forward to the switch.

  • Can I run clang with GlobalISel so that, when it falls back to SelectionDAG, it stops and gives sufficient information for a GitHub issue?
  • Are there any plans for target-specific intrinsics such as Neon and the AVX family?

Can I run clang with GlobalISel so that, when it falls back to SelectionDAG, it stops and gives sufficient information for a GitHub issue?

Yes, you can run -fglobal-isel -mllvm -global-isel-abort=2 and that will cause clang to emit remarks when it encounters a fallback. If you want to trigger a compiler abort (and therefore a reproducer), you can use -mllvm -global-isel-abort=1 instead.
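The modes discussed in this thread can be captured in a small helper that assembles the clang arguments. The flags are the ones quoted above (plus -fno-global-isel from the original post); the helper names, the mode strings, and the use of `subprocess` are assumptions made for this sketch. By default it only builds the argument list, so it can be inspected without a clang binary installed.

```python
import subprocess
from typing import List, Optional

# Extra driver flags per mode, as described in the thread.
MODE_FLAGS = {
    # Use GlobalISel, fall back to SelectionDAG, and report each fallback.
    "report": ["-fglobal-isel", "-mllvm", "-global-isel-abort=2"],
    # Use GlobalISel and abort on the first fallback (useful for getting a reproducer).
    "abort": ["-fglobal-isel", "-mllvm", "-global-isel-abort=1"],
    # Opt out of GlobalISel entirely.
    "off": ["-fno-global-isel"],
}


def clang_command(source: str, mode: str, clang: str = "clang",
                  extra: Optional[List[str]] = None) -> List[str]:
    """Return the clang argument list for compiling `source` to an object file in `mode`."""
    if mode not in MODE_FLAGS:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(MODE_FLAGS)}")
    return [clang, *MODE_FLAGS[mode], *(extra or []), "-c", source]


def run(source: str, mode: str, **kwargs) -> subprocess.CompletedProcess:
    """Invoke clang (requires a clang on PATH); diagnostics end up in the result's stderr."""
    return subprocess.run(clang_command(source, mode, **kwargs), capture_output=True, text=True)
```

For example, `run("foo.c", "report").stderr` would collect every fallback diagnostic for a translation unit, while the `"abort"` mode stops at the first one so the failing input can be attached to an issue.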

Are there any plans for target-specific intrinsics such as Neon and the AVX family?

IIRC, for NEON we already support many of the intrinsics (though probably not all) via the SelectionDAG importer. SVE, however, is going to be a large amount of work, since we don’t yet support scalable types in GISel.

Thanks. IIRC, LLT already supports scalable vectors. At the very least, the AArch64 register bank info could be extended.

Thanks for the update.

Something that came up at the EuroLLVM GlobalISel round table was that LLVM is still very far from having any ‘GlobalISel-only’ reference target.

I realise SelectionDAG is going to be very tricky to remove entirely, but how close is AArch64 to at least removing FastISel?

And how are you intending to allocate your main efforts in the future - to removing SelectionDAG entirely or chasing the last few perf regressions?

What’s the fallback rate? It seems to me that the AArch64 legalization rules currently don’t try to handle every illegal case, in particular for vector types.

I realise SelectionDAG is going to be very tricky to remove entirely, but how close is AArch64 to at least removing FastISel?

There are some features that we don’t currently support, and for these we use FastISel at -O0:

  • AArch64_32: no plans to implement this in the near future.
  • ILP32
  • Large code model with MachO

Other platforms, like Windows or Android, may have their own features that we don’t support yet, so those will need to be implemented (and tested well) before we can remove the FastISel code completely.
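The -O0 policy above amounts to an up-front check on the target configuration, rather than the per-function fallback used when GlobalISel hits an unsupported construct. The predicate below is a simplified, hypothetical model of that check using only the three cases listed; the `TargetConfig` type and its fields are invented for illustration, and LLVM’s actual decision in the AArch64 target code may weigh other conditions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetConfig:
    """Hypothetical summary of the target properties relevant to the -O0 decision."""
    is_aarch64_32: bool = False     # AArch64_32 target
    is_ilp32: bool = False          # ILP32 ABI
    large_code_model: bool = False  # large code model requested
    is_macho: bool = False          # MachO object format


def o0_selector(cfg: TargetConfig) -> str:
    """Return which selector handles -O0 code under the policy described in the thread."""
    unsupported = (
        cfg.is_aarch64_32
        or cfg.is_ilp32
        # The large code model is only excluded when combined with MachO.
        or (cfg.large_code_model and cfg.is_macho)
    )
    return "FastISel" if unsupported else "GlobalISel"
```

Removing FastISel entirely would mean every branch of such a check, including any platform-specific ones for Windows or Android, returns GlobalISel.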

And how are you intending to allocate your main efforts in the future - to removing SelectionDAG entirely or chasing the last few perf regressions?

By “removing SelectionDAG” I assume you mean enabling without fallback? I see this as taking another few engineer-years, due to the long tail of features. Having the compiler abort because of an unsupported construct is not something we want to expose to users.

For the transitional period after enablement, bugs and code size/performance regressions will be our highest-priority items. After that, we will work on implementing missing features.

What’s the fallback rate? It seems to me that the AArch64 legalization rules currently don’t try to handle every illegal case, in particular for vector types.

In most workloads, the fallback rate is around 1% or less, and most of those fallbacks are caused by shufflevectors. 1% is the level we’re comfortable with for enablement, but the end goal is of course to eliminate them all.

This doesn’t sound like there is much in the way of performance wins with GlobalISel over SelectionDAG currently. I had thought that, e.g., the larger scope (the “global” in GlobalISel) would allow it to generate better code. Is this something we could expect in the future?

Just to double-check: is that 5% faster compiles overall, or just the instruction selection part?

This doesn’t sound like there is much in the way of performance wins with GlobalISel over SelectionDAG currently. I had thought that, e.g., the larger scope (the “global” in GlobalISel) would allow it to generate better code. Is this something we could expect in the future?

Yes, we do generate better code in some cases, but on average the missing optimizations outweigh those benefits. Our plan is to implement more and more of those missing optimizations, so that over time the benefits of function-level optimization become more apparent.

Just to double-check: is that 5% faster compiles overall, or just the instruction selection part?

It is very workload-dependent, but it’s the former. Comparing just the equivalent parts of the codegen pipeline, we’re more than 2.5x faster than SelectionDAG on average.
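The two figures fit together through Amdahl’s law: a large speedup in one set of stages only moves the total by the share of compile time those stages occupied. The sketch below computes the overall time ratio from an assumed share, and the share implied by a given overall ratio. It assumes “5% faster” means total compile time dropping to 95% of the baseline; the function names are ours and the inputs are the thread’s averages, used only as a rough consistency check.

```python
def overall_time_ratio(fraction: float, stage_speedup: float) -> float:
    """New/old total compile time when `fraction` of the old time runs `stage_speedup`x faster."""
    if not 0.0 <= fraction <= 1.0 or stage_speedup <= 0.0:
        raise ValueError("fraction must be in [0, 1] and stage_speedup positive")
    # Unchanged part plus the sped-up part (Amdahl's law).
    return (1.0 - fraction) + fraction / stage_speedup


def implied_fraction(time_ratio: float, stage_speedup: float) -> float:
    """Inverse of overall_time_ratio: the share of old time the sped-up stages must have held."""
    if stage_speedup <= 1.0:
        raise ValueError("stage_speedup must be greater than 1")
    # Solve time_ratio = 1 - fraction * (1 - 1/stage_speedup) for fraction.
    return (1.0 - time_ratio) / (1.0 - 1.0 / stage_speedup)


if __name__ == "__main__":
    # Assumption: "5% faster" read as total compile time falling to 0.95 of the baseline.
    f = implied_fraction(0.95, 2.5)
    print(f"implied share of compile time in the replaced stages: {f:.1%}")
    print(f"check: {overall_time_ratio(f, 2.5):.3f}")
```

With these inputs the replaced stages would account for roughly 8% of the original compile time (0.05 / 0.6). Because both figures are averages over highly variable workloads, this is a back-of-the-envelope reading rather than a measurement.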

A somewhat belated patch is now up to enable it via a clang front-end change: D137269 [Clang][AArch64][Darwin] Enable GlobalISel by default for Darwin ARM64 platforms.

You mentioned shufflevectors. Is the solution:

  • more effort in the legalizer
  • a vector combine/canonicalizer pass before ISel
  • more effort in ISel to handle weird cases