Enabling GlobalISel for Apple AArch64 platforms

amara · July 20, 2022, 4:23am

We at Apple have been working on bringing up GlobalISel with the AArch64 target for a number of years now. Last year we presented our progress on this for all opt levels and stated our intentions to enable it for Apple platforms in the near future. Please see the talk video here if you’re interested in some of the work we did to make it competitive with SelectionDAG.

For the past few months we’ve been working on qualifying GlobalISel on internal Apple software, fixing issues as we discover them. We believe that we’ve reached the crossover point where the benefits of having it enabled by default (some code quality improvements, a modular pipeline design, faster compile times, etc.) outweigh the downsides of widespread enablement.

As a summary of where we are in performance and code size metrics, for -O3 on Apple platforms we generally see GlobalISel within about 1% of SelectionDAG geomean. There are a few known code quality issues that we have yet to address; fixing them should reduce this gap. For code size, on CTMark -Os we’re about on par, with some benchmarks being smaller and some larger. Here we also have known issues that can be fixed to improve size further. For compile time, on average we see improvements of about 5%, but this is highly variable.

While the ultimate goal is to have a framework completely independent of SelectionDAG, there are some cases where the existing support in GlobalISel is incomplete, and therefore we require a fallback path to SelectionDAG. For most code on Apple platforms, this fallback rate is 1% or less. We will of course continue to support SelectionDAG and to offer the -fno-global-isel option. For clang users targeting AArch64 on other platforms, such as Linux or Windows, there should be no change.

Our proposed time frame for this is sometime in mid-September.

Please let us know of any concerns or questions with this plan.

Thanks,
Amara

tschuett · July 20, 2022, 2:01pm

Thanks for your hard work. I am really looking forward to the switch.

Can I run clang with GlobalISel so that, when it falls back to SelectionDAG, it stops and gives sufficient information for a GitHub issue?
Are there any plans for the target-specific intrinsics, à la NEON and the AVX family?

amara · July 20, 2022, 4:58pm

Can I run clang with GlobalISel so that, when it falls back to SelectionDAG, it stops and gives sufficient information for a GitHub issue?

Yes, you can run -fglobal-isel -mllvm -global-isel-abort=2 and that will cause clang to emit remarks when it encounters a fallback. If you want to trigger a compiler abort (and therefore a reproducer), you can use -mllvm -global-isel-abort=1 instead.
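
As a minimal sketch of how those flags combine on a command line, the Python helper below assembles the two invocations. The flags are the ones quoted above; the function name `gisel_clang_args`, the mode names, the use of `-c`, and the source file `test.c` are hypothetical choices made only for illustration.

```python
import shlex

# Abort levels as described in the post above; the mode names are made up here.
_ABORT_LEVEL = {
    "remark": 2,  # keep compiling and report each fallback to SelectionDAG
    "abort": 1,   # stop the compiler at the first fallback
}


def gisel_clang_args(source, mode="remark", clang="clang"):
    """Build a clang command line that enables GlobalISel and reports fallbacks.

    `source`, `mode`, and `clang` are hypothetical parameters for this sketch.
    """
    if mode not in _ABORT_LEVEL:
        raise ValueError(f"unknown mode: {mode!r}")
    return [
        clang, "-c", source,
        "-fglobal-isel",
        "-mllvm", f"-global-isel-abort={_ABORT_LEVEL[mode]}",
    ]


if __name__ == "__main__":
    # Print one shell-ready command line per mode.
    for mode in ("remark", "abort"):
        print(shlex.join(gisel_clang_args("test.c", mode)))
```

The "remark" mode is the one to use for surveying how often a code base falls back; the "abort" mode is the one to use when a reproducer is wanted for a bug report.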

Are there any plans for the target-specific intrinsics, à la NEON and the AVX family?

IIRC for NEON we already support many of the intrinsics (though probably not all) via the SelectionDAG importer. For SVE, however, that’s going to be a large amount of work, since we don’t yet have scalable types support in GISel.

tschuett · July 20, 2022, 6:47pm

Thx. IIRC the LLT already supports scalable vectors. At the very least, the AArch64 register bank could be extended.

RKSimon · July 23, 2022, 2:12pm

Thanks for the update.

Something that came up at the EuroLLVM GlobalISel round table was that LLVM is still very far from having any ‘GlobalISel-only’ reference target.

I realise SelectionDAG is going to be very tricky to remove entirely, but how close is aarch64 to at least removing fast isel?

And how are you intending to allocate your main efforts in the future - to removing SelectionDAG entirely or chasing the last few perf regressions?

arsenm · July 23, 2022, 6:57pm

What’s the fallback rate? It seems to me like the AArch64 legalization rules currently don’t bother trying to handle legalizing every illegal case, in particular for vector types.

amara · July 24, 2022, 6:41pm

I realise SelectionDAG is going to be very tricky to remove entirely, but how close is aarch64 to at least removing fast isel?

There are some features that we don’t currently support, and for those we go to FastISel at -O0:

AArch64_32 - no current plans to implement this in the near future.
ILP32
Large model code with MachO

Other platforms like Windows or Android may have their own features that we don’t support yet, so those will need to be implemented (and tested well) before we can remove the FastISel code completely.

And how are you intending to allocate your main efforts in the future - to removing SelectionDAG entirely or chasing the last few perf regressions?

By “removing SelectionDAG” I assume you mean enabling without fallback? I see this as taking another few engineer-years, due to the long tail of features. Having the compiler abort because of an unsupported construct is not something we want to expose to users.

During the transition period after enablement, our highest priority will be bugs and code size/performance regressions. After that, we will implement missing features.

amara · July 24, 2022, 6:41pm

What’s the fallback rate? It seems to me like the AArch64 legalization rules currently don’t bother trying to handle legalizing every illegal case, in particular for vector types.

In most workloads, the fallback rate is around 1% or less. Most of the fallbacks are caused by shufflevectors. 1% is a level at which we’re comfortable enabling it, but the end goal is of course to eliminate them all.

hansw2000 · September 7, 2022, 7:06pm

This doesn’t sound like there is much in the way of performance wins with globalisel over selectiondag currently. I had thought that e.g. the larger scope – the “global” in globalisel – would allow it to generate better code. Is this something we could expect in the future?

Just to double check, is that 5% faster compiles overall, or just the instruction selection part?

amara · September 8, 2022, 6:50am

This doesn’t sound like there is much in the way of performance wins with globalisel over selectiondag currently. I had thought that e.g. the larger scope – the “global” in globalisel – would allow it to generate better code. Is this something we could expect in the future?

Yes, we do generate better code in some cases, but on average missing optimizations outweigh those benefits. Our plan is to implement more and more of those missing optimizations, so over time the benefits of function level optimization become more apparent.

Just to double check, is that 5% faster compiles overall, or just the instruction selection part?

It is very workload dependent, but it’s the former. Comparing just the equivalent parts of the codegen pipeline, we’re more than 2.5x faster than SelectionDAG on average.
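
As a rough back-of-the-envelope illustration (not part of the original measurements), the two figures can be related with an Amdahl's-law style estimate. The sketch assumes that "5%" means a 5% reduction in total compile time and that only the stages replaced by GlobalISel changed speed; the function name `stage_share` is hypothetical.

```python
def stage_share(overall_saving, stage_speedup):
    """Fraction f of the original compile time spent in the sped-up stages.

    Assumes only those stages changed, so the new total time is
    (1 - f) + f / stage_speedup, and therefore
    overall_saving = f * (1 - 1 / stage_speedup).
    """
    return overall_saving / (1.0 - 1.0 / stage_speedup)


# Figures quoted in the thread: about 5% overall, at least 2.5x for the equivalent stages.
share = stage_share(overall_saving=0.05, stage_speedup=2.5)
print(f"{share:.1%}")
```

Under those assumptions the replaced stages would account for at most roughly 8% of the original compile time, since a larger stage speedup implies a smaller share for the same overall saving.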

amara · November 2, 2022, 5:08pm

A somewhat belated patch is now up to enable it via a clang front-end change: D137269 [Clang][AArch64][Darwin] Enable GlobalISel by default for Darwin ARM64 platforms.

tschuett · November 8, 2022, 4:08pm

You mentioned shuffle vectors. Is the solution:

more effort in the legalizer
a vector combine/canonicalizer pass before ISel
more effort in ISel to handle weird cases
…
