[RFC] GlobalISel support for X86

e-kud · September 13, 2023, 4:48pm

Introduction

GlobalISel is another approach for instruction selection comparing to
SelectionDAG.

The main features, as I can highlight, are less time required for
selection and selection at function level. In contrast, SelectionDAG operates inside basic
block scope and function level optimizations mostly done by CodeGenPrepare pass.

Starting migration from SelectionDAG to GlobalISel is especially painful for X86
due to highly specialized and optimized DAG combiners.

This RFC is about suggesting the first steps in this journey.

Current state

GlobalISel has some support for X86 back-end. Despite of the fact, we can see
issues closed as obsolete or won't fix and suggestions to not use GlobalISel
in production. Also there is a warning in clang if you try to use
-fglobal-isel for X86 target.

However, there are commits mostly from @RKSimon and @tschuett that
introduce new legalization rules or opcode selection for X86.

Current approach

I’ve taken a several commits that introduce new legalization of opcodes. But
after legalization they cannot be selected. It looks a little bit strange to me.
We’ve made a functional change in compiler, but nothing has changed in terms of
produced code. Why is it important?

During my experiments with GlobalISel coverage, I needed G_SELECT support:
hooray, legalization is already there, only selection is needed to add. But if
you try to reuse selection tables from SelectionDAG to select G_SELECT, you
fail. There is no patterns to select ISD::SELECT, so, we either need to write
selection in C++, or write a selection pattern for SelectionDAG so that ISD::SELECT
obtained from GlobalISel can match it. Also we can rewrite legalization into splitting
G_SELECT into G_ICMP with G_CMOV (target dependent opcode).

The idea here is that fixing legalization without a plan how to select
legalized opcodes may create problems.

SelectionDAG compatibility

The current state of X86InstructionSelector is to reuse SelectionDAG tables and
if failed, try to use C++ selection that should be written especially for
GlobalISel.

It is a straightforward idea, however SelectionDAG tables contain X86ISD
nodes that have no mapping at all from GlobalISel opcodes. I’ve met this
problem when tried to support G_FSHL and G_FSHR
D157505 when we have specialized
X86ISD::FSHL/X86ISD::FSHR nodes that are used only for i16 type.

Target intrinsic support

Unfortunately, I haven’t found a target that has rich support of intrinsics. On most targets
(AArch64, M68k, PowerPC, RISCV, X86) intrinsics are completely rewritten or omitted. It is generally not
an option for x86 (as well as for other targets, I think).

Specifically for X86, intrinsics are written as a mapping from IntrinsicID to ISD or X86ISD node.
To reuse X86IntrinsicsInfo.h, we need two things:

The existing mapping including not only SelectionDAG nodes but GlobalISel nodes.
Add GlobalISel nodes and GINodeEquiv declaration to map GlobalISel nodes during instruction
selection

If the first is mostly a refactoring issue, the second looks like the complete reimplementation.
However, if we admit that one way or another most of the X86ISD nodes will have an
equivalent in GlobalISel for purposes alternative to intrinsics then it is not a problem.

Alternatively, I’d like to ask authors of GlobalISel whether there is a view to provide generalized
intrinsic support. Now, GlobalISel users prefer to ignore target intrinsics or reimplement them from
the scratch.

Combiners

At this moment it is hard to predict something about combiners since only AArch64, Mips and AMDGPU
implement them. If someone has any thoughts about it, please share.

From my perspective, at initial stage we only may think about how frequently we introduce target
specific GlobalISel nodes during legalization. Because here we determine whether we want to introduce
target specific combine patterns after legalization or we try to keep generic nodes until instruction
selection to allow the generic combiner working.

AArch64 experiment

To roughly estimate the existing support, I’ve taken AArch64 backend (considering its GlobalISel support as one of the most advanced) and done the following:

Modified IRTranslator pass to dump LLVM IR of a function when GlobalISel
representation has a few G_* opcodes (ideally only one, but this is
not always possible)
Compiled the IRs for X86 using GlobalISel meanwhile updating the IRs from
AArch64 specifics.
Collected statistics is out of 212 (on the moment of the experiment) generic GlobalISel opcodes:
1. Generic opcodes not appeared in AArch64 with GlobalISel from .ll to .mir: 43
2. Triggered internal assert about PhysReg copy: 2
3. Cannot be legalized: 110
4. Cannot be selected: 11
5. Successfully compiled: 47

It’s clear that this experiment may be too pessimistic as we may hit vector
typed opcodes when only scalar version is supported, or some f128 types, or
multi-opcode IRs where only auxiliary one cannot be compiled and therefore it
gates the positive result. Nevertheless, this statistic shows that we can
compile barely 25% of isolated GlobalISel opcodes.

C/C++ coverage

After the experiment with isolated GlobalISel opcodes, I’ve decided to use C and
C++ Validation Test Suites. Unfortunately, no tests can be compiled:

G_GLOBAL_VALUE isn’t supported for fpic mode
D157396
Apparently, there is a place assuming that after selection all virtual
registers have a register class. However, it is not true for GlobalISel (it’s not X86
specific, AArch64 has the same behavior) D157458
All tests require G_SELECT. The problem with its selection is described above.

Everything else besides opcodes

I couldn’t identify any other issues that may require a systematic approach in the first place, e.g. calling conventions or TableGen issues.

Proposal

The general idea is to repeat AArch64 path: to start with basic support without
any combiners or optimizations i.e. -O0 GlobalISel.

Understand the current coverage:
1. C and C++ Validation Test Suites are used as a criterion for X86 GlobalISel
  support.
2. LLVM testsuite with enforced -global-isel for more complete testing
3. Reuse tests from AArch64 experiment for per-opcode testing. These tests
  will be becoming useless when more opcodes supported, and proper tests are added
  into LLVM Testsuite.
Follow opcode-oriented approach: when adding support for a single opcode,
support it from IRTranslator till InstructionSelector (at once or at least
have understanding how to do it).
Prioritization of opcodes can be done using testsuites from point 1. But
general idea is a two axes plane:
1. Start supporting from scalar versions and end with vectorized.
2. Start from basics (add, shl, mul) and end with independent and target
  intrinsics (llvm.returnaddress, llvm.sse.*)
During implementation I propose to reach the maximum reuse of existing
SelectionDAG tables. However, we may try to create
GlobalISel specific patterns as an alternative.

Open questions

To sum up, there are two questions I don’t have answer for.

How to minimize efforts for target intrinsics support?
Are there concerns that should be considered now for further steps with combiners.

@RKSimon @phoebe @tschuett @nikic @arsenm

arsenm · September 14, 2023, 7:21am

This is not true. Target intrinsics pass through G_INTRINISC* opcodes and the DAG imported patterns understand the same int_target_xxx patterns. The X86ISD opcodes are not equivalent to intrinsics, they are equivalent to target defined pseudo instructions. You can define a mapping from target defined pseudos to target defined SDNodes such that patterns still work (e.g. G_AMDGPU_FFBH_U32)

arsenm · September 14, 2023, 7:23am

This is just a bug. Everything after selection should have a register class

RKSimon · September 14, 2023, 9:40am

I’ve always been of the opinion that trying to catch up to SelectionDAG is an insane task to take on initially.

My work on X86 GlobalISel tends to wax and wane, but the plan at that back of my head has been to see if we can match FastISel first, a lot of the emitter code is very similar already and I’d dearly like to not end up with 3 Selection modes for very long…

We’d then delete X86FastISel.cpp (and use GlobalISel for -O0) and we’d have a strong footing to THEN start adding optimization to begin to match SelectionDAG.

As well as that, I’d like to see more base level test files check + compare SelectionDAG/FastISel/GlobalISel codegen - beginning with bitops/shifts for both scalar/vector would be a start, then add/sub/mul/div then control flow etc. I don’t like the way that so many duplicate codegen test files are appearing in the CodeGen\X86\GlobalISel subdir - that should just be for specific GISel legalization/selection pass tests.

tschuett · September 14, 2023, 7:04pm

I used the remarks to find gaps in AArch64. It might work similarly for X86:

cmake -G Ninja -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD="AArch64" -DLLVM_USE_LINKER=lld -DLLVM_ENABLE_PROJECTS="clang" -DCLANG_BOOTSTRAP_PASSTHROUGH="LLVM_TARGETS_TO_BUILD" -DCLANG_ENABLE_BOOTSTRAP=On -DBOOTSTRAP_CMAKE_CXX_FLAGS="-march=native -Rpass-missed='gisel*' -fglobal-isel -mllvm -global-isel-abort=2" ./llvm
ninja stage2

e.g.

remark: unable to legalize instruction: %98:_(<16 x s64>) = G_ZEXT %97:_(<16 x s8>) [-Rpass-missed=gisel-legalize]

Note that the size of the output type is too large for AArch64.

AArch64 has a post legalizer combiner that creates target specific pseudo instructions, e.g.:

github.com/llvm/llvm-project

[AArch64][GlobalISel] Add lowering for constant BIT/BIF/BSP

llvm:main ← davemgreen:gh-gi-bsp

opened 01:26PM - 10 Sep 23 UTC

davemgreen

+153 -130

I have been making my way through GlobalISel vector fp, making sure all the basi…c types are currently working. In looking at fcopysign I managed to rewrite it in a way that we just widen to vectors, as opposed to relying on custom lowering. That misses handling vector constants (which Mark here is currently looking at) and transforming `or`/`and` into `bit`. The non-constant bit/bif/bsp already work through tablegen patters, this patch handles the constant case, mirroring the basic support for `or(and(X, C), and(Y, ~C))` from ISel tryCombineToBSL. BSP gets expanded to either BIT, BIF or BSL depending on the best register allocation. G_BIT can be replaced with G_BSP as a more general alternative.

e-kud · September 18, 2023, 2:09am

Here I mostly refer to AArch64.
I see that there are two separate implementations for Intrinsic::aarch64_crypto_sha1h intrinsic.
GlobalISel specific is in lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp. SelectionDAG is through DAG pattern def SHA1Hrr : SHAInstSS< 0b0000, "sha1h", int_aarch64_crypto_sha1h> (more likely that I miss something and one complements another).

But yes. Thank you, I was wrong. Now I see that DAG patterns of intrinsics are finely selected. So, now it’s a problem of intrinsics without patterns…

e-kud · September 19, 2023, 12:40am

I’ve always been of the opinion that trying to catch up to SelectionDAG is an insane task to take on initially.

Yes, sure, there are no plans to bring up everything at once. The initial plan is to gradually increase the amount of code we can correctly select with GlobalISel without any optimization.

if we can match FastISel first

I have concerns that matching FastISel is not enough. When FastISel is able to fallback to SelectionDAG for a single block, GlobalISel may fallback during final selection (after legalization and reg bank selection) and we have to redo all the work.

I don’t like the way that so many duplicate codegen test files are appearing in the CodeGen\X86\GlobalISel subdir - that should just be for specific GISel legalization/selection pass tests.

So, do you mean that instead of testing/checking all stages of selection we should rather add a run with GlobalISel to existing tests (let’s say end-to-end selection testing)?

tschuett · September 19, 2023, 5:30am

End-to-end tests are great to show the differences in the generated code. For changes to the legalizer or select there should be mir tests, because you can test the pass in isolation.

nickdesaulniers · September 19, 2023, 8:26pm

I kind of wonder what @topperc 's opinion is here.

To me, it seems like a TON of work.

bigboze · September 21, 2023, 10:22pm

We found by setting the register banks for intrinsics, our existing patterns just worked. We parse the Intrinsic::IITDescriptor and select the appropriate bank based on the type as a general solution which managed to get almost all of our intrinsics to work end to end.

aengelke · October 2, 2023, 3:43pm

We’d then delete X86FastISel.cpp (and use GlobalISel for -O0)

Compile times would likely increase quite a bit with that. At least on AArch64, GlobalISel at -O0 is much slower (>2x) than FastISel. The IRTranslator alone takes nearly as much time as FastISel in its entirety on my (fairly limited) benchmarks. From what I gather GlobalISel is unlikely to match FastISel performance any time soon due to its multi-pass approach.

RKSimon · October 3, 2023, 9:11am

I which case it sounds like this is the area that needs work first. I can’t be the only one with no interest in ending up supporting 3 different selection mechanisms: xkcd: Standards

asb · October 18, 2023, 6:18pm

That’s interesting - a lot of time has obviously passed since then, but the 2017 GISel slides showed much more promising performance vs FastISel. Do you have any idea if things have slowed down since then, or is this a case of different benchmarks showing different results?

qcolombet · April 25, 2024, 9:39am

Kind of late to the party, didn’t see the initial post!

I think this is a little bit of both but I wouldn’t be surprised if the biggest contributor is the fact that we didn’t track compile time closely. In other words, I think there’s room for improvement.

FWIW, wrt FastISel, the bigger plan was to be able to generate a fast path for GISel from the TableGen patterns (selection, combines, and legalization). That plan has yet to materialize since efforts to move more stuff to TableGen for GISel have been slow.

I’m happy to brainstorm on how to move forward but realistically I won’t have time to contribute code that much (at least in the near future).

Topic		Replies	Views
Running GlobaISel passes after SelectionDAG instruction selection LLVM Dev List Archives	6	117	November 10, 2016
The state-of-art at Instruction selection LLVM Dev List Archives	3	197	February 23, 2021
[GlobalISel] Questions about selection regions LLVM Dev List Archives	2	96	February 1, 2017
Women in Compilers and Tools May 2021 Meetup: Bringing up GlobalISel for AArch64 Women in Compilers and Tools	0	581	May 10, 2021
Enabling GlobalISel for Apple AArch64 platforms LLVM Project	11	2126	November 8, 2022