Introduction
GlobalISel is an alternative approach to instruction selection, compared to SelectionDAG.
The main features, as I see them, are shorter selection time and selection at
function level. In contrast, SelectionDAG operates within basic block scope, and
function-level optimizations are mostly done by the CodeGenPrepare pass.
Starting the migration from SelectionDAG to GlobalISel is especially painful for X86
because of its highly specialized and optimized DAG combiners.
This RFC suggests the first steps in this journey.
Current state
GlobalISel has some support in the X86 back-end. Despite that, we can see
issues closed as obsolete or won't fix, and suggestions not to use GlobalISel
in production. There is also a warning in clang if you try to use
-fglobal-isel for the X86 target.
However, there are commits, mostly from @RKSimon and @tschuett, that
introduce new legalization rules or opcode selection for X86.
Current approach
I've looked at several commits that introduce new legalization of opcodes. But
after legalization those opcodes cannot be selected. This looks a little strange to me:
we've made a functional change in the compiler, but nothing has changed in terms of
produced code. Why is it important?
During my experiments with GlobalISel coverage, I needed G_SELECT support:
hooray, legalization is already there, only selection needs to be added. But if
you try to reuse the selection tables from SelectionDAG to select G_SELECT, you
fail. There are no patterns to select ISD::SELECT, so we either need to write the
selection in C++, or write a selection pattern for SelectionDAG so that the ISD::SELECT
obtained from GlobalISel can match it. We could also rewrite legalization to split
G_SELECT into G_ICMP with G_CMOV (a target-dependent opcode).
The point here is that fixing legalization without a plan for how to select
the legalized opcodes may create problems.
SelectionDAG compatibility
The current approach of X86InstructionSelector is to reuse SelectionDAG tables and,
if that fails, to fall back to C++ selection written specifically for GlobalISel.
It is a straightforward idea; however, SelectionDAG tables contain X86ISD
nodes that have no mapping at all from GlobalISel opcodes. I ran into this
problem when I tried to support G_FSHL and G_FSHR (D157505): there are specialized
X86ISD::FSHL/X86ISD::FSHR nodes that are used only for the i16 type.
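The reuse-then-fall-back scheme described above can be sketched as a toy model. This is only an illustration using the standard library: `ToySelector`, `makeExampleSelector`, and the table contents are hypothetical and do not mirror X86InstructionSelector or any LLVM API. It only shows the control flow: try the imported table, then the hand-written C++ routine, and report failure when neither knows the opcode.

```cpp
#include <functional>
#include <string>
#include <unordered_map>

// Toy model only: every name here is hypothetical, not LLVM's API.
using Opcode = std::string;

struct ToySelector {
  // Stand-in for patterns imported from the SelectionDAG tables.
  std::unordered_map<Opcode, Opcode> importedPatterns;
  // Stand-in for hand-written C++ selection routines.
  std::unordered_map<Opcode, std::function<Opcode()>> manualSelectors;

  // Try the imported table first, then the C++ fallback.
  // Returns false when neither path knows the generic opcode.
  bool select(const Opcode &generic, Opcode &selected) const {
    auto imported = importedPatterns.find(generic);
    if (imported != importedPatterns.end()) {
      selected = imported->second;
      return true;
    }
    auto manual = manualSelectors.find(generic);
    if (manual != manualSelectors.end()) {
      selected = manual->second();
      return true;
    }
    return false;
  }
};

// Illustrative contents: one opcode covered by the imported table, one covered
// only by C++ code, and anything else (e.g. "G_FSHL") left without a mapping.
ToySelector makeExampleSelector() {
  ToySelector s;
  s.importedPatterns["G_ADD"] = "ADD32rr";
  s.manualSelectors["G_SELECT"] = [] { return Opcode("CMOV32rr"); };
  return s;
}
```

In this model an opcode with no entry in either map simply fails selection, which is the situation described for opcodes whose only patterns are written against X86ISD nodes.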
Target intrinsic support
Unfortunately, I haven't found a target with rich intrinsic support. On most targets
(AArch64, M68k, PowerPC, RISCV, X86) intrinsics are completely rewritten or omitted.
That is generally not an option for X86 (nor for other targets, I think).
Specifically for X86, intrinsics are written as a mapping from IntrinsicID
to an ISD or X86ISD node.
To reuse X86IntrinsicsInfo.h, we need two things:
- Extend the existing mapping to include not only SelectionDAG nodes but also GlobalISel nodes.
- Add GlobalISel nodes and GINodeEquiv declarations to map GlobalISel nodes during instruction selection.
If the first is mostly a refactoring issue, the second looks like a complete reimplementation.
However, if we admit that, one way or another, most of the X86ISD nodes will have an
equivalent in GlobalISel for purposes other than intrinsics, then it is not a problem.
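To make the first of the two requirements more concrete, here is a toy sketch of a mapping that carries a GlobalISel column next to the SelectionDAG one. It is an assumption-laden illustration: the struct, the helper functions, and all entry names are placeholders invented for this example and are not taken from X86IntrinsicsInfo.h.

```cpp
#include <string>
#include <vector>

// Hypothetical table row; this is NOT the layout used by X86IntrinsicsInfo.h.
struct ToyIntrinsicEntry {
  std::string intrinsic; // placeholder intrinsic name
  std::string dagNode;   // placeholder SelectionDAG node
  std::string gisNode;   // placeholder GlobalISel opcode; empty if not mapped yet
};

// Placeholder contents: only the first row has a GlobalISel counterpart.
std::vector<ToyIntrinsicEntry> makeToyTable() {
  return {
      {"int_example_a", "EXAMPLE_ISD_A", "G_EXAMPLE_A"},
      {"int_example_b", "EXAMPLE_ISD_B", ""},
      {"int_example_c", "EXAMPLE_ISD_C", ""},
  };
}

// Lists intrinsics that still lack a GlobalISel node, i.e. the remaining work.
std::vector<std::string>
missingGlobalISelNode(const std::vector<ToyIntrinsicEntry> &table) {
  std::vector<std::string> missing;
  for (const ToyIntrinsicEntry &entry : table)
    if (entry.gisNode.empty())
      missing.push_back(entry.intrinsic);
  return missing;
}
```

A table shaped like this would let the migration be tracked per intrinsic: each row either already names a GlobalISel node or shows up in the "missing" list.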
Alternatively, I'd like to ask the authors of GlobalISel whether there is a plan to provide generalized
intrinsic support. For now, GlobalISel users prefer to ignore target intrinsics or reimplement them from
scratch.
Combiners
At this moment it is hard to predict anything about combiners, since only AArch64, Mips
and AMDGPU implement them. If anyone has thoughts about it, please share.
From my perspective, at the initial stage we can only think about how frequently we introduce
target-specific GlobalISel nodes during legalization, because this is where we determine whether
we want to introduce target-specific combine patterns after legalization, or keep generic nodes
until instruction selection so that the generic combiner keeps working.
AArch64 experiment
To roughly estimate the existing support, I took the AArch64 backend (considering its GlobalISel
support to be one of the most advanced) and did the following:
- Modified the IRTranslator pass to dump the LLVM IR of a function when its GlobalISel representation has only a few G_* opcodes (ideally only one, but this is not always possible).
- Compiled the IRs for X86 using GlobalISel, while adjusting the IRs to remove AArch64 specifics.
- Collected statistics, out of 212 (at the time of the experiment) generic GlobalISel opcodes:
  - Generic opcodes that did not appear in AArch64 with GlobalISel from .ll to .mir: 43
  - Triggered an internal assert about PhysReg copy: 2
  - Cannot be legalized: 110
  - Cannot be selected: 11
  - Successfully compiled: 47
It's clear that this experiment may be too pessimistic: we may hit vector-typed
opcodes where only the scalar version is supported, or some f128 types, or
multi-opcode IRs where only an auxiliary opcode cannot be compiled and therefore
gates the positive result. Nevertheless, these statistics show that we can
compile barely 25% of isolated GlobalISel opcodes.
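For reference, the arithmetic behind the "barely 25%" figure can be checked with a short sketch. The counts are the ones listed above; the struct and function names are made up for this example. Note that the five categories as listed add up to 213, one more than the stated total of 212; the sketch keeps both numbers as given. Which denominator to use is also a judgment call, so both variants are computed.

```cpp
// Counts copied from the experiment; names are illustrative only.
struct OpcodeStats {
  int total = 212;         // generic opcodes at the time of the experiment
  int notSeen = 43;        // never appeared in the AArch64 .ll -> .mir dumps
  int physRegAssert = 2;   // hit the internal PhysReg copy assert
  int notLegalized = 110;  // cannot be legalized
  int notSelected = 11;    // cannot be selected
  int compiled = 47;       // successfully compiled
};

// Sum of the five listed categories (213 with the numbers above).
int categorySum(const OpcodeStats &s) {
  return s.notSeen + s.physRegAssert + s.notLegalized + s.notSelected +
         s.compiled;
}

// Share of `part` in `whole`, in percent.
double percent(int part, int whole) { return 100.0 * part / whole; }

// Compiled opcodes relative to all 212 generic opcodes (about 22.2%).
double compiledShareOfAll(const OpcodeStats &s) {
  return percent(s.compiled, s.total);
}

// Compiled opcodes relative to the opcodes that were actually exercised,
// i.e. excluding those that never appeared (about 27.8%).
double compiledShareOfExercised(const OpcodeStats &s) {
  return percent(s.compiled, s.total - s.notSeen);
}
```

Either way the result stays in the low-to-high twenties, consistent with the rough 25% estimate.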
C/C++ coverage
After the experiment with isolated GlobalISel opcodes, I decided to use the C and
C++ Validation Test Suites. Unfortunately, no tests can be compiled:
- G_GLOBAL_VALUE isn't supported in fpic mode (D157396).
- Apparently, there is a place that assumes that after selection all virtual registers have a register class. However, this is not true for GlobalISel (it's not X86-specific; AArch64 has the same behavior) (D157458).
- All tests require G_SELECT. The problem with its selection is described above.
Everything else besides opcodes
I couldn't identify any other issues that may require a systematic approach in the first place, e.g. calling conventions or TableGen issues.
Proposal
The general idea is to repeat the AArch64 path: start with basic support without
any combiners or optimizations, i.e. -O0 GlobalISel.
1. Understand the current coverage:
   - The C and C++ Validation Test Suites are used as a criterion for X86 GlobalISel support.
   - The LLVM test suite with enforced -global-isel for more complete testing.
   - Reuse tests from the AArch64 experiment for per-opcode testing. These tests will become useless as more opcodes are supported and proper tests are added to the LLVM test suite.
2. Follow an opcode-oriented approach: when adding support for a single opcode, support it from IRTranslator through InstructionSelector (at once, or at least with an understanding of how to do it).
3. Prioritize opcodes using the test suites from point 1. The general idea is a plane with two axes:
   - Start with scalar versions and end with vectorized ones.
   - Start from basics (add, shl, mul) and end with independent and target intrinsics (llvm.returnaddress, llvm.sse.*).
4. During implementation, I propose to maximize reuse of the existing SelectionDAG tables. However, we may try to create GlobalISel-specific patterns as an alternative.
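The two-axis prioritization from the proposal could be sketched as a simple sort. This is a hypothetical illustration: `WorkItem`, its fields, and the choice to treat "scalar before vector" as the primary key are assumptions made for the example, not part of the proposal itself.

```cpp
#include <algorithm>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical work item; the fields are illustrative, not an LLVM structure.
struct WorkItem {
  std::string name;
  bool isVector;  // axis 1: scalar (false) sorts before vector (true)
  int complexity; // axis 2: 0 = basic op, 1 = independent intrinsic,
                  //         2 = target intrinsic
};

// Orders items by (isVector, complexity); ties keep their input order.
// Using the scalar/vector axis as the primary key is an assumption.
std::vector<WorkItem> prioritize(std::vector<WorkItem> items) {
  std::stable_sort(items.begin(), items.end(),
                   [](const WorkItem &a, const WorkItem &b) {
                     return std::tie(a.isVector, a.complexity) <
                            std::tie(b.isVector, b.complexity);
                   });
  return items;
}
```

In practice the ordering would also be weighted by how often an opcode blocks tests in the suites mentioned above; this sketch leaves that out.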
Open questions
To sum up, there are two questions I don't have answers for.
- How to minimize the effort for target intrinsics support?
- Are there concerns that should be considered now for further steps with combiners?