The GlobalISel match table, which powers both the InstructionSelector and Combiners, currently uses 64 bits per entry.
This mostly wastes space, as almost no entries need that much. In fact, we never need more than 2 bytes unless we’re dealing with match table indexes or immediates.
Proposal
I made a WIP patch that changes entries in the match table to be 1 byte each:
Data is encoded over multiple bytes when needed:
MatchTable Opcodes are uint8
MachineInstr IDs and Operand IDs are ULEB128.
The vast majority of patterns only need a handful of those, so they only use 1 byte. We have some outliers like PPC that have really, really, really long patterns where we need 255+ IDs.
LLT indexes are int8
Positive values are references to the LLT enum generated by the backend; negative values are for temporary LLTs derived from a register operand.
Predicate enums, target opcodes/RCs/RBs/registers are 2 bytes.
I believe no enum in any target needs more than 2 bytes.
Table indexes are 4 bytes
Immediates are 8 bytes (they could be ULEB128 if we really wanted to optimize, but they’re not used often, so I don’t think it’s worth it)
Pros
This is mostly a code size optimization.
A quick test shows llc (MinSizeRel) going from 116 MB to 108 MB, so we save about 7% of binary size.
Cons
There is a lot of test churn (42 files), and tests are becoming harder to read due to the use of macros to encode multi-byte values. GIMatchTableExecutorImpl.h also becomes a tiny bit more complicated due to the need to call functions to extract values.
All of this is pretty minor. Having to manually update that many tests is annoying, but I can get it done; first, though, I’d like to investigate performance a bit more.
Performance
Before I submit a review, I would like to first do performance testing. Can anyone help me measure the impact this has on build times of LLVM itself, and on the build times of programs using GlobalISel?
I fear somewhat longer LLVM build times, because the match table got more complex with the macros used to encode multi-byte values.
I’m also wondering how this affects the speed of the match table’s execution. I’m hoping it doesn’t get slower now that entries can be spread across cache lines.
I like the idea! It’s now like the MatcherTable of SelectionDAG ISel.
Can you put up a PR so that we can review and merge it? Also, I tried to optimize the size of the SelectionDAG MatcherTable; I think some of those optimizations may be useful to GISel too.
Currently, only a few targets have GISel support (AArch64 and RISC-V?), so I think we can notify the developers of those targets. And we have the compile-time tracker (@nikic).
Actually, I’m optimistic about the execution time: you have shaved off a lot of MatcherTable bytes, so we should get better cache utilization.
Good idea, I will try to build the .inc file and see that way. Thanks
I can do the AMDGPU GlobalISel testing too (and on AArch64 as well), but I’m not sure it’s particularly reliable. I’d prefer to see numbers on a real project; I think AArch64 has some benchmarks for checking compile times, but I’m not sure where to find them.
Worse build times (expected), equivalent or slightly better runtime.
I’ll look at pre-encoding constants and only relying on macros when the value isn’t known ahead of time (e.g. an enum). Maybe that’ll help build times a bit?
It’s expected that builds get slower, though: the array is smaller in terms of bytes, but much bigger in terms of number of entries.