[RFC] Design for APX feature EGPR and NDD support

Intel Advanced Performance Extensions:

The main features of Intel® APX include:

  1. 16 additional general-purpose registers (GPRs) R16–R31, also referred to as Extended GPRs (EGPRs) in this document;
  2. Three-operand instruction formats with a new data destination (NDD) register for many integer instructions;
  3. Conditional ISA improvements: New conditional load, store and compare instructions, combined with an option for the compiler to suppress the status flags writes of common instructions;
  4. Optimized register state save/restore operations;
  5. A new 64-bit absolute direct jump instruction.

We will focus on the sub-features EGPR and NDD in this thread.

EGPR

Not all X86 instructions are extended for EGPR. The following is an overview of which instructions are extended and how we are going to implement them.

• Legacy space:

All instructions in legacy maps 0 and 1 that have explicit GPR or memory operands can use the REX2 prefix to access the EGPR, except XSAVE*/XRSTOR.

• EVEX space:

All instructions in the EVEX space can access the EGPR in their register/memory operands.

For the above instructions, we don’t add new entries in TD, and instead we extend GPR with R16-R31 and make them allocatable only when the feature EGPR is available, just like what we did when introducing R8-R15.

Besides, some instructions in legacy space with map 2/3 and VEX space are promoted into EVEX space. Opcode and opcode map may change after the promotion. For these instructions, we add new entries in TD to avoid overcomplicating the assembler and disassembler.

For instructions that cannot access EGPR, we introduce new register classes GR8/16/32/64_NOREX2. We do not update the register class in each instruction's TD entry because that would affect optimization passes such as machine instruction scheduling, whose analysis relies on the static type of operands in TD. Instead, we leverage the target hook TargetInstrInfo::getRegClass to distinguish the instructions by the rules mentioned above.
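To make the rules above concrete, here is a minimal Python sketch of the decision such a hook would have to make. It is a toy model, not LLVM code: the `Inst` record, `can_use_egpr`, and `gpr_class` are hypothetical names, the instruction names in the usage are only illustrative, and instructions promoted into EVEX space are assumed to be modeled as separate EVEX entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inst:
    """Hypothetical, simplified view of an instruction entry."""
    name: str
    space: str       # "legacy", "VEX" or "EVEX"
    opcode_map: int  # opcode map number within that space


def can_use_egpr(inst: Inst) -> bool:
    """Apply the rules stated in this post (assumed complete for the sketch)."""
    if inst.space == "EVEX":
        # All EVEX-space instructions can address R16-R31.
        return True
    if inst.space == "legacy" and inst.opcode_map in (0, 1):
        # REX2 covers legacy maps 0 and 1, except XSAVE*/XRSTOR.
        return not inst.name.startswith(("XSAVE", "XRSTOR"))
    # Legacy maps 2/3 and VEX space cannot encode EGPRs.
    return False


def gpr_class(inst: Inst, bits: int, has_egpr: bool) -> str:
    """Pick the GPR class name for an operand of the given width."""
    base = f"GR{bits}"
    if has_egpr and not can_use_egpr(inst):
        return base + "_NOREX2"
    return base


print(gpr_class(Inst("ADD64rr", "legacy", 0), 64, True))
print(gpr_class(Inst("XSAVE64", "legacy", 1), 64, True))
```

The static operand type stays `GR64` in the table; only the class reported at query time narrows to the `_NOREX2` variant, which is why passes that read the static type are unaffected.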

The constraints of asm operands keep the same meaning as before, e.g. R16-R31 are not allocated when ‘q’,‘r’,‘l’ constraint is used.

All EGPRs are caller-saved registers, and we will add some new kinds of relocations and relocation optimization for them. See discussion at
https://groups.google.com/g/x86-64-abi/c/KbzaNHRB6QU
https://groups.google.com/g/x86-64-abi/c/Gy0RmoP2LnE
https://groups.google.com/g/x86-64-abi/c/saQyqBeL5XE

The support for EGPR in LLDB is on hold because we haven’t investigated it yet. Only the mapping to DWARF registers has been added in the TD file. We would appreciate it if someone knowledgeable in this field could volunteer to implement it.

NDD

APX extends some instructions with a new form that has an extra register operand called a new data destination (NDD). In such forms, NDD is the new destination register receiving the result of the computation and all other operands (including the original destination operand) become read-only source operands.

Compared to legacy instructions, NDD forms are friendlier to register allocation. We support them the same way we handled EVEX promotion for YMM16-YMM31: we prefer the NDD version over the legacy one during instruction selection, and compress it to the legacy instruction after register allocation if possible. We reuse the EvexToVexInstPass pass to do the compression and rename it to CompressEvexInstPass because legacy instructions are not in VEX space.
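The compression step can be illustrated with a small Python sketch. This is only a model of the idea, not the actual pass: `NDDInst` and `compress` are hypothetical names, and the handling of commutable instructions (swapping sources when the destination matches the second source) is an assumption that goes beyond what this post states.

```python
from typing import NamedTuple, Optional, Tuple


class NDDInst(NamedTuple):
    """Hypothetical three-operand form: dst = src1 <op> src2."""
    opcode: str
    dst: str
    src1: str
    src2: str
    commutable: bool = False


def compress(inst: NDDInst) -> Optional[Tuple[str, str, str]]:
    """Return a legacy (opcode, dst_and_src, src) form, or None if not possible."""
    if inst.dst == inst.src1:
        # The destination already overwrites the first source: legacy form fits.
        return (inst.opcode, inst.dst, inst.src2)
    if inst.commutable and inst.dst == inst.src2:
        # Assumption: commutable ops may swap sources to make the forms match.
        return (inst.opcode, inst.dst, inst.src1)
    # Destination differs from both sources: the NDD encoding must be kept.
    return None


print(compress(NDDInst("add", "rax", "rax", "rcx")))
print(compress(NDDInst("sub", "rdx", "rax", "rcx")))
```

Whether compression succeeds depends entirely on the registers chosen by the allocator, so the number of instructions that end up in the legacy encoding is decided before this pass runs.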


cc more folks @RKSimon @topperc @e-kud @nwg @efriedma-quic @nickdesaulniers

Will you need to bias register allocation to suggest that it make a source and dest the same register so that you don’t use NDD unnecessarily?

@topperc I don’t know whether the bias is feasible or better. The current proposal is that we always select the NDD version if it may bring benefits. In other words, the non-NDD version is selected only if the source and destination are bound together, e.g. RBP/RSP operations. Then, in the CompressEVEX pass, we compress NDD to non-NDD when the source and dest are the same.

The register allocator can select a different register for the destination even if both operands are killed by the instruction. Here’s the RISC-V patch where I copied SystemZ code to give hints to the register allocator about making source and dest the same: D138242 [RISCV] Use register allocation hints to improve use of compressed instructions.
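As a rough illustration of what such a hint does, the Python sketch below prefers a source register that dies at the instruction when choosing the destination. It is a toy model with hypothetical names (`pick_dest`, `src_killed`), not the code from the RISC-V or SystemZ patches, and it ignores everything a real allocator has to consider (interference, register classes, costs).

```python
from typing import Dict, List, Optional


def pick_dest(free_regs: List[str],
              src_regs: List[str],
              src_killed: Dict[str, bool]) -> Optional[str]:
    """Choose a destination register, preferring a source killed here."""
    for reg in src_regs:
        if src_killed.get(reg, False):
            # Hint: reuse a dying source so the NDD form can later be
            # compressed to the two-operand legacy encoding.
            return reg
    # No source dies here: fall back to the first free register, if any.
    return free_regs[0] if free_regs else None


print(pick_dest(["r16", "rcx"], ["rax", "rbx"], {"rax": True}))
print(pick_dest(["r16", "rcx"], ["rax", "rbx"], {}))
```

Without the hint, the first call could just as well return `r16`, which is valid but forces the longer NDD encoding.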


Good point! I will try implementing it and see its impact on instruction count and code size.

First PR: [X86] Support EGPR (R16-R31) for APX by KanRobert · Pull Request #67702 · llvm/llvm-project (github.com)