On behalf of AMD, I’m pleased to announce the open sourcing of an LLVM backend for AMD/Xilinx AI Engine processors (https://github.com/Xilinx/llvm-aie). These processors are found in a number of devices, including Ryzen AI SoCs.
The repository currently focuses on supporting the AIE2 architecture implemented by the XDNA accelerators in “Phoenix” and “Hawk Point” devices.
A simple flow for running code on these devices is documented in the “E2E Linux Example” page of the Xilinx/llvm-aie GitHub wiki.
Note that these accelerators contain an array of processors, while the LLVM backend supports only a single processor. Support for programming the device as a whole is available in open-source tools based on MLIR (https://github.com/Xilinx/mlir-aie).
For more architectural details, see the AMD Technical Information Portal.
In addition to LLVM code generation, the repository also includes support for Clang, LLD, binutils-style tools (e.g. llvm-objdump), compiler-rt, and LLVM libc.
Generally speaking, AI Engine processors are in-order, exposed-pipeline VLIW processors. Each VLIW instruction bundle specifies the behavior of one or more functional units, which all begin executing a new instruction at the same time. The processor pipeline includes no stall logic to enforce data dependencies: instructions continue executing in order regardless of other instructions in the pipeline. As a result, the compiler can, and must, schedule machine instructions whose accesses to the same register overlap in the pipeline, accounting for each instruction’s architecturally defined operand latencies.
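To make the exposed-pipeline behavior concrete, here is a toy Python model (purely illustrative, not backend code): register writes become visible only after a fixed latency, and nothing stalls a premature read, so a read issued too early simply observes the old value.

```python
# Toy model of an exposed pipeline: writes commit after a fixed latency and
# there is no stall or bypass logic. The register names and latencies below
# are invented for illustration; they are not real AIE2 details.

class ExposedPipeline:
    def __init__(self):
        self.regs = {}          # committed register values
        self.in_flight = []     # pending writes: (commit_cycle, reg, value)
        self.cycle = 0

    def step(self):
        """Advance one cycle, committing any writes whose latency elapsed."""
        self.cycle += 1
        still_pending = []
        for commit, reg, val in self.in_flight:
            if commit <= self.cycle:
                self.regs[reg] = val
            else:
                still_pending.append((commit, reg, val))
        self.in_flight = still_pending

    def write(self, reg, val, latency):
        """Issue a write; the value becomes visible after `latency` cycles."""
        self.in_flight.append((self.cycle + latency, reg, val))

    def read(self, reg):
        """Reads see only committed values -- no stall, no forwarding."""
        return self.regs.get(reg, 0)

pipe = ExposedPipeline()
pipe.regs["r0"] = 1
pipe.write("r0", 42, latency=2)   # write with a 2-cycle operand latency
pipe.step()
early = pipe.read("r0")           # one cycle later: still the old value
pipe.step()
late = pipe.read("r0")            # two cycles later: the new value
print(early, late)                # -> 1 42
```

On a stalling architecture the early read would be delayed until the write committed; here the scheduler alone decides whether that read sees the old or the new value.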
Other key architectural characteristics include varying-width instruction slots across different instruction encodings and relatively small address spaces (20-bit pointer registers).
The varying-width instruction slots imply code alignment restrictions for instructions that are branch or return targets.
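As a sketch of what such an alignment restriction means for the compiler, the assembler must insert padding before any branch or return target that does not fall on an aligned address. The 16-byte alignment value below is made up for illustration and is not the actual AIE requirement:

```python
# Sketch: padding needed before a branch-target instruction so that it lands
# on an aligned address. The 16-byte alignment is a hypothetical value.
def pad_to_alignment(addr: int, align: int = 16) -> int:
    """Number of padding bytes to emit before an instruction at `addr`."""
    return (-addr) % align

print(pad_to_alignment(0x1002))   # -> 14
print(pad_to_alignment(0x1010))   # -> 0 (already aligned)
```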
To support these unusual architectural features of AI Engine, this repository adds LLVM support in several specific areas:
- support for non-power-of-2 pointers;
- improved TableGen support for specifying operand latencies and resource conflicts of exposed pipeline instructions;
- scheduler support for negative operand latencies (i.e. an instruction writing to a register may be scheduled after a corresponding use);
- scheduler support for slot assignment for instructions that can be issued in multiple VLIW slots;
- support for selecting relocations for instructions with multiple encodings;
- support for architectures with code alignment restrictions;
- improved register allocation support for complex register hierarchies, specifically related to spills of sub-registers of large compound registers.
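The first bullet follows directly from the 20-bit pointer registers mentioned above: addresses live in a 1 MiB space and pointer arithmetic truncates to 20 bits rather than to a power-of-2 machine word. A toy illustration of the concept (not how the backend actually represents pointers):

```python
# Toy illustration of a non-power-of-2 pointer width: with 20-bit pointer
# registers, pointer arithmetic wraps modulo 2**20 (a 1 MiB address space).
PTR_BITS = 20
PTR_MASK = (1 << PTR_BITS) - 1   # 0xFFFFF

def ptr_add(p: int, offset: int) -> int:
    """Pointer addition truncated to the 20-bit address space."""
    return (p + offset) & PTR_MASK

print(hex(ptr_add(0xFFFFC, 8)))  # wraps past the top of the space -> 0x4
```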
We invite the community to comment on these approaches, and we would like to begin the process of upstreaming these generic improvements.
Currently, we are actively working on improving QoR (quality of results) and supporting the newest versions of the AIE architecture, in particular the XDNA2 accelerator in “Strix Point” devices.
The Peano Team