In November 2022, we published an RFC for a new Machine Description Language (MDL) for LLVM. The purpose of MDL is to support a much broader class of accelerator architectures in the CodeGen and MC libraries. We’ve opened an initial pull request that includes just the baseline documentation for the project. The work in its entirety can be found on GitHub in the MPACT-ORG/llvm-project repository (the “all” branch), together with extensive documentation (in llvm/docs/Mdl).
Since it’s a large contribution, before we continue with PRs I’d like to encourage a bit more conversation with the community about the work, and to actively address people’s questions and concerns.
A few notes about the status of the work:
- This work grew out of a need to model much more complex architectures in LLVM, going back at least 15 years. More recently, we wanted to support Google’s TPU ML accelerators, and found it really challenging to do so in the existing infrastructure. The MDL directly addresses the issues we’ve had, and makes it “easy” to support that class of architecture, as well as all the existing upstream architectures.
- Support for MDL is integrated into the MC and CodeGen libraries alongside support for Schedules and Itineraries, in the same general style in which both of those are supported. It’s not meant to replace either, although it carries more detailed information about the microarchitecture than either of them, and could enable more sophisticated scheduling algorithms.
- LLVM MDL support is opt-in, selected via an explicit CMake configuration flag. When it is built in, it can be turned on or off with a command-line option.
- MDL directly supports all upstream targets that have Itineraries and/or Schedules. We have a tool that scrapes information from TableGen and produces an “equivalent” MDL description, which we compile and include in the MC libraries (much like the TableGen-generated files). We don’t expect this to be a typical use case, but it was done to prove out the integration into all the CodeGen and MC components.
- The footprint of MDL in LLVM is really quite modest: around 1300 lines of code added to CodeGen and MC, and around 600 lines of code added to support all the pertinent targets in the Target libraries. The MDL support code (separate from existing code) is around 2500 LOC. The majority of the code is in the external tooling for the language.
- It’s very well tested: with MDL enabled, we pass all but 190 of the 93007 tests. Of those, most “failures” are either very minor, incidental (and valid) scheduling differences, or tests that specifically check the format of debug information.
- For the runtime tests we’ve done to date (on various X86 platforms), there are no discernible performance deltas (typically +/-0.2%, varying run-to-run); in fact, we almost always generate exactly the same code. This is what we expected: the intent of this effort was not to improve performance for existing targets, but to be able to better support more targets. That said, it is able to do some things (like bundle-packing) slightly better than the existing infrastructure.
Please take a look, and let’s discuss any questions or concerns you may have.
-Reid