LLVM MC "Dialects" / MLIR all-the-way-down compilation

It’s long bugged me that the LLVM infrastructure has the public-facing LLVM IR and then an entire shadow infrastructure built around the MC representation and datatypes.

It seems like MLIR’s dialect-based approach would be an elegant way to “lift” this infrastructure so that machine code for various processors (i.e. a hardware dialect) would look and act like any other intermediate representation. I’m thinking of an AArch64 dialect, an X86_64 dialect, and so on. Instruction set extensions can also be handled in a conceptually very neat way, because SSE, SSE2, SSSE3, AVX, NEON, and the mind-boggling array of AVX-512 extensions can each be treated as a dialect. Having a real, type-checked lowering from a C IR to some of these dialects would drastically increase the productivity of folks (like me!) doing compiler and code-generation research.

There will always be some things that have to be done by an ultimate “back end”, like emitting the actual binary data into an object file, but it seems that a lot of what is currently done on the LLVM MC data structures could instead manipulate an MLIR dialect.

Is anyone working on something along these lines, or does anyone know of any work others are doing? What are some major challenges that might not be immediately obvious?

One of the principal challenges I can see is that lowering from dialect to dialect is typically done with semantics-preserving but not necessarily optimal selections of lower-level dialect ops or op sequences to represent higher-level dialect ops. Machine-code instruction selection, however, is typically treated as an optimization problem, even if in practice it is solved with heuristics.
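To make the contrast concrete, here is a toy sketch of instruction selection as an optimization problem: a bottom-up, BURS-style dynamic program that picks the cheapest tiling of an expression tree. The node kinds, the “fused multiply-add” tile, and the unit costs are all invented for illustration; real selectors work over DAGs with far richer cost models.

```python
# Toy bottom-up "optimal tiling" instruction selector over an expression
# tree. Nodes: ("add", l, r), ("mul", l, r), ("const", k), ("reg", name).
# A hypothetical "madd" tile covers add(mul(a, b), c) in one instruction.

def cost(node):
    """Minimal number of instructions needed to cover `node`."""
    kind = node[0]
    if kind in ("const", "reg"):
        return 0  # leaves are free in this toy model
    if kind == "add":
        l, r = node[1], node[2]
        best = 1 + cost(l) + cost(r)  # plain add: one instruction
        # fused tile covers the mul and the add together
        if l[0] == "mul":
            best = min(best, 1 + cost(l[1]) + cost(l[2]) + cost(r))
        if r[0] == "mul":
            best = min(best, 1 + cost(r[1]) + cost(r[2]) + cost(l))
        return best
    if kind == "mul":
        return 1 + cost(node[1]) + cost(node[2])
    raise ValueError(kind)

# a * b + c: greedy per-op lowering emits 2 instructions; optimal is 1
expr = ("add", ("mul", ("reg", "a"), ("reg", "b")), ("reg", "c"))
print(cost(expr))  # → 1
```

A purely local, pattern-at-a-time dialect conversion corresponds to the greedy answer; the point is that the globally optimal tiling can be strictly cheaper.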

There seems to be (from what I’ve seen in the MLIR codebase) relatively little to suggest that people treat dialect conversion as an optimal instruction selection problem. However, this approach can be really successful even in higher-level contexts – I’ve worked on using optimal instruction selection to generate code for CNNs, and we found that lowering choices can make a huge difference. If anyone is interested in reading more about that work, the paper is here: https://dl.acm.org/doi/10.1145/3168805

This was one of the initial design goals of MLIR, but most of the work in and around the core repository is currently focused on levels of abstraction higher than LLVM. It would be very interesting to see the other end.

So far, the LLVM IR dialect is treated as a sink in the lowering pipeline: we don’t do any meaningful transformations at that level and defer to LLVM proper for that purpose. There is a prototype LLVM IR importer, and some interest in making it more robust so that we can consume optimized IR produced by LLVM (ping @sanjoy_das_google). This may be helpful for your use case: import the IR, then keep lowering it down to some machine-specific format.

The MLIR pattern-rewriting infrastructure that underpins most lowerings has some small heuristic mechanisms, but it wasn’t designed to solve a global optimization problem. It could be extended or partially reused for that purpose.

@nicolasvasilache discussed [Abandoned][RFC] AVX512-specific Dialect for implementing and benchmarking XNNPack in MLIR, which later led to [RFC] Starting an AVX512 Target-Specific Dialect - Rebooted, which has been upstreamed. This dialect targets LLVM IR intrinsics, so it operates at a higher level than MC, but I suppose it could be repurposed to describe operations on both sides of LLVM IR.

I’ve been thinking about this in the context of the RegionKindInterface: MC-oriented dialects may need slightly different region semantics in order to represent in-place register updates and other nuances that are present in code generation but not representable in standard SSA.

I’d recommend looking at the MachineInstr level of abstraction: it supports both SSA and non-SSA register representations (including mixed together, e.g. for “pinned” registers in SSA form). The best MLIR analogy is that fixed registers are represented either equivalently to “std.constant” (née “std.register”?) or as an attribute on the instruction.

Two-address instructions like x += y are represented as “x1 = x2 + y” in SSA form, with a “required to be equal” constraint that the register allocator honors when assigning physical registers.