Hello LLVM discourse!
I’m the main codegen developer of LLVM-MOS, an out-of-tree LLVM backend for the 6502:
We’re still a ways away from our first official release, but it’s now starting to get about that time, so I wanted to touch base with the broader LLVM community about what we’ve built, how strange it is, and start a converstation about whether it (eventually) belongs in the monorepo.
The compiler is fairly C99-complete, lacking only floating point. I’ve gone through the llvm-test-suite and gotten every SingleSource/ test I could manage to fit in 64k passing. Quite a few of these required 32-bit integers; I converted those tests to accept 16-bit ints where feasible, and disabled the rest.
From a code generation point of view, the project’s driving motivation is to apply LLVMs tremendous analysis and optimization capabilities to the very difficult challenge of generating good code for the 6502.
We’re not finished optimizing it by a long shot (even to the point suitable for a first release), but the results we’ve obtained so far are quite promising.
For those unfamiliar with the 6502, it’s a challenging architecture:
- An 8-bit accumulator and two 8-bit index registers, each special-purposed for different instructions.
- An 8-bit hardware stack, suitable for nothing more than return addresses and a few bytes of temporaries.
- One practical indirect addressing mode that could be used to access a simulated soft stack, which is less than half the speed of an absolute memory access.
- 8-bit indexed addressing modes, while the pointer size is 16-bits. The 8-bit index is silently zero-extended before adding.
- Fast zero page (first 256 bytes of RAM) memory addressing modes.
We’ve dealt with these problems by using LLVM in a fairly unorthodox ways. Our two strategies are:
- Make the 6502 more “normal”:
- A zero page region is used as a bank of “imaginary” general purpose registers. With this, more-or-less able to steal the RISC-V calling convention.
- Rewrite the 6502 instructions set into a “logical” instruction set made of pseudos. For example, the three separate immediate load opcodes, LDA #12, LDX #12, and LDY #12, can all be collapsed into a logical LDImm opcode: %0:GPR = LDImm 12. This allows regular register allocation techniques to mostly work on the 6502.
- Simulate a soft stack using a pointer. Standard practice for 6502 C compilers.
- Make LLVM more “weird”:
- A whole program analysis walks the call graph and identifies functions that cannot have more than one invocation simultaneously active. These functions have their stack frames allocated as global arrays instead of on the soft stack. The downside is that asynchronous entries into LTO units must be annotated as such. Otherwise, e.g. leaf functions would always use this optimization, even if called both from main and from an interrupt handler.
- I’ve had to do unspeakable things to Loop Strength Reduction to make the 6502’s addressing modes representable. LSR’s Formula data model assumes that all components of the sum that make up an addressing mode have the same width, but on the 6502, they generally have different widths, with the smaller components being implicitly zero-extended.
- Extremely unusual legalization patterns depending heavily on the quality of lowering G_SELECT to branches. Most multi-byte legalizations on the 6502 involve branching, since branching is extremely cheap on the 6502, since there’s no cache and no pipeline.
- Who knows, there may be more!
So, given all this, I’d like to open up discussion on whether or not this belongs in LLVM proper. While it’s true that this effort has resulted in making parts of LLVM more general, it’s unclear to me whether or not that’s a direction in which LLVM would actually like to go.
Would having a cool 6502 backend be a feather in LLVM’s cap? A pointless burden? Somewhere in between?
It’s also very likely that I’ve implemented something in a ridiculous way. Please don’t be shy about telling me so; I’m a relative newcomer to compiler development!
It’s been a wild ride getting from there to here, and there’s still a long ways to go. Kudos to all those who helped make LLVM what it is; our little backend stands on the shoulders of giants!